Rdkit: Using SDF molecules without sanitization to match substructures give RuntimeError

Created on 28 Sep 2017 · 6Comments · Source: rdkit/rdkit

I am using rdkit 2017.03.3

When I am reading in molecules from sdf stream via SDMolSupplier without sanitization, and then trying to match the molecules to SMARTS pattern which includes the total connectivity atomic property requirement, I get RuntimeError regarding implicit hydrogens.

Here I use cyclohexane as an example:

from rdkit import Chem

# the sdf of cyclohexane is provided at the bottom
suppl = Chem.SDMolSupplier('./cyclohexane.sdf',   removeHs = False, sanitize = False)
rdkmol = suppl[0]
rdkmol.GetSubstructMatches(Chem.MolFromSmarts("[#6X4]-[#6X4]"))

I get:

RuntimeError: Pre-condition Violation
    getNumImplicitHs() called without preceding call to calcImplicitValence()
    Violation occurred on line 153 in file Code/GraphMol/Atom.cpp
    Failed Expression: d_implicitValence > -1

Am I using the SDMolSupplier correctly? If I am, it seems some information is lost when parsing the sdf molecule. It I set sanitize to True this issue disappears.

sdf of cyclohexane:

 18 18  0  1  0               999 V2000
   -0.0187    1.5258    0.0104 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0021   -0.0041    0.0020 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4167    2.0553   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7182   -0.4975   -1.2568 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1536    0.0321   -1.2676 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1328    1.5619   -1.2592 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5459    1.8868   -0.8726 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5289    1.8773    0.9072 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.5293   -0.3651    0.8851 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0205   -0.3814    0.0098 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4019    3.1452    0.0056 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.9439    1.6943    0.8826 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.7330   -1.5874   -1.2628 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.1910   -0.1364   -2.1398 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.6808   -0.3290   -0.3846 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.6638   -0.3194   -2.1644 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.6057    1.9230   -2.1423 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.1554    1.9392   -1.2670 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  1  0  0
  1  3  1  0  1  0  0
  1  7  1  0  0  0  0
  1  8  1  0  0  0  0
  2  4  1  0  1  0  0
  2  9  1  0  0  0  0
  2 10  1  0  0  0  0
  3  6  1  0  1  0  0
  3 11  1  0  0  0  0
  3 12  1  0  0  0  0
  4  5  1  0  1  0  0
  4 13  1  0  0  0  0
  4 14  1  0  0  0  0
  5  6  1  0  1  0  0
  5 15  1  0  0  0  0
  5 16  1  0  0  0  0
  6 17  1  0  0  0  0
  6 18  1  0  0  0  0
M  END
$$$$

Source

hjuinj

Most helpful comment

If you turn off sanitization (is there a reason you are doing this?) then things like valence are not calculated. If you need to read without sanitization, then be sure to call 'mol.UpdatePropertyCache()' before you do the substructure match

greglandrum on 29 Sep 2017

👍2

All 6 comments

greglandrum on 29 Sep 2017

👍2

Hi Greg,

Thank you for your help. I am working on the open force field project that John showed at the UGM. The reason for no sanitization is because the philosophy that the input molecule should already be in the user-intended form to be simulated. Thus the molecule is left untouched before matching against SMIRKS patterns.

May I ask why when I read molecules in mol2 files I do not need to perform 'mol.UpdatePropertyCache()' ? Additionally UpdatePropertyCache() will not alter the molecular topology right?

hjuinj on 29 Sep 2017

It's safe to call mol.updatePropertyCache(); it just calculates valence states of the atoms. There are no topology changes.

greglandrum on 29 Sep 2017

Thanks again Greg :)
Have you had the chance to take a look at the other issue I seem to encounter (#1590) regarding difference in interpretation between mol2 and sdf molecules?

hjuinj on 30 Sep 2017

Hi Greg,
I am having problems using updatePropertyCache(), regardless of whether the molecule is read from sdf or mol2 or from smiles, I get AttributeError: 'Mol' object has no attribute 'updatePropertyCache'

e.g. :
m1 = Chem.MolFromSmiles('Cc1ccccc1')
print(m1) <rdkit.Chem.rdchem.Mol object at 0x7f3bb90e1c60>
m1.updatePropertyCache()

AttributeError: 'Mol' object has no attribute 'updatePropertyCache'

hjuinj on 15 Oct 2017

You’ve got the capitalization wrong: The Python call is UpdatePropertyCache()

greglandrum on 16 Oct 2017

👍1

Was this page helpful?

0 / 5 - 0 ratings