I am using rdkit 2017.03.3
When I am reading in molecules from sdf stream via SDMolSupplier without sanitization, and then trying to match the molecules to SMARTS pattern which includes the total connectivity atomic property requirement, I get RuntimeError regarding implicit hydrogens.
Here I use cyclohexane as an example:
from rdkit import Chem
# the sdf of cyclohexane is provided at the bottom
suppl = Chem.SDMolSupplier('./cyclohexane.sdf', removeHs = False, sanitize = False)
rdkmol = suppl[0]
rdkmol.GetSubstructMatches(Chem.MolFromSmarts("[#6X4]-[#6X4]"))
I get:
RuntimeError: Pre-condition Violation
getNumImplicitHs() called without preceding call to calcImplicitValence()
Violation occurred on line 153 in file Code/GraphMol/Atom.cpp
Failed Expression: d_implicitValence > -1
Am I using the SDMolSupplier correctly? If I am, it seems some information is lost when parsing the sdf molecule. It I set sanitize to True this issue disappears.
sdf of cyclohexane:
18 18 0 1 0 999 V2000
-0.0187 1.5258 0.0104 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0021 -0.0041 0.0020 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4167 2.0553 -0.0004 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7182 -0.4975 -1.2568 C 0 0 0 0 0 0 0 0 0 0 0 0
2.1536 0.0321 -1.2676 C 0 0 0 0 0 0 0 0 0 0 0 0
2.1328 1.5619 -1.2592 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5459 1.8868 -0.8726 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.5289 1.8773 0.9072 H 0 0 0 0 0 0 0 0 0 0 0 0
0.5293 -0.3651 0.8851 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.0205 -0.3814 0.0098 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4019 3.1452 0.0056 H 0 0 0 0 0 0 0 0 0 0 0 0
1.9439 1.6943 0.8826 H 0 0 0 0 0 0 0 0 0 0 0 0
0.7330 -1.5874 -1.2628 H 0 0 0 0 0 0 0 0 0 0 0 0
0.1910 -0.1364 -2.1398 H 0 0 0 0 0 0 0 0 0 0 0 0
2.6808 -0.3290 -0.3846 H 0 0 0 0 0 0 0 0 0 0 0 0
2.6638 -0.3194 -2.1644 H 0 0 0 0 0 0 0 0 0 0 0 0
1.6057 1.9230 -2.1423 H 0 0 0 0 0 0 0 0 0 0 0 0
3.1554 1.9392 -1.2670 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 1 0 0
1 3 1 0 1 0 0
1 7 1 0 0 0 0
1 8 1 0 0 0 0
2 4 1 0 1 0 0
2 9 1 0 0 0 0
2 10 1 0 0 0 0
3 6 1 0 1 0 0
3 11 1 0 0 0 0
3 12 1 0 0 0 0
4 5 1 0 1 0 0
4 13 1 0 0 0 0
4 14 1 0 0 0 0
5 6 1 0 1 0 0
5 15 1 0 0 0 0
5 16 1 0 0 0 0
6 17 1 0 0 0 0
6 18 1 0 0 0 0
M END
$$$$
If you turn off sanitization (is there a reason you are doing this?) then things like valence are not calculated. If you need to read without sanitization, then be sure to call 'mol.UpdatePropertyCache()' before you do the substructure match
Hi Greg,
Thank you for your help. I am working on the open force field project that John showed at the UGM. The reason for no sanitization is because the philosophy that the input molecule should already be in the user-intended form to be simulated. Thus the molecule is left untouched before matching against SMIRKS patterns.
May I ask why when I read molecules in mol2 files I do not need to perform 'mol.UpdatePropertyCache()' ? Additionally UpdatePropertyCache() will not alter the molecular topology right?
It's safe to call mol.updatePropertyCache(); it just calculates valence states of the atoms. There are no topology changes.
Thanks again Greg :)
Have you had the chance to take a look at the other issue I seem to encounter (#1590) regarding difference in interpretation between mol2 and sdf molecules?
Hi Greg,
I am having problems using updatePropertyCache(), regardless of whether the molecule is read from sdf or mol2 or from smiles, I get AttributeError: 'Mol' object has no attribute 'updatePropertyCache'
e.g. :
m1 = Chem.MolFromSmiles('Cc1ccccc1')
print(m1) <rdkit.Chem.rdchem.Mol object at 0x7f3bb90e1c60>
m1.updatePropertyCache()
AttributeError: 'Mol' object has no attribute 'updatePropertyCache'
You鈥檝e got the capitalization wrong: The Python call is UpdatePropertyCache()
Most helpful comment
If you turn off sanitization (is there a reason you are doing this?) then things like valence are not calculated. If you need to read without sanitization, then be sure to call 'mol.UpdatePropertyCache()' before you do the substructure match