So I get my .mol file from user input in a web application. If the user inputs something invalid (e.g. a central carbon connected to 5 other carbons), I want to show the user a warning. Currently, Chem.MolFromMolBlock() prints a warning to the terminal when the molecule is invalid, e.g.
Explicit valence for atom # 0 C, 5, is greater than permitted
Trying to get the warning string into the Python code, I tried Chem.SanitizeMol(m), but I get the following error when doing that on an invalid molecule. When I call Chem.SanitizeMol(m) on a valid representation, it works fine.
Chem.SanitizeMol(m)
Boost.Python.ArgumentError: Python argument types in
rdkit.Chem.rdmolops.SanitizeMol(NoneType)
did not match C++ signature:
SanitizeMol(RDKit::ROMol {lvalue} mol, unsigned int sanitizeOps=rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_ALL, bool catchErrors=False)
If I understand correctly, this is because Chem.MolFromMolBlock() returns None when the molecule is invalid. I believe it would be much more correct to throw an exception here, which the programmer may catch and handle as they see fit. "Silently" returning None is surely bad practice.
There are multiple views on what the correct behavior is when invalid input is provided. The RDKit's behaviour (in the Python wrappers) is to return None.
If you would like an exception to be raised when parsing or sanitization fails, it's pretty easy to do with something like this (not tested):
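from rdkit import Chem

def mol_from_mb(mb):
    # Parse without sanitizing so that chemistry problems do not produce None.
    res = Chem.MolFromMolBlock(mb, sanitize=False)
    if res is None: raise ValueError('bad input')
    # With catchErrors=True, SanitizeMol returns the operation that failed.
    sanitFail = Chem.SanitizeMol(res, catchErrors=True)
    if sanitFail: raise ValueError(sanitFail)
    return res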
Sure, I can do that. But the problem is I can't be more specific than "bad input", even though RDKit knows why my input is bad (it prints it to stderr, e.g. "can't kekulize" or "too many bonds" etc.). I need to get that information into the script.
OK, I solved it. When you add sanitize=False as you indicated, the function no longer returns None for an invalid molecule. So the following works perfectly:
m = Chem.MolFromMolBlock(block, sanitize=False)
try:
    Chem.SanitizeMol(m)
    sm = Chem.MolToSmiles(m)
except ValueError as e:
    sm = str(e)
Now sm contains the SMILES if the user input is valid, and otherwise it contains the correct error message.
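For anyone reusing this in a web handler, here is a slightly more defensive sketch of the same idea (not tested against every RDKit version). It keeps the SMILES and the error message in separate return values instead of sharing one variable, and it also checks for None, on the assumption that MolFromMolBlock can still return None with sanitize=False when the block cannot be parsed at all. The function name smiles_or_error and the optional chem parameter are my own hypothetical additions; chem only exists so the function can be exercised with a stand-in object.

```python
def smiles_or_error(block, chem=None):
    """Return (smiles, error); exactly one of the two is None.

    `chem` is a hypothetical hook for testing; by default it is rdkit.Chem.
    """
    if chem is None:
        # Imported lazily so the sketch can be loaded without RDKit installed.
        from rdkit import Chem as chem

    # Skip sanitization at parse time so chemistry problems do not yield None.
    mol = chem.MolFromMolBlock(block, sanitize=False)
    if mol is None:
        # Assumption: this only happens when the block itself is unparseable.
        return None, "could not parse mol block"

    try:
        chem.SanitizeMol(mol)
        return chem.MolToSmiles(mol), None
    except ValueError as e:
        # The exception text carries the reason, e.g. the explicit valence message.
        return None, str(e)
```

In the request handler you would then do something like `smiles, error = smiles_or_error(block)` and show `error` to the user whenever it is not None.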
Closing.