RDKit: [Question] Many molecules in a SMILES string

Created on 22 Jul 2020 · 3 comments · Source: rdkit/rdkit

Configuration:


  • RDKit Version: 2019.09.3
  • Operating system: MacOS
  • Python version (if relevant): 3.7.4
  • Are you using conda? Yes
  • If you are using conda, which channel did you install the rdkit from? conda install -c rdkit rdkit
  • If you are not using conda: how did you install the RDKit?

Description:

Is there a way to detect that a SMILES string contains multiple molecules without running a connected-components algorithm on the molecule's adjacency matrix?

For example, the following SMILES describes two separate molecules (and contains no bonds):

from rdkit import Chem

s = 'C.O'
m = Chem.MolFromSmiles(s, sanitize=True)
for a in m.GetAtoms():
    print(a.GetSymbol())
for b in m.GetBonds():
    print(b)
'''
C
O
'''

Thanks!
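For comparison, the connected-components approach the question hopes to avoid is short when written out. The sketch below is a hypothetical helper (`connected_components` is not an RDKit function) that runs a breadth-first search over an edge list, assuming atoms are indexed 0..n-1 and each bond is given as a pair of atom indices. It returns a tuple of index tuples, the same shape as the Chem.GetMolFrags output shown in the replies.

```python
from collections import deque


def connected_components(n_atoms, bonds):
    """Group atom indices 0..n_atoms-1 into connected components.

    `bonds` is an iterable of (i, j) atom-index pairs. This is a
    hypothetical illustration, not part of the RDKit API.
    """
    # Build an adjacency list from the bond pairs.
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    seen = [False] * n_atoms
    components = []
    for start in range(n_atoms):
        if seen[start]:
            continue
        # Breadth-first search from each atom not yet assigned to a component.
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for nbr in neighbors[atom]:
                if not seen[nbr]:
                    seen[nbr] = True
                    queue.append(nbr)
        components.append(tuple(sorted(component)))
    return tuple(components)
```

With an RDKit molecule `m`, the inputs could be built as `m.GetNumAtoms()` and `[(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in m.GetBonds()]`; for 'C.O' there are two atoms and no bonds, so each atom is its own component.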


All 3 comments

One workaround is to append '>>C' to the SMILES so that it becomes a (dummy) reaction SMARTS, load it with AllChem.ReactionFromSmarts, and then call GetReactants() to get each '.'-separated component as its own molecule:

from rdkit.Chem import AllChem

s = 'C.O'
new_s = s + '>>C'
rxn = AllChem.ReactionFromSmarts(new_s)
for reactant in rxn.GetReactants():
    print(AllChem.MolToSmiles(reactant))
# C
# O

The function you want is Chem.GetMolFrags():

from rdkit import Chem

for smi in ('CO', 'C.O', 'C.O.O'):
    m = Chem.MolFromSmiles(smi)
    print(smi, Chem.GetMolFrags(m))

CO ((0, 1),)
C.O ((0,), (1,))
C.O.O ((0,), (1,), (2,))
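Chem.GetMolFrags also accepts `asMols=True`, which returns one Mol object per fragment instead of atom-index tuples. The hypothetical helper `fragment_info` below is a minimal sketch that uses this to return both the fragment count and each fragment's SMILES; it assumes invalid input should raise an error, since Chem.MolFromSmiles returns None when it cannot parse a string.

```python
from rdkit import Chem


def fragment_info(smiles):
    """Return (number of fragments, SMILES of each fragment).

    Hypothetical helper built on Chem.GetMolFrags.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for unparsable input
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # asMols=True returns one Mol object per connected fragment
    frags = Chem.GetMolFrags(mol, asMols=True)
    return len(frags), [Chem.MolToSmiles(f) for f in frags]


for smi in ('CO', 'C.O', 'CCO.[Na+].[Cl-]'):
    print(smi, fragment_info(smi))
```

Testing `len(Chem.GetMolFrags(m)) > 1` is enough to answer the original yes/no question; `asMols=True` is only needed when the separate molecules themselves are wanted.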

Thanks to both of you!
