Description:
Can we add a parameter that bypasses the rule-based SMILES generator, so that we can get a random SMILES for a given starting atom number?
Collecting a few (maybe) useful links:
I added this to #2059, but here is a simple Python function that randomizes SMILES:
```python
from rdkit import Chem
import random


def randomSmiles(m1):
    # Tell MolToSmiles to use the atom ranks we supply instead of computing canonical ranks
    m1.SetProp("_canonicalRankingNumbers", "True")
    idxs = list(range(0, m1.GetNumAtoms()))
    random.shuffle(idxs)
    # Give every atom a random rank; MolToSmiles consults these ranks when ordering the traversal
    for i, v in enumerate(idxs):
        m1.GetAtomWithIdx(i).SetProp("_canonicalRankingNumber", str(v))
    return Chem.MolToSmiles(m1)


m1 = Chem.MolFromSmiles("CNOPc1ccccc1")
s = set()
for i in range(1000):
    smiles = randomSmiles(m1)
    s.add(smiles)
print(s)
```
Generating ALL possible SMILES efficiently is much, much harder than it seems at first blush.
@bp-kelley: Randomizing the ranks certainly helps, but it doesn't solve the underlying problem: the traversal algorithm still prefers non-ring bonds to ring bonds, and that preference takes precedence over the ranks. Here's an example using your randomSmiles() function:
```
In [13]: m2 = Chem.MolFromSmiles('CC1C(CC=1)O')

In [14]: set(randomSmiles(m2) for x in range(1000))
Out[14]:
{'C1(C)=CCC1O',
 'C1(O)C(C)=CC1',
 'C1(O)CC=C1C',
 'C1=C(C)C(O)C1',
 'C1C(O)C(C)=C1',
 'C1C=C(C)C1O',
 'CC1=CCC1O',
 'OC1C(C)=CC1',
 'OC1CC=C1C'}
```
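For the original request (a random SMILES that starts from a chosen atom), a minimal sketch follows. It assumes an RDKit release whose `Chem.MolToSmiles` accepts the `doRandom` keyword together with `rootedAtAtom`; check your installed version before relying on it. The helper name `random_smiles_from_atom` and the `n_samples` parameter are hypothetical, and nothing here claims that `doRandom` removes the ring-bond preference described above.

```python
from rdkit import Chem


def random_smiles_from_atom(mol, start_atom, n_samples=100):
    """Hypothetical helper: collect distinct randomized SMILES rooted at start_atom.

    Assumes MolToSmiles supports doRandom (not available in older RDKit releases).
    """
    seen = set()
    for _ in range(n_samples):
        # rootedAtAtom fixes the first atom written; doRandom randomizes the rest of the traversal
        seen.add(Chem.MolToSmiles(mol, rootedAtAtom=start_atom, doRandom=True))
    return seen


m2 = Chem.MolFromSmiles('CC1C(CC=1)O')
# Atom indices follow the input SMILES order, so the oxygen is atom 5
variants = random_smiles_from_atom(m2, start_atom=5)
for smi in sorted(variants):
    print(smi)
```

Every string in `variants` should parse back to the same molecule; comparing canonical SMILES after a round trip is a quick sanity check.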