RDKit: Ability to generate a list of possible SMILES representations for a given molecule

Created on 15 Sep 2018 · 3 comments · Source: rdkit/rdkit

Description:

Could we add a parameter that bypasses the rule-based SMILES generator, so that we can get a random SMILES for a given starting atom index?
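Later RDKit releases added a `doRandom` keyword to `Chem.MolToSmiles`; combined with the existing `rootedAtAtom` keyword, it appears to cover this request directly. A minimal sketch, assuming an RDKit release recent enough to accept `doRandom` (older releases will reject the keyword); `random_smiles_from_atom` is a hypothetical helper, not an RDKit function:

```python
from rdkit import Chem


def random_smiles_from_atom(mol, atom_idx, n=10):
    """Hypothetical helper: collect up to n distinct random SMILES that
    all start at atom_idx. Assumes MolToSmiles supports doRandom."""
    seen = set()
    for _ in range(n):
        # rootedAtAtom fixes the first atom written; doRandom randomizes
        # the rest of the graph traversal.
        smi = Chem.MolToSmiles(mol, rootedAtAtom=atom_idx,
                               canonical=False, doRandom=True)
        seen.add(smi)
    return seen


mol = Chem.MolFromSmiles("CNOPc1ccccc1")
# Atom index 3 is the phosphorus, so every string should begin with "P".
for smi in sorted(random_smiles_from_atom(mol, 3, n=50)):
    print(smi)
```

Every string should parse back to the same molecule; only the traversal order differs.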

Labels: Hackathon idea, enhancement

thegodone

Most helpful comment

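I added this to #2059, but here is a simple Python function that randomizes SMILES:

```python
import random

from rdkit import Chem


def randomSmiles(m1):
    """Return a random (non-canonical) SMILES for molecule m1.

    Relies on a private RDKit hook (at least in the version this was
    written for): when the molecule carries "_canonicalRankingNumbers",
    MolToSmiles uses each atom's "_canonicalRankingNumber" as its rank
    instead of computing canonical ranks.
    """
    m = Chem.Mol(m1)  # work on a copy so the caller's molecule is untouched
    m.SetProp("_canonicalRankingNumbers", "True")
    idxs = list(range(m.GetNumAtoms()))
    random.shuffle(idxs)  # give every atom a random rank
    for i, v in enumerate(idxs):
        m.GetAtomWithIdx(i).SetProp("_canonicalRankingNumber", str(v))
    return Chem.MolToSmiles(m)


m1 = Chem.MolFromSmiles("CNOPc1ccccc1")
s = set()
for _ in range(1000):
    s.add(randomSmiles(m1))

print(s)
```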
Generating ALL possible SMILES is much, much harder to do efficiently than it seems at first blush.
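A rough way to see why: for an acyclic molecule, each SMILES a writer produces corresponds to a depth-first traversal, fixed by the start atom and the order in which the branches at each atom are visited. The toy counter below is pure Python and a hypothetical helper; it ignores rings, stereochemistry, and charges, and takes a tree as an adjacency dict. The number of traversals grows factorially with branching, yet symmetric atoms make most of them produce duplicate strings. In this model neopentane has 48 traversals but only two distinct strings, `CC(C)(C)C` and `C(C)(C)(C)C`, so naive enumeration does a lot of wasted work.

```python
from math import factorial


def count_traversals(adjacency):
    """Count depth-first traversal orders of a connected tree.

    Each start atom and each ordering of the branches at every atom gives
    one traversal. Simplified model: no rings, stereo, or deduplication.
    """
    total = 0
    for root in adjacency:
        # At the root every neighbour is a branch.
        ways = factorial(len(adjacency[root]))
        for atom, nbrs in adjacency.items():
            if atom != root:
                # The parent is already visited, so degree - 1 children remain.
                ways *= factorial(len(nbrs) - 1)
        total += ways
    return total


# Heavy-atom skeletons only (hydrogens are implicit in SMILES).
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
neopentane = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
tetramethylbutane = {
    0: [1, 2, 3, 4], 1: [0, 5, 6, 7],
    2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1],
}

print(count_traversals(butane))             # 6
print(count_traversals(neopentane))         # 48
print(count_traversals(tetramethylbutane))  # 504
```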

bp-kelley on 23 Sep 2018


All 3 comments

+1
Collecting a few (maybe) useful links:

Your mailing list post @thegodone 😄:
- https://sourceforge.net/p/rdkit/mailman/message/36382511/
Esben Bjerrum's work presented at last year's RDKit UGM (2017):
- Presentation: https://github.com/rdkit/UGM_2017/blob/master/Presentations/Bjerrum_RDKitUGM_Smiles_Enumeration_for_RNN.pdf
- Publication: https://arxiv.org/pdf/1703.07076.pdf
- Code: https://github.com/EBjerrum/SMILES-enumeration/blob/master/SmilesEnumerator.py
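The linked SmilesEnumerator takes a different route from the ranking-property hack: as far as I can tell from that code, it shuffles the atom order with `Chem.RenumberAtoms` and then writes the molecule with canonicalization switched off, so the traversal follows the shuffled order. A minimal sketch of that idea, not a copy of Esben's code; `randomize_smiles` and `enumerate_smiles` are hypothetical helper names:

```python
import random

from rdkit import Chem


def randomize_smiles(smiles, rng=None):
    """Hypothetical helper: write one random, non-canonical SMILES."""
    if rng is None:
        rng = random.Random()
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    order = list(range(mol.GetNumAtoms()))
    rng.shuffle(order)
    # Renumber atoms into the shuffled order, then write without
    # canonicalization so the new atom order drives the traversal.
    shuffled = Chem.RenumberAtoms(mol, order)
    return Chem.MolToSmiles(shuffled, canonical=False)


def enumerate_smiles(smiles, n=100, seed=0):
    """Hypothetical helper: distinct strings from n seeded random draws."""
    rng = random.Random(seed)
    return {randomize_smiles(smiles, rng) for _ in range(n)}


for smi in sorted(enumerate_smiles("CNOPc1ccccc1", n=200, seed=1)):
    print(smi)
```

Like the hack above, this samples SMILES rather than enumerating all of them; the seeded `random.Random` only makes the sample reproducible.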

JoshuaMeyers on 20 Sep 2018