Describe the bug
mol.GetSubstructMatches gets stuck using a pattern with a lot of fragments.
To Reproduce
from rdkit.Chem import MolFromSmiles, AllChem
smiles = 'S=C(SSC(=S)N(C[34CH:34]([35CH2:35][36CH3:36])[37CH2:37][38CH2:38][39CH2:39][40CH3:40])C[38CH:38]([37CH2:37][34CH2:34][35CH2:35][36CH3:36])[39CH2:39][40CH3:40])N(C[34CH:34]([35CH2:35][36CH3:36])[37CH2:37][38CH2:38][39CH2:39][40CH3:40])C[34CH:34]([35CH2:35][36CH3:36])[37CH2:37][38CH2:38][39CH2:39][40CH3:40]'
smarts = '[CH2;+0:34].[CH2;+0:38].[CH2;+0:38].[CH2;+0:38].[CH3;+0:36].[CH3;+0:36].[CH3;+0:36].[CH3;+0:36].[CH3;+0:40].[CH3;+0:40].[CH3;+0:40].[CH3;+0:40].[CH;+0:34]-[C;H2;D2;+0]-[N;H0;D3;+0](-[C;H2;D2;+0]-[CH;+0:34])-[C;H0;D3;+0](=[S;H0;D1;+0])-[S;H0;D2;+0]-[S;H0;D2;+0]-[C;H0;D3;+0](=[S;H0;D1;+0])-[N;H0;D3;+0](-[C;H2;D2;+0]-[CH;+0:34])-[C;H2;D2;+0]-[CH;+0:38]'
mol = MolFromSmiles(smiles)
fragment = AllChem.MolFromSmarts(smarts)
mol.GetSubstructMatches(fragment, useChirality=True, maxMatches=5) # stuck here
Expected behavior
The call should return in a reasonable time; failing that, a timeout option would be very useful here.
@RobinFrcd See here for a similar problem and a suggested workaround:
https://gist.github.com/ptosco/863cb55ace485c6664c21c244b2ca10a
Yes, I think a timeout would be good.
Hi,
We had the same problem and we opted to use a solution based on
https://github.com/pnpnpn/timeout-decorator.
Kind regards,
Christos
Christos Kannas
Research Software Engineer (Cheminformatics)
@ptosco Yes, I saw your answer on SourceForge, thanks for the workaround!
Leaving the issue open because it would be great to have the timeout in RDKit itself!
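Until a built-in timeout exists, the general idea behind such workarounds can be sketched with the standard library alone: run the call in a child process and kill that process if it does not answer in time. This is a minimal sketch, not the code from the gist or from timeout-decorator. The names `run_with_timeout` and `substruct_matches` are hypothetical, and the `fork` start method assumes a POSIX system.

```python
import multiprocessing as mp


def _worker(conn, func, args, kwargs):
    # Runs in the child process: send back either the result or the exception.
    try:
        conn.send(("ok", func(*args, **kwargs)))
    except Exception as exc:
        conn.send(("err", exc))
    finally:
        conn.close()


def run_with_timeout(func, timeout, *args, **kwargs):
    """Hypothetical helper: run func(*args, **kwargs) in a child process and
    raise TimeoutError if no result arrives within `timeout` seconds."""
    ctx = mp.get_context("fork")  # assumption: POSIX; avoids pickling func and args
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(send_end, func, args, kwargs))
    proc.start()
    send_end.close()  # the parent only reads
    try:
        if not recv_end.poll(timeout):
            raise TimeoutError(f"no result after {timeout} s")
        status, payload = recv_end.recv()
    finally:
        if proc.is_alive():
            proc.terminate()  # stops a search that is still running
        proc.join()
        recv_end.close()
    if status == "err":
        raise payload
    return payload


def substruct_matches(smiles, smarts, max_matches=5):
    # Hypothetical wrapper around the call from this issue; RDKit is imported
    # lazily so the helper above stays usable without it.
    from rdkit import Chem
    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(smarts)
    return mol.GetSubstructMatches(patt, useChirality=True, maxMatches=max_matches)
```

With the `smiles` and `smarts` strings from the report, `run_with_timeout(substruct_matches, 10, smiles, smarts)` would either return the matches or raise `TimeoutError` after ten seconds. A separate process is used because a search running inside compiled code cannot be interrupted from Python in the same process.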
Your SMARTS pattern here is really suboptimal: it lists the disconnected single atoms first, followed by the large pattern, so the search space is enormous.
If you rearrange it to find the largest patterns first, at least for this pattern, the results are pretty instantaneous:
import time
# put the larger patterns first
smarts2 = ".".join(sorted(smarts.split("."), key=len, reverse=True))
fragment = AllChem.MolFromSmarts(smarts2)
t1 = time.time()
mol.GetSubstructMatches(fragment, useChirality=False, maxMatches=5)
t2 = time.time()
print("Found matches in", t2 - t1, "seconds")
Now, this doesn't rule out adding a timeout, and the reordering could certainly be done as an optimization inside the search engine itself. But when running SMARTS patterns you always want the largest patterns, or the ones most likely to fail, to come first in the string.
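String length is only a proxy for pattern size. A slightly more explicit variant orders the components by their number of atoms. The sketch below is not an RDKit function, and the name `largest_first` is hypothetical. It assumes every atom is written in brackets, as in the pattern from this issue, and that the SMARTS contains no recursive `$(...)` expressions or component-level grouping parentheses, since it splits naively on `.`.

```python
import re

# Matches one bracket atom such as "[CH2;+0:34]". Assumes no nested brackets.
_BRACKET_ATOM = re.compile(r"\[[^\]]+\]")


def largest_first(smarts):
    """Hypothetical helper: reorder dot-separated SMARTS components so the
    ones with the most bracket atoms come first. Ties keep their original
    order because Python's sort is stable."""
    components = smarts.split(".")
    components.sort(key=lambda c: len(_BRACKET_ATOM.findall(c)), reverse=True)
    return ".".join(components)


print(largest_first("[CH3:1].[CH2:2]-[CH2:3]-[N:4].[CH2:5]-[OH:6]"))
# prints: [CH2:2]-[CH2:3]-[N:4].[CH2:5]-[OH:6].[CH3:1]
```

Applied to the pattern from the report, this moves the large dithiocarbamate component ahead of the twelve single-atom components, the same ordering that sorting by string length produces.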