Am I right in understanding that pg_dump writes a table's mol column out as SMILES?
This causes other mol information, such as mol properties or conformations, to be lost after a dump and restore.
Is this how it is intended to work, or am I missing something?
RDKit version: 2019.03.1
RDKit cartridge version: 0.74.0
Thanks!
Hi @xjalencas,
Unfortunately you are correct: an RDKit mol column appears in the output of pg_dump just as SMILES.
And, yes, this does result in conformations being lost.
A partial workaround here is to create a column containing the molecule coordinates:
select id,mol_to_ctab(m)::text as ctab into ctabs from mols1;
Then at least you'll have the coordinates.
It occurs to me that we can get even closer to solving this problem by using CXSMILES within the cartridge. I'll look into that.
Thanks! This helps a lot.
I also tried a bytea column filled with mol_to_pkl(mol), so that multiple conformations can be stored.
Hello, are there any updates or changes to this? It would definitely be helpful to be able to pg_dump to MOL/SDF format and not lose 3D information without keeping extra columns in sync.
Currently I am using pg_dump with the plain output format, i.e. a textual SQL dump. Is there any benefit to using the custom format with RDKit, i.e. does it retain the mol as binary data, or does it also do the SMILES conversion?
I can experiment, but any existing knowledge would be much appreciated.
@harryjubb: the cartridge will now write CXSMILES (instead of SMILES) for molecules that have conformations saved, so you still get the coordinates:
hembl_26=# select molregno,m from demo_mols limit 2;
molregno | m
----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | Cc1cc(-n2ncc(=O)[nH]c2=O)ccc1C(=O)c1ccccc1Cl |(4.7417,-4.1417,;5.2667,-3.8417,;5.2667,-3.2417,;5.7917,-2.95,;5.7917,-2.35,;6.3125,-2.05,;6.3125,-1.45,;5.7917,-1.15,;5.7917,-0.55,;5.2792,-1.45,;5.2792,-2.05,;4.7542,-2.3417,;6.3042,-3.2417,;6.3042,-3.8417,;5.7875,-4.1417,;5.7875,-4.7417,;5.2667,-5.0417,;6.3042,-5.0417,;6.8167,-4.7417,;7.3417,-5.0417,;7.3417,-5.6417,;6.825,-5.9417,;6.3042,-5.6417,;5.7792,-5.9417,)|
2 | Cc1cc(-n2ncc(=O)[nH]c2=O)ccc1C(=O)c1ccc(C#N)cc1 |(4.7417,-4.1417,;5.2667,-3.8417,;5.2667,-3.2417,;5.7917,-2.95,;5.7917,-2.35,;6.3125,-2.05,;6.3125,-1.45,;5.7917,-1.15,;5.7917,-0.55,;5.2792,-1.45,;5.2792,-2.05,;4.7542,-2.3417,;6.3042,-3.2417,;6.3042,-3.8417,;5.7875,-4.1417,;5.7875,-4.7417,;5.2667,-5.0417,;6.3042,-5.0417,;6.8167,-4.7417,;7.3417,-5.0417,;7.3417,-5.6417,;7.8625,-5.9417,;8.3792,-6.2417,;6.825,-5.9417,;6.3042,-5.6417,)|
I believe that this is what should also end up in the output of pg_dump.
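To make the format of that output concrete, here is a minimal sketch of pulling the coordinates out of a CXSMILES string like the ones above. It uses only the Python standard library. The function name parse_cxsmiles_coords is made up for illustration, and the sketch assumes the coordinate block is the first parenthesised group in the extension, which holds for the rows shown here but not necessarily for every CXSMILES string. An empty z field is treated as 0.0.

```python
def parse_cxsmiles_coords(cxsmiles):
    """Split 'SMILES |(x,y,z;...)|' into the SMILES and a list of (x, y, z) tuples.

    Illustrative helper only: assumes the coordinate block is the first
    parenthesised group inside the |...| extension.
    """
    smiles, sep, ext = cxsmiles.partition(" |")
    if not sep:
        # Plain SMILES with no extension block, so no coordinates.
        return smiles, []
    ext = ext.strip().rstrip("|")
    start = ext.find("(")
    end = ext.find(")", start)
    if start == -1 or end == -1:
        return smiles, []
    inner = ext[start + 1:end]
    if not inner:
        return smiles, []
    coords = []
    for atom in inner.split(";"):
        parts = atom.split(",")
        # Pad to three fields; an empty field (e.g. z in 2D output) becomes 0.0.
        x, y, z = (float(p) if p else 0.0 for p in (parts + ["", "", ""])[:3])
        coords.append((x, y, z))
    return smiles, coords


# Made-up two-atom input in the same shape as the psql output above.
smiles, coords = parse_cxsmiles_coords("CO |(0.0,1.5,;1.25,-2.0,)|")
print(smiles, coords)
```

In practice RDKit's own SMILES parser should be able to read these strings directly, so a hand-written parser like this is mainly useful for quick checks of a dump file.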
Thank you @greglandrum. I am currently unable to upgrade RDKit for other reasons, but hope to do so soon.
How lossy is the CXSMILES output compared to SDF? Do tags get retained? Another aspect that matters to some programs is atom ordering: if explicit non-polar hydrogens and their coordinates are lost, presumably the ordering is affected as well?
In the meantime I went with a solution inspired by @xjalencas' byte column approach: a Python script that writes out any mol column data in pickle format, converting on the fly. This keeps all of the above information without needing to maintain and sync an extra database column.
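A rough sketch of what such an export could look like follows. It is not the actual script. It assumes psycopg2 is installed, reuses the mols1 table with id and m columns from the earlier example, assumes id is an integer, and uses a made-up length-prefixed file layout. The function names write_pickles, read_pickles and dump_mol_column are hypothetical.

```python
import struct

# Record layout (an arbitrary choice for this sketch):
# 8-byte signed id, 4-byte unsigned length, then the pickle bytes.
HEADER = "<qI"


def write_pickles(rows, fh):
    """Write (id, pickle_bytes) rows to fh as length-prefixed records."""
    count = 0
    for mol_id, pkl in rows:
        data = bytes(pkl)  # psycopg2 hands bytea back as a memoryview
        fh.write(struct.pack(HEADER, mol_id, len(data)))
        fh.write(data)
        count += 1
    return count


def read_pickles(fh):
    """Yield (id, pickle_bytes) records written by write_pickles."""
    header_size = struct.calcsize(HEADER)
    while True:
        header = fh.read(header_size)
        if len(header) < header_size:
            return
        mol_id, size = struct.unpack(HEADER, header)
        yield mol_id, fh.read(size)


def dump_mol_column(dsn, path):
    """Export the mol column as pickles, converting on the fly in the query."""
    import psycopg2  # assumed available; imported here so the helpers work without it

    with psycopg2.connect(dsn) as conn, conn.cursor() as cur, open(path, "wb") as fh:
        # mol_to_pkl is the cartridge function mentioned earlier in the thread.
        cur.execute("select id, mol_to_pkl(m) from mols1")
        return write_pickles(cur, fh)
```

Each record read back with read_pickles can then be turned into a molecule again, for example with Chem.Mol(data) in Python.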
Thanks for your help and time.