Scout: Allow for multiple loqusdb

Created on 15 May 2020 · 29 Comments · Source: Clinical-Genomics/scout

Is it possible to support different loqusdb entities for different institutes?

For example, we currently have:
An institute with hg19 wgs-samples
Another with hg38 wgs-samples
A third with hg38 panel-data for oncogenetics

We'd like to present different data from different loqusdb-versions in the Local Observations in the variant view for the different institutes.

enhancement question Intermediate

All 29 comments

Hi @ViktorHy, sounds like a good idea; we are getting closer to a similar situation. One alternative that I find attractive would be to have separate instances of Scout for different types of data. Either way, I will look into whether there is a nice way to implement this.

Hi @ViktorHy, we will start working on this now. Do you have any input on how it should work or be implemented?

My 2 cents:

Since loqusdb already has support for loading data into separate databases using the -db flag (which defaults to loqusdb), I think a simple and backwards compatible solution would be to leave the default database in scout as loqusdb and not change anything in how loqusdb is configured in config.py.

Then allow this default to be overridden by setting an optional field named something like loqusdb_db in an institute document, or in a case document (which overrides any institute value). The case value should be set in the yaml file.

This allows both setting an institute specific database and to set it on a case-to-case basis.
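The precedence described above (case value, then institute value, then the default) can be sketched as follows. This is only an illustration of the proposal, not code from Scout: the field name loqusdb_db is the hypothetical one suggested in the comment, and the function name and example documents are made up.

```python
# Default database name, as stated in the comment above ("defaults to loqusdb")
DEFAULT_LOQUSDB_DB = "loqusdb"


def resolve_loqusdb_db(institute_obj, case_obj=None):
    """Pick the database name: case value first, then institute value, then default."""
    # "loqusdb_db" is the hypothetical optional field proposed in this thread
    for doc in (case_obj, institute_obj):
        if doc and doc.get("loqusdb_db"):
            return doc["loqusdb_db"]
    return DEFAULT_LOQUSDB_DB


# Made-up documents, for illustration only
institute = {"_id": "inst_hg38", "loqusdb_db": "loqusdb_hg38"}
case = {"_id": "case_1", "loqusdb_db": "loqusdb_panel"}

print(resolve_loqusdb_db(institute))              # institute-level override
print(resolve_loqusdb_db(institute, case))        # case-level override wins
print(resolve_loqusdb_db({"_id": "inst_plain"}))  # nothing set, default is used
```

Because an unset field falls through to the default, existing institutes and cases would keep working without any changes.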

Hi,

This is my proposed solution:

  • LOQUSDB_SETTINGS is a dict or a list of dicts.
  • Add an extra field in LOQUSDB_SETTINGS named "institute". If your session's institute matches, the configured binary and config will be used. Otherwise the default setting will be used.
  • Omitting the institute field will be interpreted as 'default' (making the solution backwards compatible).

[
    {"binary_path": "loqusdb", "config_path": "", "institute": "default"},
    {"binary_path": "/bin/anotherloqusb", "config_path": "--config1", "institute": "KI"},
    {"binary_path": "/bin/yetanother/loqusdb", "config_path": "--config2", "institute": "Mayo"},
]

What do you think?
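A minimal sketch of how the lookup proposed above might behave, assuming the settings are either a single dict or a list of dicts and that a missing "institute" key means "default". The helper names are hypothetical and this is not the code that was eventually merged.

```python
def normalise_settings(loqusdb_settings):
    """Accept a single dict or a list of dicts and return {institute: entry}."""
    if isinstance(loqusdb_settings, dict):
        loqusdb_settings = [loqusdb_settings]
    by_institute = {}
    for entry in loqusdb_settings:
        # A missing "institute" key marks the default entry (backwards compatible)
        by_institute[entry.get("institute", "default")] = entry
    return by_institute


def settings_for(loqusdb_settings, institute_id):
    """Return the entry for the institute, else the default entry (None if neither exists)."""
    by_institute = normalise_settings(loqusdb_settings)
    return by_institute.get(institute_id, by_institute.get("default"))


# Values copied from the proposal above
settings = [
    {"binary_path": "loqusdb", "config_path": "", "institute": "default"},
    {"binary_path": "/bin/anotherloqusb", "config_path": "--config1", "institute": "KI"},
]

print(settings_for(settings, "KI")["binary_path"])       # institute-specific entry
print(settings_for(settings, "Unknown")["binary_path"])  # falls back to default
# An old-style single dict without "institute" still works as the default
print(settings_for({"binary_path": "loqusdb", "config_path": ""}, "KI")["binary_path"])
```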

I think that is a really nice solution, it will be easy to control it on institute level just by adding another line there. Let's try to have it on institute level for a while and if there is a need to have it on case level we could come up with another solution in the future.

I think it is a bit confusing to have institute-specific settings in config.py, when all other such settings are in the institute collection. But if you need to have multiple installations of loqusdb, maybe it makes sense.

What's the rationale behind having multiple installations of loqusdb rather than multiple databases in the same instance?

It would be nice if this solution would support both strategies. As I understand it, it won't be possible to set a database name?

Also I think the suggested solution makes it almost impossible to support case-specific configuration in a clean way. But maybe that is a contrived use case anyway...

Hi, thanks for feedback. I agree that there is no particular reason to point to multiple installations of loqusdb, however we want to be able to use different mongod processes for different loqus databases (not required, it is possible to have multiple dbs in one process as well).

Another suggestion could be to have:

LOQUSDB_SETTINGS = [
    {"binary_path": "loqusdb", "config_path": "rare_disease_37.yml", "name": "default"},
    {"binary_path": "loqusdb", "config_path": "rare_disease_38.yml", "name": "rd38"},
    {"binary_path": "loqusdb", "config_path": "melanoma_37.yml", "name": "melanoma37"},
]

rare_disease_37.yml

db_name: "rd37"
host: localhost
port: 27017

rare_disease_38.yml

db_name: "rd38"
host: localhost
port: 27017

melanoma_37.yml

db_name: "melanoma_37"
host: otherhost
port: 27030

Then we can refer to these from institutes and cases and fall back on "default" if nothing specified.
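One way to read this suggestion is as a registry keyed by "name" that is checked once at startup. The sketch below assumes exactly that; the function names are hypothetical, and the requirement that a "default" entry exists is an assumption drawn from the fallback described above, not from Scout's code.

```python
def build_registry(settings):
    """Index the settings list by "name", rejecting duplicates and a missing default."""
    registry = {}
    for entry in settings:
        name = entry["name"]
        if name in registry:
            raise ValueError(f"duplicate loqusdb name: {name}")
        registry[name] = entry
    if "default" not in registry:
        raise ValueError('the settings need an entry named "default"')
    return registry


def lookup(registry, loqusdb_id=None):
    """Return the named entry, or the default entry when the id is unset or unknown."""
    return registry.get(loqusdb_id, registry["default"])


# Values copied from the suggestion above
registry = build_registry([
    {"binary_path": "loqusdb", "config_path": "rare_disease_37.yml", "name": "default"},
    {"binary_path": "loqusdb", "config_path": "rare_disease_38.yml", "name": "rd38"},
    {"binary_path": "loqusdb", "config_path": "melanoma_37.yml", "name": "melanoma37"},
])

print(lookup(registry, "rd38")["config_path"])  # named instance
print(lookup(registry)["config_path"])          # nothing specified, default is used
```

Failing early on a missing "default" entry keeps the fallback well defined for every institute and case that does not name an instance.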

Also, it would be nice (perhaps in the future) if one institute could choose to show frequencies from multiple loqusdb instances.

As @mikaell said, he created a PR that adds this feature: https://github.com/Clinical-Genomics/scout/pull/1984. It is now ready, so please have a look, @bjhall and @ViktorHy, and see if it solves your situation.

Thanks! Will have a look this week.

So I have just updated to the latest version of scout. I've been trying to get multiple loqusdb instances to work. Am I correct in this assessment?

To get it to work, I need 2 separate installations of loqusdb with different paths to the binary?

I was hoping I could use a single installation and binary, only with a different -db value for each database.

Did you update to the released version or to the latest master? There was some misunderstanding during the merge of this. Could you give @ViktorHy a hand, @northwestwitch @mikaell @dnil?

More @mikaell, I don't have a clear idea of how the multi-loqus works yet.

And version is as is in the repo - there was never anything wrong with it, just some confusion during stage branch testing!

The multiloqus code is in current master, but not yet in any released version that I know of.

@mikaell @northwestwitch we can also have a look in the afternoon - it's going to block testing for the next release if we don't understand how to integrate it on stage.

I think the config files of stage and prod should be changed according to this PR. @mikaell is working on this

  • Bug patch pushed and with @northwestwitch for review

  • Documentation is updated (https://github.com/Clinical-Genomics/scout/pull/2043/files)

OK so I've tried the latest bugfixes in the latest merge of master.

What I still don't understand is in:
scout/server/blueprints/variant/templates/variant/variant_details.html
The data presented on line 125, i.e. Cases, takes the loqusdb_id from the institute settings.

However, the data on the lines above it ("Nr Obs", "Nr Hom" and "Total nr") does not take the configured loqusdb_id into account. As a result, it always presents the default loqusdb whether or not I set it.

I hope that you understand what I mean.

Good catch @ViktorHy. Could you check that, @mikaell @northwestwitch?

Calling LoqusDB is done in controllers.py on this line:

    obs_data = loqusdb.get_variant(variant_query, loqusdb_id=institute_obj.get("loqusdb_id")) or {}

The institute's loqusdb_id is read from MongoDB and is configurable.

However, there might be a misunderstanding. You refer to a line in an HTML file and to the setting of the variable loqusdb_id, which is not present in that file. Can you clarify?

The data presented on line 125, i.e. Cases, takes the loqusdb_id from institute settings:

    {{ data.case.display_name }}

So, I have two loqusdb databases. One has 2 samples, the other has 3. The one with 2 samples is my default. I set the loqusdb_id in my institute to the one containing 3. In the variant view it shows observations for the default, that is 2. I can get it to match cases from the different loqusdb databases, but it still only shows observations for the default, no matter which loqusdb_id is set for the institute.

[Screenshot: loqusdb_multi]

The argument "--case-count" was missing in the call to LoqusDB. A patch is on its way.

Seems to be working now. Will try some more when I find the time. Thanks!

That's great, thanks @ViktorHy!

The solution has been deployed, closing this!
