Scout: Update description of yaml config file in admin guide

Created on 2 Feb 2021 · 17 Comments · Source: Clinical-Genomics/scout

This file: http://www.clinicalgenomics.se/scout/admin-guide/load-config/

There are fields added recently that are not described

enhancement Easy docs QualityOfLife

northwestwitch

👍1

All 17 comments

I'll give it a shot.

I assume the attributes are somewhat sorted and grouped to functionality. I suggest using alphabetical order instead. For example:

  - analysis_type: wes
    capture_kit: Agilent_SureSelectCRE.V1
    chromograph_images:
        'autozygous': 'scout/demo/images/chromograph_demo/autozygous_regions'
        'coverage': 'scout/demo/images/chromograph_demo/coverage'
        'upd_regions': 'scout/demo/images/chromograph_demo/upd_regions'
        'upd_sites': 'scout/demo/images/chromograph_demo/upd_sites'
    expected_coverage: 30
    father: ADM1059A1
    mother: ADM1059A3
    mt_bam: scout/demo/reduced_mt.bam
    phenotype: affected
    sample_id: ADM1059A2
    sample_name: NA12882
    sex: male
    tissue_type: blood
    vcf2cytosure: scout/demo/ADM1059A2.dummy.cgh
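
The alphabetical ordering suggested here can be applied mechanically rather than by hand. A minimal sketch, assuming the sample block has already been loaded into a plain Python dict; the helper name `sort_keys` is hypothetical and not part of Scout:

```python
def sort_keys(value):
    """Return a copy of a nested dict/list structure with dict keys sorted alphabetically."""
    if isinstance(value, dict):
        # Python dicts keep insertion order, so inserting in sorted order is enough.
        return {key: sort_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [sort_keys(item) for item in value]
    return value


# A shortened, deliberately unordered version of the sample shown above.
sample = {
    "sex": "male",
    "sample_id": "ADM1059A2",
    "analysis_type": "wes",
    "chromograph_images": {
        "upd_sites": "scout/demo/images/chromograph_demo/upd_sites",
        "autozygous": "scout/demo/images/chromograph_demo/autozygous_regions",
    },
}

ordered = sort_keys(sample)
print(list(ordered))
```

Nested mappings such as `chromograph_images` are sorted as well, which keeps the documented example and a real config file easy to compare line by line.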

mikaell on 16 Feb 2021

moonso suggests rewriting the configuration and parser with Pydantic. This has been done in CG. I think it might be a good idea.

mikaell on 16 Feb 2021

👍1

Sure, but keep in mind that it should still be possible to parse the old file. Otherwise it becomes a major release, and I don't know if it's worth releasing a major right now only to add some documentation lines.

northwestwitch on 16 Feb 2021

I'll never make a change forcing a major release 😀

mikaell on 17 Feb 2021

😄2

If we could time that major around the time of migrating to a new server, it would not be a Bad Thing.

dnil on 17 Feb 2021

Assignee: in use?
Cohorts: is missing in documentation but seems to be in use in code
delivery_report: missing in docs -type?
Collaborators: missing in docs -type?
coverage_qc_report: what is this?
default_panels called default_gene_panels in docs
delivery_report: type?
lims_id: ?
phenotype_terms: ?
smn_tsv: ?
synopsis: ?
Why do alignment_path, bam_file and bam_path all exist? They will end up pointing to the same internal key.
expected_coverage marked as mandatory in documentation but is never parsed. Therefore removed.

To be continued...

mikaell on 21 Apr 2021

Regarding your questions, I assume you are comparing the case model with the case that gets loaded from a config file, right?

Assignee: in use? Yes, but after you assign yourself to a case, I'm not sure if it can get parsed from the config file
Cohorts: is missing in documentation but seems to be in use in code. It's used
delivery_report: missing in docs -type? String
Collaborators: missing in docs -type? Array
coverage_qc_report: what is this? It's a string
default_panels called default_gene_panels in docs ???
delivery_report: type? String
lims_id: ? String
phenotype_terms: ? Array
smn_tsv: ? String
synopsis: ? This was tricky, I remember that it has to accept both String and a list of strings, but check the code to be sure
Why do alignment_path, bam_file and bam_path all exist? They will end up pointing to the same internal key. I know, but for historical reasons we keep it this way, because if we change it, people using an old name like bam_path won't be able to see alignments any more. This can be changed during a major
expected_coverage marked as mandatory in documentation but is never parsed. Therefore removed. OK, but check that nothing breaks in the portal downstream
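
The synopsis answer above (accept either a string or a list of strings) could be handled with a small normalizing step. This is only a sketch: the function name `normalize_synopsis` is hypothetical, and joining the list with newlines is an assumption, not something the thread confirms about Scout's behaviour.

```python
from typing import List, Optional, Union


def normalize_synopsis(synopsis: Optional[Union[str, List[str]]]) -> str:
    """Accept a string or a list of strings and always return one string."""
    if synopsis is None:
        return ""
    if isinstance(synopsis, str):
        return synopsis
    # Assumption: list entries are joined with newlines.
    return "\n".join(str(part) for part in synopsis)


print(normalize_synopsis(["First line", "Second line"]))
```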

northwestwitch on 21 Apr 2021

> I know, but for historical reasons we keep it this way, because if we change it, people using an old name like bam_path won't be able to see alignments any more. This can be changed during a major

Which name is most correct? Let's mark the other two as deprecated and to be removed in the future.

mikaell on 21 Apr 2021

> Which name is most correct? Let's mark the other two as deprecated and to be removed in the future.

Good idea: the latest and most correct is alignment_path
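
One way to keep old config files working while steering users toward alignment_path is to accept the older keys but warn about them. A minimal sketch using only the standard library; the function name and the lookup order among the deprecated keys are assumptions for illustration, not Scout's actual parser:

```python
import warnings
from typing import Optional

PREFERRED_KEY = "alignment_path"
# Older names that still have to be accepted until a major release.
DEPRECATED_KEYS = ("bam_file", "bam_path")


def resolve_alignment_path(sample: dict) -> Optional[str]:
    """Return the alignment file path, preferring the current key name."""
    if sample.get(PREFERRED_KEY):
        return sample[PREFERRED_KEY]
    for key in DEPRECATED_KEYS:
        if sample.get(key):
            warnings.warn(
                f"'{key}' is deprecated, use '{PREFERRED_KEY}' instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return sample[key]
    return None
```

Old files keep loading, and the warning gives users time to migrate before the deprecated names are removed.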

northwestwitch on 21 Apr 2021

…

confirmed_parent: ?
confirmed_sex: ?
is_sm and is_sma_carrier: what is this, are they different, boolean type?
Are mother and father mandatory? No check to enforce this in the code.
msi: type? (assume this is optional)
mt_bam: ?
predicted_ancestry: ?
sample_name: seems redundant. Not documented, used to set internal display_name as first hand choice, otherwise sample_id is used.
smn1_cn and smn2_cn: ?
smn2delta78_cn: ?
smn_27134_cn: ?
tmb: typed as string, but in comment indicated to be between 0-1000 -make int -enforce range
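
The tmb suggestion (make it an int and enforce the 0-1000 range) maps naturally onto Pydantic field constraints. A minimal sketch, assuming Pydantic is installed; `SampleConfig` is a hypothetical model name, and only two of the sample keys are shown:

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class SampleConfig(BaseModel):
    """Illustrative subset of a sample entry, not Scout's real model."""

    sample_id: str
    # Optional, but when present it must be an integer in the range 0-1000.
    tmb: Optional[int] = Field(default=None, ge=0, le=1000)


ok = SampleConfig(sample_id="ADM1059A2", tmb=12)

try:
    SampleConfig(sample_id="ADM1059A2", tmb=1500)
    out_of_range_rejected = False
except ValidationError:
    out_of_range_rejected = True

print(ok.tmb, out_of_range_rejected)
```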

mikaell on 21 Apr 2021

Marked as mandatory:

family: will raise ConfigError ✅
owner: raises SyntaxError -maybe raise a better error
phenotype: raises PedigreeError ✅
sample_id: raises PedigreeError ✅
sample_name: will not raise an exception -this field is redundant with sample_id. Mark as deprecated and remove?
sex: raises PedigreeError. ✅
father/mother -only checked if set, sample id must exist. Documentation should be updated.
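
The checks listed above could be gathered in one place. The sketch below is illustrative only: the exception classes are local stand-ins for the ConfigError and PedigreeError mentioned in the list, the top-level `samples` key is an assumption about the config layout, and raising ConfigError for a missing owner reflects the suggestion for a better error rather than the current behaviour.

```python
class ConfigError(Exception):
    """Stand-in for a case-level configuration error."""


class PedigreeError(Exception):
    """Stand-in for a sample/pedigree-level error."""


MANDATORY_CASE_KEYS = ("family", "owner")
MANDATORY_SAMPLE_KEYS = ("sample_id", "phenotype", "sex")


def check_config(config: dict) -> None:
    """Raise a descriptive error if a mandatory key is missing or a parent is unknown."""
    for key in MANDATORY_CASE_KEYS:
        if not config.get(key):
            raise ConfigError(f"missing mandatory key: {key}")

    samples = config.get("samples", [])
    for sample in samples:
        for key in MANDATORY_SAMPLE_KEYS:
            if not sample.get(key):
                raise PedigreeError(f"sample is missing mandatory key: {key}")

    known_ids = {sample["sample_id"] for sample in samples}
    for sample in samples:
        # father/mother are optional, but if set they must refer to an existing sample.
        for parent in ("father", "mother"):
            parent_id = sample.get(parent)
            if parent_id and parent_id not in known_ids:
                raise PedigreeError(
                    f"{parent} '{parent_id}' of sample '{sample['sample_id']}' does not exist"
                )
```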

mikaell on 21 Apr 2021

Information about the configuration fields is duplicated in the file. This must be solved.

Example

analysis_date: datetime(optional)


- **analysis_date** time for analysis in datetime format. Defaults to time of uploading
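
One way to avoid this kind of duplication is to keep a single description per field and generate both lines from it. A rough sketch, where the `FieldDoc` class and its method names are hypothetical and not part of Scout or its documentation tooling:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDoc:
    """Single source of truth for one configuration field."""

    name: str
    type_name: str
    required: bool
    description: str

    def schema_line(self) -> str:
        # Short form, as used in the config overview.
        requirement = "mandatory" if self.required else "optional"
        return f"{self.name}: {self.type_name}({requirement})"

    def description_line(self) -> str:
        # Long form, as used in the field-by-field list.
        return f"- **{self.name}** {self.description}"


analysis_date = FieldDoc(
    name="analysis_date",
    type_name="datetime",
    required=False,
    description="time for analysis in datetime format. Defaults to time of uploading",
)

print(analysis_date.schema_line())
print(analysis_date.description_line())
```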

mikaell on 21 Apr 2021

This turns more and more into a small project with subparts:

implement pydantic
correct errors in documentation
find all duplications in code and documentation
raise reasonable exceptions
add constraints where needed
misc errors that appear

mikaell on 21 Apr 2021

Wow that's a big effort! Highly appreciated, since I doubt I would choose it myself. 😄

I can't answer all your questions without digging into the code, and that would take me quite some time, so I guess you could just as well look into the case parsing code yourself.

How did you manage to write the pydantic PR if you have doubts about all these keys?

- confirmed_parent: ? I'm not sure about this, you need to check the code
- confirmed_sex: ? Same here
- is_sm and is_sma_carrier: what is this, are they different, boolean type? I'm not sure about this, you need to check the code
- Are mother and father mandatory? No check to enforce this in the code. I'm not sure about this, you need to check the code
- msi: type? (assume this is optional) I'm not sure about this, you need to check the code
- mt_bam: Optional, but used A LOT, it's the path to the mitochondrial alignment file
- predicted_ancestry: ? * Check code *
- sample_name: seems redundant. Not documented, used to set internal display_name as first hand choice, otherwise sample_id is used. This is used all over the code
- smn1_cn and smn2_cn: ? I'm not sure about this, you need to check the code
- smn2delta78_cn: ? I'm not sure about this, you need to check the code
- smn_27134_cn: ? I'm not sure about this, you need to check the code
- tmb: typed as string, but in comment indicated to be between 0-1000 -make int -enforce range
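
The sample_name behaviour described in the question (first choice for the internal display_name, with sample_id as the fallback) amounts to a one-line rule. A sketch with a hypothetical helper name, not taken from Scout's code:

```python
def display_name(sample: dict) -> str:
    """Prefer sample_name when it is set, otherwise fall back to sample_id."""
    return sample.get("sample_name") or sample["sample_id"]


print(display_name({"sample_id": "ADM1059A2", "sample_name": "NA12882"}))
print(display_name({"sample_id": "ADM1059A2"}))
```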

northwestwitch on 22 Apr 2021

> How did you manage to write the pydantic PR if you have doubts about all these keys?

The pydantic PR doesn't change anything but the internals of parsing and handling, so all keys and contents are the same. But many keys lack documentation, and I don't really know what they do or whether they are ever used.

~~I suggest making the documentation overhaul into a project. I can work pretty independently on it and I think the effort is worthwhile.~~

I created a project for this:
https://github.com/Clinical-Genomics/scout/projects/19

To have everything in the same place.

mikaell on 26 Apr 2021

Just ping me as well if, after trying, you have any issues finding the types of e.g. the SMN keys. I can check that easily, but it's as C says: I would still have to look it up, so a first pass is appreciated!

dnil on 27 Apr 2021

This issue is now merged (#2583). Closing.

mikaell on 21 May 2021
