Scout: Update description of yaml config file in admin guide

Created on 2 Feb 2021  ยท  17Comments  ยท  Source: Clinical-Genomics/scout

This file: http://www.clinicalgenomics.se/scout/admin-guide/load-config/

There are fields added recently that are not described

enhancement Easy docs QualityOfLife

Most helpful comment

I'll never make a change forcing a major release ๐Ÿ˜€

All 17 comments

I'll give it a shot.

I assume the attributes are somewhat sorted and grouped to functionality. I suggest using alphabetical order instead. For example:

  - analysis_type: wes
    capture_kit: Agilent_SureSelectCRE.V1
    chromograph_images:
        'autozygous': 'scout/demo/images/chromograph_demo/autozygous_regions'
        'coverage': 'scout/demo/images/chromograph_demo/coverage'
        'upd_regions': 'scout/demo/images/chromograph_demo/upd_regions'
        'upd_sites': 'scout/demo/images/chromograph_demo/upd_sites'
    expected_coverage: 30
    father: ADM1059A1
    mother: ADM1059A3
    mt_bam: scout/demo/reduced_mt.bam
    phenotype: affected
    sample_id: ADM1059A2
    sample_name: NA12882
    sex: male
    tissue_type: blood
    vcf2cytosure: scout/demo/ADM1059A2.dummy.cgh

moonso suggests rewriting the configuration and parser with Pydantic. This has been done in CG. I think it might be a good idea.

Sure, but keep in mind that it should still be possible to parse the old file, otherwise it becomes a major and I don't know if it's worth to release a major right now only to add some documentation lines

I'll never make a change forcing a major release ๐Ÿ˜€

If we could time that major around the time of migrating to a new server, it would not be a Bad Thing.

  • Assignee: in use?
  • Cohorts: is missing in documentation but seems to be in use in code
  • delivery_report: missing in docs -type?
  • Collaborators: missing in docs -type?
  • coverage_qc_report: what is this?
  • default_panels called default_gene_panels in docs
  • delivery_report: type?
  • lims_id: ?
  • phenotype_terms: ?
  • smn_tsv: ?
  • synopsis: ?
  • Why do alignment_path, bam_file and bam_path all exist? They will end up pointing to the same internal key.
  • expected_coverage marked as mandatory in documentation but is never parsed. Therefore removed.

To be continued...

Regarding your questions, I assume you are comparing case model with case that gets loaded using a config file right?

  • Assignee: in use? Yes, but after you assign yourself to a case, I'm not sure if it can get parsed from the config file
  • Cohorts: is missing in documentation but seems to be in use in code. It's used
  • delivery_report: missing in docs -type? String
  • Collaborators: missing in docs -type? Array
  • coverage_qc_report: what is this? It's a string
  • default_panels called default_gene_panels in docs ???
  • delivery_report: type? String
  • lims_id: ? String
  • phenotype_terms: ? Array
  • smn_tsv: ? String
  • synopsis: ? This was tricky, I remember that it has to accept both String and a list of strings, but check the code to be sure
  • Why do alignment_path, bam_file and bam_path all exist? They will end up pointing to the same internal key. I know, but for historical reasons we keep it this way, because if we change it people using al old naming like bam_path won't be able to see alignments any more. This can be changed during a major
  • expected_coverage marked as mandatory in documentation but is never parsed. Therefore removed. OK, but check that nothing breaks in the portal downstream
  • . I know, but for historical reasons we keep it this way, because if we change it people using al old naming like bam_path won't be able to see alignments any more. This can be changed during a major

Which name is most correct? Let's mark the other two as deprecated and to be removed in the future.

Which name is most correct? Let's mark the other two as deprecated and to be removed in the future.

Good idea: the latest and most correct is alignment_path

โ€ฆ

  • confirmed_parent: ?
  • confirmed_sex: ?
  • is_sm and is_sma_carrier: what is this, are they different, boolean type?
  • Are mother and father mandatory? No check to enforce this in the code.
  • msi: type? (assume this is optional)
  • mt_bam: ?
  • predicted_ancestry: ?
  • sample_name: seems redundant. Not documented, used to set internal display_name as first hand choice, otherwise sample_id is used.
  • smn1_cn and smn2_cn: ?
  • smn2delta78_cn: ?
  • smn_27134_cn: ?
  • tmb: typed as string, but in comment indicated to be between 0-1000 -make int -enforce range

Marked as mandatory:

  • family: will raise ConfigError โœ…
  • owner: raises SyntaxError -maybe better error raise
  • phenotype: raises PedigreeError โœ…
  • sample_id: raises PedigreeError โœ…
  • sample_name: will not raise an exception -this field is redundant with sample_id. Mark as deprecated and remove?
  • sex: raises PedigreeError. โœ…
  • father/mother -only checked if set, sample id must exist. Documentation should be updated.

Information about a the configuration fields is duplicated in the file. This must be solved.

Example

analysis_date: datetime(optional)


- **analysis_date** time for analysis in datetime format. Defaults to time of uploading

This turns more and more into a small project with subparts:

implement pydantic
correct errors in documentation
find all duplications in code and documentation
raise reasonable exceptions
add constraints where needed
misc errors that appear

Wow that's a big effort! Highly appreciated, since I doubt I would choose it myself. ๐Ÿ˜„

I can't answer all your questions without digging into the code, and to do that it would take me quite some time, so I guess you could as well check yourself into the case parsing stuff.

How did you manage to write the pydantic PR if you have doubts about all these keys?

  • confirmed_parent: I'm not sure about this, you need to check the code
  • confirmed_sex: ? Same here

    • is_sm and is_sma_carrier: what is this, are they different, boolean type? I'm not sure about this, you need to check the code

    • Are mother and father mandatory? No check to enforce this in the code. I'm not sure about this, you need to check the code

    • msi: type? (assume this is optional) I'm not sure about this, you need to check the code

    • mt_bam: Optional, but used A LOT, it's the path to the mitochondrial alignment file

    • predicted_ancestry: ? * Check code *

    • sample_name: seems redundant. Not documented, used to set internal display_name as first hand choice, otherwise sample_id is used. This is used all over the code

    • smn1_cn and smn2_cn: ? I'm not sure about this, you need to check the code

    • smn2delta78_cn: ? I'm not sure about this, you need to check the code

    • smn_27134_cn: ? I'm not sure about this, you need to check the code

    • tmb: typed as string, but in comment indicated to be between 0-1000 -make int -enforce range

How did you manage to write the pydantic PR if you have doubts about all these keys?

The pydantic PR doesn't change anything but the internals of parsing and handling. So all keys and contents are the same. But many keys lack documentation I don't really know what they do or if they ever are used.

I suggest making the documentation overhaul into a project. I can work pretty independently on it and i think the effort is worthwhile.

I created a project for this:
https://github.com/Clinical-Genomics/scout/projects/19

To have everything in the same place.

Just ping me as well if you have any issues with finding the types on e.g. the SMN keys after trying I can check that easily, but its as C says; I would still have to look it up so a first pass is appreciated!

This issue is now merged (#2583). Closing.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

hassanfa picture hassanfa  ยท  3Comments

hassanfa picture hassanfa  ยท  4Comments

dnil picture dnil  ยท  3Comments

keyvanelhami picture keyvanelhami  ยท  5Comments

1ctw picture 1ctw  ยท  5Comments