This file: http://www.clinicalgenomics.se/scout/admin-guide/load-config/
There are fields added recently that are not described
I'll give it a shot.
I assume the attributes are somewhat sorted and grouped to functionality. I suggest using alphabetical order instead. For example:
- analysis_type: wes
capture_kit: Agilent_SureSelectCRE.V1
chromograph_images:
'autozygous': 'scout/demo/images/chromograph_demo/autozygous_regions'
'coverage': 'scout/demo/images/chromograph_demo/coverage'
'upd_regions': 'scout/demo/images/chromograph_demo/upd_regions'
'upd_sites': 'scout/demo/images/chromograph_demo/upd_sites'
expected_coverage: 30
father: ADM1059A1
mother: ADM1059A3
mt_bam: scout/demo/reduced_mt.bam
phenotype: affected
sample_id: ADM1059A2
sample_name: NA12882
sex: male
tissue_type: blood
vcf2cytosure: scout/demo/ADM1059A2.dummy.cgh
moonso suggests rewriting the configuration and parser with Pydantic. This has been done in CG. I think it might be a good idea.
Sure, but keep in mind that it should still be possible to parse the old file, otherwise it becomes a major and I don't know if it's worth to release a major right now only to add some documentation lines
I'll never make a change forcing a major release ๐
If we could time that major around the time of migrating to a new server, it would not be a Bad Thing.
Assignee: in use?Cohorts: is missing in documentation but seems to be in use in codedelivery_report: missing in docs -type?Collaborators: missing in docs -type?coverage_qc_report: what is this?default_panels called default_gene_panels in docsdelivery_report: type?lims_id: ?phenotype_terms: ?smn_tsv: ?synopsis: ?alignment_path, bam_file and bam_path all exist? They will end up pointing to the same internal key.expected_coverage marked as mandatory in documentation but is never parsed. Therefore removed.To be continued...
Regarding your questions, I assume you are comparing case model with case that gets loaded using a config file right?
Assignee: in use? Yes, but after you assign yourself to a case, I'm not sure if it can get parsed from the config fileCohorts: is missing in documentation but seems to be in use in code. It's useddelivery_report: missing in docs -type? StringCollaborators: missing in docs -type? Arraycoverage_qc_report: what is this? It's a stringdefault_panels called default_gene_panels in docs ???delivery_report: type? Stringlims_id: ? Stringphenotype_terms: ? Arraysmn_tsv: ? Stringsynopsis: ? This was tricky, I remember that it has to accept both String and a list of strings, but check the code to be surealignment_path, bam_file and bam_path all exist? They will end up pointing to the same internal key. I know, but for historical reasons we keep it this way, because if we change it people using al old naming like bam_path won't be able to see alignments any more. This can be changed during a majorexpected_coverage marked as mandatory in documentation but is never parsed. Therefore removed. OK, but check that nothing breaks in the portal downstream
- . I know, but for historical reasons we keep it this way, because if we change it people using al old naming like bam_path won't be able to see alignments any more. This can be changed during a major
Which name is most correct? Let's mark the other two as deprecated and to be removed in the future.
Which name is most correct? Let's mark the other two as deprecated and to be removed in the future.
Good idea: the latest and most correct is alignment_path
โฆ
confirmed_parent: ?confirmed_sex: ?is_sm and is_sma_carrier: what is this, are they different, boolean type?mother and father mandatory? No check to enforce this in the code.msi: type? (assume this is optional)mt_bam: ?predicted_ancestry: ?sample_name: seems redundant. Not documented, used to set internal display_name as first hand choice, otherwise sample_id is used.smn1_cn and smn2_cn: ?smn2delta78_cn: ?smn_27134_cn: ?tmb: typed as string, but in comment indicated to be between 0-1000 -make int -enforce rangeMarked as mandatory:
family: will raise ConfigError โ
owner: raises SyntaxError -maybe better error raise phenotype: raises PedigreeError โ
sample_id: raises PedigreeError โ
sample_name: will not raise an exception -this field is redundant with sample_id. Mark as deprecated and remove?sex: raises PedigreeError. โ
father/mother -only checked if set, sample id must exist. Documentation should be updated.Information about a the configuration fields is duplicated in the file. This must be solved.
Example
analysis_date: datetime(optional)
- **analysis_date** time for analysis in datetime format. Defaults to time of uploading
This turns more and more into a small project with subparts:
implement pydantic
correct errors in documentation
find all duplications in code and documentation
raise reasonable exceptions
add constraints where needed
misc errors that appear
Wow that's a big effort! Highly appreciated, since I doubt I would choose it myself. ๐
I can't answer all your questions without digging into the code, and to do that it would take me quite some time, so I guess you could as well check yourself into the case parsing stuff.
How did you manage to write the pydantic PR if you have doubts about all these keys?
confirmed_parent: I'm not sure about this, you need to check the codeconfirmed_sex: ? Same hereis_sm and is_sma_carrier: what is this, are they different, boolean type? I'm not sure about this, you need to check the codemother and father mandatory? No check to enforce this in the code. I'm not sure about this, you need to check the codemsi: type? (assume this is optional) I'm not sure about this, you need to check the codemt_bam: Optional, but used A LOT, it's the path to the mitochondrial alignment filepredicted_ancestry: ? * Check code *sample_name: seems redundant. Not documented, used to set internal display_name as first hand choice, otherwise sample_id is used. This is used all over the codesmn1_cn and smn2_cn: ? I'm not sure about this, you need to check the codesmn2delta78_cn: ? I'm not sure about this, you need to check the codesmn_27134_cn: ? I'm not sure about this, you need to check the codetmb: typed as string, but in comment indicated to be between 0-1000 -make int -enforce rangeHow did you manage to write the pydantic PR if you have doubts about all these keys?
The pydantic PR doesn't change anything but the internals of parsing and handling. So all keys and contents are the same. But many keys lack documentation I don't really know what they do or if they ever are used.
I suggest making the documentation overhaul into a project. I can work pretty independently on it and i think the effort is worthwhile.
I created a project for this:
https://github.com/Clinical-Genomics/scout/projects/19
To have everything in the same place.
Just ping me as well if you have any issues with finding the types on e.g. the SMN keys after trying I can check that easily, but its as C says; I would still have to look it up so a first pass is appreciated!
This issue is now merged (#2583). Closing.
Most helpful comment
I'll never make a change forcing a major release ๐