Galaxy Tool ID: bed2gff1
Galaxy Tool Version: 2.0.0
Headers added are:
##gff-version 2
##bed_to_gff_converter.py
Problem: Any tool that takes GFF/GTF input produced by this converter, and does not strip the headers itself (for example with "grep -v"), fails with formatting errors that users do not understand.
Tools that use implicitly converted BED-to-GFF data can fail for the same reason.
The converter is packaged three ways: as a tool panel "tool" converter, as the Edit Attributes "Convert" function, and as an implicit data converter.
Suggested solution: It would be better not to add a header and to keep the file in the most conservative format, so the result works with any tool/function. Headers in GFF/GTF files are a very common cause of end-user reported problems. We cannot control what is published by public data sources, but we can format data already inside (or produced by) Galaxy in a way that is more likely to work with our tools.
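A minimal sketch of what a headerless conversion could look like, for illustration only. This is not the code in bed_to_gff_converter.py: the function name, the `source` and `feature` defaults, and putting the BED name straight into the attribute column are all assumptions, and only the first six BED columns are handled.

```python
def bed_to_gff_lines(bed_lines, source="bed2gff", feature="region"):
    """Yield GFF-style lines for BED lines, without any '##' header lines.

    Hypothetical helper: `source` and `feature` are placeholder values,
    not what Galaxy's converter writes.
    """
    for raw in bed_lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines, comments, and BED "track"/"browser" lines.
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a usable BED record
        chrom = fields[0]
        start = int(fields[1]) + 1  # BED starts are 0-based, GFF starts are 1-based
        end = int(fields[2])        # BED end is exclusive, so it equals the inclusive GFF end
        name = fields[3] if len(fields) > 3 else "."
        score = fields[4] if len(fields) > 4 else "."
        strand = fields[5] if len(fields) > 5 else "."
        # Nine tab-separated columns; frame is left as "."
        yield "\t".join(
            [chrom, source, feature, str(start), str(end), score, strand, ".", name]
        )


if __name__ == "__main__":
    bed = ["track name=example\n", "chr1\t9\t20\tpeak1\t5\t+\n"]
    for gff_line in bed_to_gff_lines(bed):
        print(gff_line)
```

Because nothing starting with "#" is ever written, a downstream tool that splits every line into nine columns sees only data records.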
WORKAROUND for End-Users
Use BED-to-GFF or the Edit Attributes (pencil icon) "Convert" function. Once in GFF format, run the tool Select with the option "NOT Matching" and the regular expression ^#. This will remove all header lines.
Example of an error (IUC tool given BED input, implicitly converted to GFF). There are many others (most Bioconductor tools, etc.).
Dataset Error
An error occured while running the tool toolshed.g2.bx.psu.edu/repos/iuc/cwpair2/cwpair2/1.0.0.
Tool execution generated the following messages:
Unable to parse file "/galaxy-repl/main/files/031/018/dataset_31018984.dat".
Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/cwpair2/d4db13c9dd7f/cwpair2/cwpair2_util.py", line 280, in perform_process
    chromosomes = parse_chromosomes(input)
  File "/cvmfs/main.galaxyproject.org/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/cwpair2/d4db13c9dd7f/cwpair2/cwpair2_util.py", line 96, in parse_chromosomes
    cname, junk, junk, start, end, value, strand, junk, junk = line
ValueError: need more than 1 value to unpack
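The failure above, and the effect of the "NOT Matching ^#" workaround, can be reproduced with a small sketch. This is not the cwpair2 code: `parse_nine_columns` and the sample record are hypothetical, and the only thing assumed to be shared with the real tool is the nine-value unpack shown in the traceback.

```python
def parse_nine_columns(lines, skip_headers=True):
    """Parse tab-separated nine-column GFF-style lines.

    Hypothetical parser. With skip_headers=True it drops lines that start
    with '#', which is what Select with "NOT Matching" and ^# does.
    """
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if skip_headers and line.startswith("#"):
            continue
        # A header such as "##gff-version 2" contains no tabs, so split()
        # returns a single value and this unpack raises ValueError.
        cname, _source, _feature, start, end, value, strand, _frame, _attrs = line.split("\t")
        records.append((cname, int(start), int(end), value, strand))
    return records


if __name__ == "__main__":
    converted = [
        "##gff-version 2\n",
        "##bed_to_gff_converter.py\n",
        "chr1\tbed2gff\tregion\t10\t20\t5\t+\t.\tpeak1\n",
    ]
    print(parse_nine_columns(converted))  # headers skipped, one record parsed
    try:
        parse_nine_columns(converted, skip_headers=False)
    except ValueError as err:
        # Python 2 words this as "need more than 1 value to unpack".
        print("header line broke the parser:", err)
```

Either fix works on its own: the converter can stop writing the header lines, or the parser can skip them.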
@martenson You may want to update the cwpair2 tool on UseGalaxy.org to resolve this.
@nsoranzo We now use a GitHub project for the tool lifecycle flow; I added it: https://github.com/galaxyproject/usegalaxy-playbook/projects/3
I think this should stay open until the PR is decided on.
https://github.com/galaxyproject/galaxy/pull/7808
Three issues; this ticket is for item 2 below.
1 - Great that one tool was fixed to deal with the GTF headers (done)
2 - The convert tool does not need to add headers to a GTF -- makes it out of spec -- we can fix this (what PR does)
3 - More tools could ignore GTF headers -- out of spec but common in public data. ALL tools that parse GFF/GTF data would need to be reviewed -- most fail with headers. NOT to be confused with GFF3, which needs a header.
> 2 - The convert tool does not need to add headers to a GTF -- makes it out of spec -- we can fix this (what PR does)
Beware that the converter is from bed to gff (i.e. GFFv2), not GTF.
That should be OK. There is no way to make an in-spec GTF from any BED dataset with the given content (no gene_id/transcript_id). GFF should be enough for the tools using the implicit conversion. Or, at least, it won't be any worse than it was before, and I'll watch for issues reported from usage.
Appreciate the help with this. A small step towards standardizing these formats, but a good one and worth it imho. We can tackle public GFF/GTFs with headers another way (I am making a ticket with an idea @jmchilton and I came up with today). I'll ping everyone so we can discuss and get the best solution.