Galaxy: Handle preview and count of empty lines correctly in datasets

Created on 31 Jul 2016 · 4 Comments · Source: galaxyproject/galaxy

Hi All,

The Remove beginning of a file tool (galaxy/tools/filters/remove_beginning.xml) isn't removing any lines. I tested this on a current cloud instance and on test, with both plain text and tab-delimited files.

kind/bug

All 4 comments

Mmmh, this works for me both on usegalaxy.org and my local instance and on both txt and tabular datasets. Can you share a history or the affected dataset(s)?

Nicola, I've figured out what's going on, and it doesn't have anything to do with the Remove beginning of file command. I'm not sure if this is a bug, but it definitely is not a feature.

Take a look at https://usegalaxy.org/u/tnabtaf/h/removing-5-blank-lines. The uploaded Analysis (9).txt and Analysis (10).txt files were exported from the enrichment tool on the Gene Ontology home page. These files have 3 sections:

  • The first 5 lines are blank.
  • The next 5 lines are plain text, basically a header. None of them starts with a comment indicator.
  • The rest is tab-delimited data, including a header line identifying the columns.

I erroneously assumed that the Remove beginning tool was not removing the lines because of

  1. the way Galaxy's preview function in the history pane counts lines, and
  2. how tab delimited (but not plain text) data is displayed in the middle panel.

Analysis (9) has a datatype of txt, the default guess if you don't specify a datatype in the upload form. Analysis (10) has a datatype of tabular, which I assigned manually.

My confusion arose because:

  1. The previews of both Analysis (9) (txt) and Analysis (10) (tabular) say there are 84 lines in the files, when there are in fact 89 lines.
  2. Viewing the contents of Analysis (10) (tabular) in the center panel does not show the leading 5 blank lines.
  3. After removing the first 5 lines (which are blank) from both datasets, the previews still say there are 84 lines, making it look like nothing has changed.
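
The arithmetic behind this can be reproduced with a minimal sketch. It assumes the preview count simply ignores blank lines, which is a guess from the observed numbers and not taken from Galaxy's code. `count_lines`, `remove_beginning`, and the synthetic `sample` are hypothetical stand-ins for the preview, the tool, and the exported file.

```python
def count_lines(text, skip_blank=False):
    """Count lines; optionally ignore empty or whitespace-only lines."""
    lines = text.splitlines()
    if skip_blank:
        lines = [line for line in lines if line.strip()]
    return len(lines)


def remove_beginning(text, n):
    """Drop the first n lines and keep the rest unchanged."""
    return "".join(text.splitlines(keepends=True)[n:])


# Synthetic stand-in for the exported file (an assumption, not the real data):
# 5 blank lines, 5 plain-text header lines, then 79 tab-delimited lines.
sample = "\n" * 5
sample += "".join(f"header text {i}\n" for i in range(5))
sample += "".join(f"term{i}\t{i}\n" for i in range(79))

before_all = count_lines(sample)                         # every line
before_nonblank = count_lines(sample, skip_blank=True)   # blanks ignored
trimmed = remove_beginning(sample, 5)
after_all = count_lines(trimmed)
after_nonblank = count_lines(trimmed, skip_blank=True)

print(before_all, before_nonblank, after_all, after_nonblank)  # 89 84 84 84
```

A count that skips blank lines reports 84 both before and after the 5 blank lines are removed, so the tool appears to do nothing even though the file went from 89 lines to 84.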

I suggest:

  1. If there are blank lines in tabular files, display them in the middle panel. Don't just drop them.
  2. Have the line counts shown in the preview include blank lines.
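
As a rough illustration of the first suggestion, here is a minimal sketch of splitting tab-delimited text into display rows while keeping blank lines as empty rows. `tabular_rows` and its `keep_blank` flag are hypothetical names for this example only. This is not how Galaxy renders tabular datasets.

```python
def tabular_rows(text, keep_blank=True):
    """Split tab-delimited text into rows of fields.

    With keep_blank=True a blank line becomes an empty row ([]), so row
    positions in the display still match line numbers in the file.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            if keep_blank:
                rows.append([])  # placeholder row for the blank line
            continue
        rows.append(line.split("\t"))
    return rows


# Two leading blank lines, a column header, and one data row.
sample = "\n\nname\tcount\nterm_a\t3\n"

shown = tabular_rows(sample)                      # blank lines kept
dropped = tabular_rows(sample, keep_blank=False)  # blank lines dropped

print(len(shown), len(dropped))  # 4 2
```

Keeping the empty rows means the number of displayed rows equals the number of lines in the file, which would also keep a line count derived from the display consistent with the second suggestion.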

I'm submitting a pull request for this issue... hopefully my changes will resolve the problem.

Thank you, @sszakony!
