Galaxy: NCBI SRA Tools have a new dependency issue at http://usegalaxy.org

Created on 23 Jun 2016 · 33 Comments · Source: galaxyproject/galaxy

Example of errors are in this history: https://usegalaxy.org/u/jen/h/srr3020343-httpsbiostarusegalaxyorgp18195

Reported at https://biostar.usegalaxy.org/p/18195

Status: Was a problem, later resolved, now an issue again.

Alternatives:

  1. Use the tool Get Data > EBI SRA. Many datasets are in both locations.
  2. Download the data directly from NCBI using their protocols, then use the Upload tool with FTP to get the data into Galaxy. https://wiki.galaxyproject.org/Support#Loading_data

All 33 comments

I tested it and the older version 1.2.5 seems to work well, can you please test?

@martenson Any progress updates for this one? Bug reports for this tool suite are still coming in (for http://usegalaxy.org). Thanks!

Any updates? I found the same problems.

@daytekjia This is still being looked at, I believe by @davebx. Updates will be added/linked to this ticket. Thanks!

I have a problem when running Extract reads in SAM or BAM format from NCBI SRA.

@esraaa These tools are known to be problematic and the team is working on it.

Work-arounds include:

  • Download the data from NCBI using their tools to your computer, then load into Galaxy (using FTP & Upload tool)
  • Sometimes an external URL link can be used directly with the Upload tool
  • Other times the data is available from other sources that have a Galaxy tool, example: Get Data: EBI SRA
  • Please note that there is a 50 GB upload limit for most data types. For BAM datasets, this is closer to 30-35 GB.

Help: https://wiki.galaxyproject.org/Support#Loading_data

GitHub is not the ideal place for usage questions, problem reports, or known problems. Next time please use Galaxy Biostars. A link to ticket issues like this one can be included for reference.

Update: The tools still fail at http://usegalaxy.org. Several users are reporting the issue. This is the error message from the tool Extract reads (others give similar errors). The original testing history above is still a valid place to see details and find test cases.

Fatal error: Exit code 127 (Could not locate fastq-dump binary)
/galaxy-repl/main/jobdir/013/817/13817965/tool_script.sh: line 9: vdb-config: command not found
/galaxy-repl/main/jobdir/013/817/13817965/tool_script.sh: line 9: prefetch: command not found
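
Exit code 127 is the shell's "command not found" status, which matches the three missing binaries in the output above. A minimal Python sketch of a pre-flight check for those executables, assuming the PATH seen by the check is the same one the tool script sees (the helper name `missing_binaries` is hypothetical, not part of Galaxy or the SRA Toolkit):

```python
import shutil

# Executables named in the error output above.
SRA_BINARIES = ("fastq-dump", "vdb-config", "prefetch")


def missing_binaries(names=SRA_BINARIES, path=None):
    """Return the executables from `names` that cannot be found on PATH.

    `path` is an optional PATH-style string; None means use the current
    environment's PATH.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All SRA toolkit binaries resolved")
```

Running this inside the job environment would show whether the dependency's `bin` directory was actually prepended to PATH.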

Sorry, I didn't see this issue - I applied a fix for this on the 21st. I tried out the history in the original issue and reruns of jobs 3 and 4 are still running (it looks like the original attempts died in a matter of seconds) so at least the dependencies should be resolving now.

Fixing this was done by hacking the dependency directly since it refused to install from the tool shed (due to configure compile-time errors finding SRA dependencies, and because the toolkit uses a Perl makefile I didn't spend much time on it). For transparency's sake and future reference, here are my notes on how it's been installed:

This was manually installed from the _BINARY_ version of sra-toolkit by changing:

/cvmfs/main.galaxyproject.org/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/package_sra_toolkit_2_6_2/98414d1f9480/package_sra_toolkit_2_6_2/tool_dependencies.xml

I wrapped the sra_toolkit actions in an <action_group> and added the following actions:

<actions architecture="x86_64" os="linux">
    <action type="download_by_url">http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.7.0/sratoolkit.2.7.0-centos_linux64.tar.gz</action>
    <action type="move_directory_files">
        <source_directory>.</source_directory>
        <destination_directory>$INSTALL_DIR</destination_directory>
    </action>
    <action type="set_environment">
        <environment_variable action="prepend_to" name="PATH">$INSTALL_DIR/bin</environment_variable>
    </action>
</actions>

@natefoo Thanks Nate! Awesome to get this going again. I'll also re-run some tests and give feedback when they complete. You can also check the original history: d11-15 are the reruns.

@natefoo There are errors. They look like timeouts on the individual file components. Not sure if it's on our side or theirs, or if some tuning would help.

Retrying the sam job I got this:

2016-09-28T18:43:07 prefetch.2.7 int: file unauthorized while opening file within network system module - http://sra-download.ncbi.nlm.nih.gov/srapub/SRR3020343: cannot open remote file
2016-09-28T18:47:11 prefetch.2.7 err: process failed while waiting process - ascp failed with 1
2016-09-28T18:47:12 prefetch.2.7 err: process failed while waiting process - ascp failed with 1
2016-09-28T18:47:15 prefetch.2.7 int: file unauthorized while opening file within network system module - failed to open file for http://sra-download.ncbi.nlm.nih.gov/srapub/SRR3020343
2016-09-28T18:47:15 prefetch.2.7 int: self NULL while reading file within file system module - http://sra-download.ncbi.nlm.nih.gov/srapub/SRR3020343: Cannot KFileRead

I don't know enough about how SRA works to know whether this is a problem on our side or not though. The error suggests the remote dataset path is wrong, but honestly I don't know. @blankenberg do you know how it works?

If it helps, fasta + fastq fetching were Ok. My errors for mpileup and/or BAM/SAM fetching look like:

2016-09-27T23:17:04 prefetch.2.7 err: process failed while waiting process - ascp failed with 1
2016-09-27T23:17:05 prefetch.2.7 err: process failed while waiting process - ascp failed with 1
2016-09-27T23:36:34 prefetch.2.7 err: process failed while waiting process - ascp failed with 1
2016-09-27T23:36:35 prefetch.2.7 err: process failed while waiting process - ascp failed with 1

and
2016-09-27T23:17:00 prefetch.2.7: 1) Downloading 'SRR925743'...
2016-09-27T23:17:02 prefetch.2.7: Downloading via fasp...
2016-09-27T23:17:05 prefetch.2.7: fasp download failed
2016-09-27T23:17:05 prefetch.2.7: Downloading via http...
2016-09-27T23:36:20 prefetch.2.7: 1) 'SRR925743' was downloaded successfully
2016-09-27T23:36:33 prefetch.2.7: 'SRR925743' has 93 unresolved dependencies
2016-09-27T23:36:33 prefetch.2.7: 2) Downloading 'ncbi-acc:CM000663.1?vdb-ctx=refseq'...
2016-09-27T23:36:33 prefetch.2.7: Downloading via fasp...
2016-09-27T23:36:35 prefetch.2.7: fasp download failed
2016-09-27T23:36:35 prefetch.2.7: Downloading via http...
2016-09-27T23:36:42 prefetch.2.7: 2) 'ncbi-acc:CM000663.1?vdb-ctx=refseq' was downloaded successfully
2016-09-27T23:36:42 prefetch.2.7: 3) Downloading 'ncbi-acc:CM000664.1?vdb-ctx=refseq'...
2016-09-27T23:36:42 prefetch.2.7: Downloading via fasp...
2016-09-27T23:36:44 prefetch.2.7: fasp download failed

Details in the shared history

FWIW: SRR3020343 as SAM worked on a dev instance for me just fine. The final SAM file is 22.6 GB.

2016-09-29T20:32:06 prefetch.2.5.2: 1) Downloading 'SRR3020343'...
2016-09-29T20:32:06 prefetch.2.5.2:  Downloading via fasp...
SRR3020343 1,073,217,536/5,964,823,375 1023M/5G 17%        
SRR3020343 2,111,242,240/5,964,823,375 1.966G/5G 35%      
SRR3020343 3,252,158,464/5,964,823,375 3.028G/5G 54%      
SRR3020343 4,990,107,648/5,964,823,375 4.647G/5G 83%      
2016-09-29T20:36:48 prefetch.2.5.2:  fasp download succeed
2016-09-29T20:36:48 prefetch.2.5.2: 1) 'SRR3020343' was downloaded successfully
2016-09-29T20:36:48 prefetch.2.5.2: 'SRR3020343' has 0 unresolved dependencies

I think @jennaj's log is not necessarily indicating a failure.
The prefetch command cycles through ascp/fasp (which needs the "evil" ascp binary and specific firewall settings (https://www.ncbi.nlm.nih.gov/books/NBK242625/#)) and falls back to http if fasp doesn't work. So that part of the log seems to be OK.
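
To illustrate the point that a failed fasp attempt is not fatal, here is a rough Python sketch that reads a prefetch log and reports, per item, whether a "was downloaded successfully" line eventually appeared. The line formats are assumed from the log excerpts quoted in this thread only, and `summarize_prefetch_log` is a hypothetical helper, not part of the SRA Toolkit:

```python
import re

# Matches lines such as:
#   2016-09-27T23:17:00 prefetch.2.7: 1) Downloading 'SRR925743'...
START_RE = re.compile(r"\d+\) Downloading '([^']+)'\.\.\.")
# Matches lines such as:
#   2016-09-27T23:36:20 prefetch.2.7: 1) 'SRR925743' was downloaded successfully
SUCCESS_RE = re.compile(r"\d+\) '([^']+)' was downloaded successfully")


def summarize_prefetch_log(log_text):
    """Map each item prefetch started to True if a success line followed.

    "fasp download failed" lines are deliberately ignored, because
    prefetch falls back to http after them.
    """
    status = {}
    for line in log_text.splitlines():
        started = START_RE.search(line)
        if started:
            status.setdefault(started.group(1), False)
            continue
        finished = SUCCESS_RE.search(line)
        if finished:
            status[finished.group(1)] = True
    return status
```

Applied to the excerpt above, the first two items would be reported as downloaded and the third as unfinished, simply because the quoted log stops before its http attempt completes.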

Googling the error message in @natefoo's log leads me here and here and the error message in the code is defined here
Don't really get C, but looks like either disk full or connection error (no idea on which end, of course).

Ahah, prefetch -h says

  -X|--max-size <size>             maximum file size to download in KB
                                   (exclusive). Default: 20G

so we may want to increase that ... to 200G ? higher?
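
A small Python sketch of what raising the limit could look like when the command line is assembled. The `--max-size` flag comes from the help text quoted above; passing a value with a unit suffix such as `200G` is an assumption based on the "Default: 20G" wording (the same help text also says the size is in KB), and `prefetch_command` is a hypothetical helper:

```python
import shlex


def prefetch_command(accession, max_size="200G"):
    """Build a prefetch argv list with a raised download size limit.

    The value format for --max-size is an assumption; check `prefetch -h`
    for the installed toolkit version before relying on it.
    """
    return ["prefetch", "--max-size", max_size, accession]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no
    # side effects and does not need the toolkit installed.
    print(shlex.join(prefetch_command("SRR3020343")))
```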

@mvdbeek looks like a good lead!

Mo mentioned on IRC that SRA was having network problems at the time we were testing. The 20GB limit does seem problematic though.

@natefoo agree with limit. I'll run the tests again right now anyway to see what happens

@natefoo Still failed, same error :(

Is this version 2.7.0 that fails?

Yes, it's 2.7.0.

I meant the wrapper version, since that's the only one that has https://github.com/galaxyproject/tools-iuc/pull/972. I can only find 2.6.2 on usegalaxy.org

@mvdbeek It looks like the 2.7.0 versions have not been uploaded to the Tool Shed.

@natefoo I've uploaded the new versions on TTS and MTS now.

I updated Main to 2.7.0 and it still fails, however the results were interesting:

stderr:

2016-10-20T18:45:39 prefetch.2.7 err: process failed while waiting process - ascp failed with 1
2016-10-20T18:45:40 prefetch.2.7 err: process failed while waiting process - ascp failed with 1

stdout:

discarding /galaxy/main/deps/_conda/bin from PATH
prepending /galaxy-repl/main/jobdir/014/133/14133979/conda-env/bin to PATH
Fixed default configuration

2016-10-20T18:45:38 prefetch.2.7: 1) Downloading 'SRR3020343'...
2016-10-20T18:45:38 prefetch.2.7:  Downloading via fasp...
2016-10-20T18:45:40 prefetch.2.7:  fasp download failed
2016-10-20T18:45:40 prefetch.2.7:  Downloading via http...
2016-10-20T19:37:26 prefetch.2.7: 1) 'SRR3020343' was downloaded successfully
2016-10-20T19:37:26 prefetch.2.7: 'SRR3020343' has 0 unresolved dependencies

Clicking the "eye" shows SAM-formatted data. I suspect the only problem here is that the ascp messages need to be suppressed, or the tool should use the exit code for determining failure.
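
The difference between judging a job by stderr content and judging it by exit code can be sketched in Python as follows. This is only an illustration of the idea, not how Galaxy implements error detection, and `run_and_judge` is a hypothetical helper; a Python one-liner stands in for prefetch so the sketch runs without the toolkit:

```python
import subprocess
import sys


def run_and_judge(argv):
    """Run a command and decide success from its exit code alone.

    stderr is captured and returned for logging, but its content does
    not affect the verdict.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    # Stand-in for prefetch: writes a message to stderr but exits 0.
    noisy_ok = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('ascp failed with 1\\n')",
    ]
    ok, stderr = run_and_judge(noisy_ok)
    print("success:", ok)
    print("stderr was:", stderr.strip())
```

With this approach the ascp messages stay visible in the job's stderr, while a job that exits with status 0 is still treated as successful.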

How weird though that there is no problem when testing with Travis. Is there actually an ascp binary installed on Main? I will change the tool to use exit code detection and hope that this fixes it.

No, I hadn't installed ascp on Main. I did this and also updated the tools. Testing now...

@natefoo @galaxyproject/guac I will test too, feedback when done, should be quick

I missed copying the key which was causing my test jobs to hang on:

2016-10-25T18:06:02 prefetch.2.7:  Downloading via fasp...
LINE = (16) 'Key passphrase: '

Added the key and testing again.

@natefoo I just started my test, so it was after what you did. Results will be here, using the example SRR: https://usegalaxy.org/u/jen/h/test-ncbi-sra-tools

Extract reads finally runs successfully for me, and I verified that it used ascp. Thanks for all the help @mvdbeek!

Fixed!! Closing out
