Azure-pipelines-tasks: CopyFilesOverSSH very slow

Created on 24 Jan 2017 · 17 Comments · Source: microsoft/azure-pipelines-tasks

Copying files from the build machine to the deployment server using pscp works fine.
Copying files using the VSTS CopyFilesOverSSH task is VERY slow (12 minutes for a single 30 MB file).

Labels: ABTT, CopyFilesOverSSH, enhancement

Most helpful comment

I'm just wondering: could we check whether native scp is available on the agent and use that, with node-scp2 as a fallback?

All 17 comments

@yuvallb: Please attach the debug log. Do you see the slowness in establishing the connection or in the transfer of the file itself? Are you using your own Windows build agent?

Thanks,
Madhuri

The slowness is in the file transfer itself, not in establishing the initial connection.
This is running on a build agent.
The debug log is attached:

2017-01-24T06:49:02.5140622Z Set workingFolder to default: C:\agent\tasks\CopyFilesOverSSH\0.1.2
2017-01-24T06:49:02.7015635Z ##[debug]agent.workFolder=C:\agent\_work
2017-01-24T06:49:02.7015635Z ##[debug]loading inputs and endpoints
2017-01-24T06:49:02.7015635Z ##[debug]loading ENDPOINT_AUTH_0855B4B3-C590-41C6-9CFC-A1827337A692
2017-01-24T06:49:02.7015635Z ##[debug]loading ENDPOINT_AUTH_PARAMETER_0855B4B3-C590-41C6-9CFC-A1827337A692_PASSWORD
2017-01-24T06:49:02.7015635Z ##[debug]loading ENDPOINT_AUTH_PARAMETER_0855B4B3-C590-41C6-9CFC-A1827337A692_USERNAME
2017-01-24T06:49:02.7015635Z ##[debug]loading ENDPOINT_AUTH_PARAMETER_SYSTEMVSSCONNECTION_ACCESSTOKEN
2017-01-24T06:49:02.7015635Z ##[debug]loading ENDPOINT_AUTH_SCHEME_0855B4B3-C590-41C6-9CFC-A1827337A692
2017-01-24T06:49:02.7015635Z ##[debug]loading ENDPOINT_AUTH_SCHEME_SYSTEMVSSCONNECTION
2017-01-24T06:49:02.7015635Z ##[debug]loading ENDPOINT_AUTH_SYSTEMVSSCONNECTION
2017-01-24T06:49:02.7015635Z ##[debug]loading INPUT_CLEANTARGETFOLDER
2017-01-24T06:49:02.7015635Z ##[debug]loading INPUT_CONTENTS
2017-01-24T06:49:02.7015635Z ##[debug]loading INPUT_FLATTENFOLDERS
2017-01-24T06:49:02.7015635Z ##[debug]loading INPUT_OVERWRITE
2017-01-24T06:49:02.7015635Z ##[debug]loading INPUT_SOURCEFOLDER
2017-01-24T06:49:02.7015635Z ##[debug]loading INPUT_SSHENDPOINT
2017-01-24T06:49:02.7015635Z ##[debug]loading INPUT_TARGETFOLDER
2017-01-24T06:49:02.7015635Z ##[debug]loaded 14
2017-01-24T06:49:02.8422152Z ##[debug]check path : C:\agent\tasks\CopyFilesOverSSH\0.1.2\task.json
2017-01-24T06:49:02.8422152Z ##[debug]set resource file to: C:\agent\tasks\CopyFilesOverSSH\0.1.2\task.json
2017-01-24T06:49:02.8422152Z ##[debug]system.culture=en-US
2017-01-24T06:49:02.8422152Z ##[debug]sshEndpoint=0855b4b3-c590-41c6-9cfc-a1827337a692
2017-01-24T06:49:02.8422152Z ##[debug]0855b4b3-c590-41c6-9cfc-a1827337a692 auth param username = ********
2017-01-24T06:49:02.8422152Z ##[debug]0855b4b3-c590-41c6-9cfc-a1827337a692 auth param password = ********
2017-01-24T06:49:02.8422152Z ##[debug]0855b4b3-c590-41c6-9cfc-a1827337a692 data host = 123.45.67.89
2017-01-24T06:49:02.8422152Z ##[debug]0855b4b3-c590-41c6-9cfc-a1827337a692 data port = 22
2017-01-24T06:49:02.8422152Z ##[debug]Using username and password for ssh connection.
2017-01-24T06:49:02.8422152Z ##[debug]contents=**/*.tar.gz
2017-01-24T06:49:02.8422152Z ##[debug]sourceFolder=C:\agent\_work\51d260ba4\BuildDef\drop
2017-01-24T06:49:02.8422152Z ##[debug]check path : C:\agent\_work\51d260ba4\BuildDef\drop
2017-01-24T06:49:02.8422152Z ##[debug]targetFolder=/home/********/package-project/20170124.1
2017-01-24T06:49:02.8422152Z ##[debug]cleanTargetFolder=false
2017-01-24T06:49:02.8422152Z ##[debug]overwrite=true
2017-01-24T06:49:02.8422152Z ##[debug]flattenFolders=false
2017-01-24T06:49:02.8422152Z Setting up SSH connection to remote host 123.45.67.89.
2017-01-24T06:49:05.7484795Z ##[debug]include content pattern: **/*.tar.gz
2017-01-24T06:49:05.7484795Z ##[debug]find findPath=C:\agent\_work\51d260ba4\BuildDef\drop, options=undefined
2017-01-24T06:49:05.7484795Z ##[debug]  C:\agent\_work\51d260ba4\BuildDef\drop
2017-01-24T06:49:05.7484795Z ##[debug]    is a directory
2017-01-24T06:49:05.7484795Z ##[debug]  C:\agent\_work\51d260ba4\BuildDef\drop\package-name_Build-2103.tar.gz
2017-01-24T06:49:05.7484795Z ##[debug]    is a file
2017-01-24T06:49:05.7484795Z ##[debug]2 results
2017-01-24T06:49:05.7484795Z ##[debug]allFiles contains 1 files
2017-01-24T06:49:05.7484795Z ##[debug]Include matching C:\agent\_work\51d260ba4\BuildDef\drop\**\*.tar.gz
2017-01-24T06:49:05.7484795Z ##[debug]match patterns: C:\agent\_work\51d260ba4\BuildDef\drop\**\*.tar.gz
2017-01-24T06:49:05.7484795Z ##[debug]match options: [object Object]
2017-01-24T06:49:05.7484795Z ##[debug]applying pattern: C:\agent\_work\51d260ba4\BuildDef\drop\**\*.tar.gz
2017-01-24T06:49:05.7484795Z ##[debug]matched 1 items
2017-01-24T06:49:05.7484795Z ##[debug]Include matched 1 files
2017-01-24T06:49:05.7484795Z ##[debug]Number of files to copy = 1
2017-01-24T06:49:05.7484795Z ##[debug]filesToCopy = C:\agent\_work\51d260ba4\BuildDef\drop\package-name_Build-2103.tar.gz
2017-01-24T06:49:05.7484795Z Found 1 files to copy to the remote machine.
2017-01-24T06:49:05.7484795Z ##[debug]fileToCopy = C:\agent\_work\51d260ba4\BuildDef\drop\package-name_Build-2103.tar.gz
2017-01-24T06:49:05.7484795Z ##[debug]relativePath = package-name_Build-2103.tar.gz
2017-01-24T06:49:05.7484795Z Copying file C:\agent\_work\51d260ba4\BuildDef\drop\package-name_Build-2103.tar.gz to /home/********/package-project/20170124.1/package-name_Build-2103.tar.gz on remote machine.
2017-01-24T06:49:05.7484795Z ##[debug]Upload C:\agent\_work\51d260ba4\BuildDef\drop\package-name_Build-2103.tar.gz to /home/********/package-project/20170124.1/package-name_Build-2103.tar.gz on remote machine.
2017-01-24T07:01:18.8264918Z Completed copying 1 files to the remote machine.
2017-01-24T07:01:18.8264918Z ##[debug]Closing the client connection
2017-01-24T07:01:18.8264918Z ##[debug]Failed to close SFTP client.

I was experimenting with VSTS today and was excited to see that it had native support for SFTP transfers...but then when I tried it, I soon concluded that this task transferred files so slowly it was basically unusable. I tried uploading files from my own agent via this task and via WinSCP; this task used only ~1Mbps of the agent's available upstream bandwidth, whereas the WinSCP transfer maxed out the agent's upstream bandwidth (~5Mbps). Transfer performance on a hosted agent seemed to be even worse - I had to kill the task before it completed because it was taking so long. There are certainly workarounds (pscp, WinSCP, etc.), but it would be nice to use this built-in task without resorting to one of those.

I replaced the CopyFilesOverSSH task with a Command Line task running pscp, using a preconfigured PuTTY session with public key authentication. It works perfectly.
I agree it would be nice to have the CopyFilesOverSSH task work, as my workaround requires too much configuration and too many moving parts.
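
For anyone who wants a concrete starting point, here is a minimal sketch of that kind of pscp invocation driven from a Node script; the key file, paths, host, and user below are placeholders rather than values from this thread, and key-based auth is assumed to already be configured.

// Minimal sketch: invoke pscp from Node with key-based auth.
// All paths, the host, and the user are placeholders.
const { execFileSync } = require('child_process');

execFileSync('pscp', [
  '-batch',                      // fail instead of prompting interactively
  '-i', 'C:\\keys\\deploy.ppk',  // PuTTY-format private key (placeholder)
  'C:\\agent\\_work\\drop\\package.tar.gz',
  'deploy@example.com:/home/deploy/releases/'
], { stdio: 'inherit' });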

@yuvallb: Sorry for the issues you are facing, and thanks for sharing the workaround you are using. Our tests with files and directories in the size range you mentioned didn't show performance issues (they completed in under a minute). Are you using the hosted build VMs or your own Windows build servers? What types of files are you transferring?

@danports:
We are using a cross-platform native Node library to do the transfer. I can imagine tools built specifically for Windows, like WinSCP, might be faster. We will investigate the best approach to speed up transfers in the task.

Thanks,
Madhuri
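
For context, later comments in this thread identify that library as node-scp2. A minimal sketch of its documented single-file upload looks like the following; the host, credentials, and paths are placeholders.

// Minimal sketch of a node-scp2 upload (the library later comments point to).
// Host, credentials, and paths are placeholders.
var client = require('scp2');

client.scp('C:\\agent\\_work\\drop\\package.tar.gz', {
  host: 'example.com',
  username: 'deploy',
  password: 'secret',
  path: '/home/deploy/releases/'
}, function(err) {
  if (err) {
    console.error('upload failed', err);
  }
});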

I am using my own Windows build server (a VM on Azure), and transferring a ~35 MB zip file to a Linux server.

I noticed that in the debug log, there is a line:
2017-01-24T07:01:18.8264918Z ##[debug]Failed to close SFTP client.

This indicates that the agent is using SFTP, while @danports, by comparison, uses SCP. The article at http://www.jscape.com/blog/scp-vs-sftp mentions:

  1. Speed
    When comparing SCP vs SFTP in terms of speed, i.e., in transferring files, SCP is generally much faster. This is due to the way it confirms received packets. Traditionally, SFTP has to ACK (acknowledge) every tiny packet, while SCP does not. That's why the disparity becomes more evident in high latency networks.

Advantage: SCP

So my guess is that the agent's use of SFTP is another contributing factor. It would be informative to compare the performance of the vsts-tasks agent, pscp (which uses SCP), and psftp (which uses SFTP).

@butangmucat Sorry, I probably should have clarified this earlier. The name "WinSCP" is a bit misleading - I am using WinSCP to transfer files over an SFTP connection. So my comparison was apples-to-apples: SFTP with WinSCP vs. SFTP with the VSTS task.

So I've done a little digging, and I believe the issue here is that the JS implementation of scp being used is very slow on large files (100 MB+) compared to native scp. The difference is much less noticeable at smaller file sizes.

|Command|File size|Duration (via `time`)|
|---|---|---|
|native scp|4.22 MB|4.225s|
|js scp (original)|4.22 MB|6.788s|
|native scp|97.4 MB|36s|
|js scp (original)|97.4 MB|2m15s|

In the scp2 repo there's an issue from 2014 about speed, which @mscdex (the author of ssh2, the main library scp2 is built on) responded to, pointing out that there is a method, sftp.fastPut, that uploads in parallel using streams and should be faster. So I cloned scp2, updated its dependencies, and swapped the relevant code over to that fastPut method. The current and new code is below, with durations. The fastPut version finishes within a few seconds of native scp, while the current implementation takes about 2 minutes, several times longer than native scp.

// CURRENT WAY
// Duration: **2m6s**
// _write (defined elsewhere in scp2's client) writes the source file to the
// open remote handle without fastPut's parallel chunking.
sftp.open(destination, 'w', attrs, function(err, handle) {
  if (err) {
    // open failed, so assume destination is a directory and retry with the
    // source file name appended
    destination = path.join(
      destination, path.basename(options.source)
    );
    destination = unixy(destination);

    // for emit write event
    options.destination = destination;
    sftp.open(destination, 'w', attrs, function(err, handle) {
      _write(handle);
    });
  } else {
    _write(handle);
  }
});

// NEW WAY
// Duration: **43s**
// Throttle progress events so the logger isn't flooded during large transfers.
let step = _.throttle(
  (total_transferred, chunk, total) =>
    self.emit(
      "transfer",
      undefined,
      // percentage up to 99
      // since it's incremented in the logger
      Math.min(Math.floor(total_transferred / total * 100), 99),
      100
    ),
  100
);
sftp.fastPut(
  options.source,
  destination, // the resolved remote path
  {
    step
  },
  err => {
    step.cancel();
    callback(err);
  }
);

I've also hooked the transfer events into the step callback of fastPut (reporting a percentage instead of raw chunks), so the CLI still shows progress. I'm going to clone the vsts-tasks repo, pull in these library changes, and upload it as a custom task to confirm real-world performance, but ideally the scp2 repo should be updated and this repo should pull in the updated version.

Update

Created a blank custom task, pulled in the current CopyFilesOverSSH code from master, and replaced scp2 with my changes above. Copying a ~630 MB zip file from a CentOS build agent to a remote server took ~5 minutes, whereas a native scp copy via the CommandLine task took ~90 seconds and the unpatched CopyFilesOverSSH took about half an hour. So this definitely doesn't reach native speeds on large files, but it's quite a bit faster.

|Task|Time|
|---|---|
|CommandLine (scp)|~90s|
|CopyFilesOverSSH (unpatched)|~30 minutes|
|CopyFilesOverSSH (sftp.fastPut)|~5 minutes|

tl;dr: this bumps performance in a significant way in real-world builds and releases, especially with larger files.

Happy to provide more info if needed.

OS: macOS High Sierra
Node: 8.9.1
NPM: 5.8.0

Does anyone know if CopyFilesOverSSH has been updated to use some form of fastPut? We are seeing slowness copying large files like the ones described above.

It has not. To alleviate the speed issues, I forked the vsts-tasks repo, made the changes from my comment above, and published it as a new task (CopyOverScp or something like that) to our TFS server.

Sorry, but no updates have been made to the task. If you are able to contribute, we would greatly appreciate it.
@ansballard : Thanks for sharing your solution, can you point us to the code where you made the modifications?

Sorry for the wait @madhurig, I've been on vacation and it's taken a minute to get everything together.

Sadly, my solution at work is behind the company firewall, so there's no pulling that in. I did, however, take some time to put together a fairly large PR to the node-scp2 repo that the CopyFilesOverSSH task uses to transfer files. I'd already made a PR to use the fastPut method from above, but I accidentally introduced a bug, so I fixed that, along with updating the repo a bit (see https://github.com/spmjs/node-scp2/pull/124).

Specifically, the change to use fastPut was added here, and then fixed here.

Hopefully, if that can be merged and published to npm, the CopyFilesOverSSH task can be updated and the fix wouldn't even need code changes on this end. If for some reason my PR can't be merged, I'd be happy to publish a fork (node-scp3-electric-boogaloo?) that could be used instead.
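
For reference, a minimal sketch of what fastPut looks like when called directly through ssh2 (the library underneath scp2); the connection details and paths are placeholders, and concurrency/chunkSize are just the tunables ssh2 documents, not values taken from the PR.

// Minimal sketch of ssh2's sftp.fastPut, which streams chunks in parallel.
// Connection details and paths are placeholders.
const { Client } = require('ssh2');

const conn = new Client();
conn.on('ready', () => {
  conn.sftp((err, sftp) => {
    if (err) throw err;
    sftp.fastPut('/local/big.tar.gz', '/remote/big.tar.gz', {
      concurrency: 64,   // number of reads/writes in flight (ssh2 default)
      chunkSize: 32768,  // bytes per chunk (ssh2 default)
      step: (transferred, chunk, total) => {
        // progress callback, comparable to the throttled 'transfer' emit above
      }
    }, uploadErr => {
      conn.end();
      if (uploadErr) throw uploadErr;
    });
  });
}).connect({ host: 'example.com', username: 'deploy', password: 'secret' });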

I have the same issue. It takes about 30 minutes for the task to copy a ~100 MB file to my EC2 instance in AWS.
I'm now using the Shell Script task to call scp manually, and it takes less than 30 seconds!

I'm just wondering: could we check whether native scp is available on the agent and use that, with node-scp2 as a fallback?
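
A rough sketch of what that check could look like, assuming key-based auth for the native path and node-scp2's documented scp() call for the fallback; every host, path, and credential below is a placeholder.

// Rough sketch of the suggestion above: prefer native scp when the agent has it,
// and fall back to node-scp2 otherwise. Hosts, paths, and credentials are placeholders.
const { spawnSync } = require('child_process');

function hasNativeScp() {
  // 'where' on Windows agents, 'which' elsewhere
  const probe = process.platform === 'win32' ? 'where' : 'which';
  return spawnSync(probe, ['scp']).status === 0;
}

function copyFile(localPath, opts, done) {
  if (hasNativeScp()) {
    // assumes key-based auth is already configured for the agent user
    const remoteSpec = `${opts.username}@${opts.host}:${opts.path}`;
    const result = spawnSync('scp', ['-B', localPath, remoteSpec], { stdio: 'inherit' });
    return done(result.status === 0 ? null : new Error('native scp failed'));
  }
  // fallback: node-scp2, the library the task uses today
  require('scp2').scp(localPath, {
    host: opts.host,
    username: opts.username,
    password: opts.password,
    path: opts.path
  }, done);
}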

We have the same issue. CopyFilesOverSSH is incredibly slow. I'll try the workaround mentioned above - but it should work better out of the box!

Closing this as a duplicate issue. You can track it here: https://github.com/microsoft/azure-pipelines-tasks/issues/13156
