Paramiko: EOFError when getting a big file (900MB) via SFTP.

Created on 22 Mar 2013 · 31 Comments · Source: paramiko/paramiko

I'm trying to use duplicity to back up my server to a CrushFTP server (Windows) over SFTP, but it always drops the connection when getting some big files.

So I tried to get a problematic file with paramiko directly and got the same error, "Server connection dropped", meaning an EOFError was raised (bytes received == 0). I put a print statement in the _read_all method to show how many bytes were received. On one thread the count rapidly reaches 32777 and then stays there. On another thread it goes up to 7049, drops back to 0, and then never exceeds 200. After a very long time (many minutes) I receive the EOFError.
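
Rather than patching print statements into `_read_all`, client-side debug logging may surface similar information. The following is a minimal sketch, assuming paramiko emits its messages through the standard `logging` module under logger names that start with `paramiko`; the function name and the default log path are illustrative only.

```python
import logging


def enable_paramiko_debug_log(path="paramiko-debug.log"):
    """Send DEBUG-level messages from the 'paramiko' logger tree to a file."""
    logger = logging.getLogger("paramiko")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    # The handler is returned so the caller can flush or remove it later.
    return handler
```

Calling `enable_paramiko_debug_log()` before opening the connection should capture whatever the library logs while the transfer stalls.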

If I try to get the same file with lftp it works without any problems.

Using paramiko 1.10.0, Python 2.6.5 on Ubuntu Server 10.04 LTS amd64.

Labels: Bug, Needs investigation, Needs patch, Nonstandard platforms, SFTP

All 31 comments

I think this might be related to issue #124 in which I'm also accessing a Windows hosted SFTP server (GlobalSCAPE) and downloading pretty large files (hundreds of MBs to tens of GBs). Do you have access to the SFTP server itself as well as its logs? That's the one thing I don't have access to with our current vendor. Otherwise, all of the symptoms you mention above match what we're seeing.

Yes, it's possibly related. There's also, at least, two posts on StackOverflow that are probably related.

As for the SFTP server logs, I could get access to them, but not easily. The admin running the server told me that there's nothing abnormal in the logs.

I'm in a similar situation. We can get access to the logs, but the admin in charge is rather difficult to deal with and will most likely intentionally ignore our tickets. To add insult to injury, while we have managed to get logs from them before, GlobalSCAPE is a multi-protocol solution, so it mangles the logs into a standard "FTP-ish" format before saving them. As a result, any information specific to SFTP is lost. I don't think they'd give me the time of day if I asked for debug logs (if the product even supports them).

We were also told that there's "nothing abnormal" about our logs, but when they actually gave them to us, the logs indicated that multiple download attempts for the same file had finished successfully, each after transferring a different number of bytes (far fewer than the actual size of the file). Additionally, they never recorded the disconnection. If you can get a few chunks of log to scrutinize yourself, it might be more helpful. If you can get protocol-specific logs, even better. I'm not expecting a lot from these Windows SFTP server solutions though.

Was any traction made with this? I am hitting the same roadblock with a file over 1GB in size.

Unfortunately not. I've been using Perl + Net::SFTP::Foreign as a replacement for the time being. Mainly because it piggybacks off the openssh binary which has proven to be far more reliable.

Any updates on this? Running into the same error

Not that I'm aware of. I assume the current maintainer mostly uses it for his deployment solution (fabric), so issues with large files on platforms he doesn't (officially?) support anyway don't seem to attract much interest. It's also a pain to reproduce this bug without an appropriate file and server combination, and nobody has come forward to try to troubleshoot it, so I've just avoided using paramiko since it can't be relied on for my purposes.

Any suggestions on other modules I can use instead of paramiko, i.e. for large files?

@kaorihinata is mostly right, though I do my best to look at things from a "pure" paramiko standpoint (i.e. I won't ignore an issue just because it doesn't impact Fabric, even if Fabric-related issues do get more love). The problem here is much more the difficulty in reproducing & the nonstandard platform :(

Always open to merging patches where users say "this fixes my problem X" and which can be proven not to break e.g. POSIX platforms, but this ticket's not at that stage yet unless I'm missing something.

To be honest, I'd probably blame the vendor for their loose interpretation of the standard and dubious definition of "production ready" when it comes to code. It may be the case that OpenSSH works with these servers due to workarounds for broken servers. If I come across the issue again, I will try to determine the cause.

@kaorihinata so what do you currently use to transfer files > 1GB? Fabric? If so, can you point me to a link where I can find sample code to implement a similar get operation with it?

@rsheshadri I wasn't required to use Python as long as I had a working solution and since the script was pretty simple, I switched to using Net::SFTP::Foreign with Perl. It uses the OpenSSH client on the backend so compatibility is the best you're going to get.

I'm running into this issue with a significantly smaller file too. I've tracked it down to the fact that the max packet size and window size are very small after connecting (4096 and 32759). If I manually override these values, I can get to 2.1 MB before the upload stalls. This only occurs on a handful of remotes.

The only way of overriding that worked so far was:

        sftp_connection.get_channel().in_window_size = 2097152
        sftp_connection.get_channel().out_window_size = 2097152
        sftp_connection.get_channel().in_max_packet_size = 2097152
        sftp_connection.get_channel().out_max_packet_size = 2097152
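
The override above can be wrapped in a small helper that also records the values it replaces, which makes it easier to compare remotes. This is only a sketch: the helper name is hypothetical, it relies on the same four channel attributes used in the snippet above, and as far as I can tell it only changes the client's local bookkeeping, so it may not reflect what the server actually granted.

```python
def widen_channel_limits(sftp, size=2097152):
    """Overwrite the window and max-packet sizes on an SFTP client's channel.

    Returns the previous values so they can be logged or restored.
    """
    channel = sftp.get_channel()
    names = (
        "in_window_size",
        "out_window_size",
        "in_max_packet_size",
        "out_max_packet_size",
    )
    # Remember what was in place right after connecting.
    previous = {name: getattr(channel, name) for name in names}
    for name in names:
        setattr(channel, name, size)
    return previous
```

Printing the returned dictionary on an affected and an unaffected remote would show whether the small values (4096 and 32759) really correlate with the stall.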

@bitprophet would love to work with you on debugging this.

I recently ran into a similar problem when downloading files larger than 100MB from a CrushFTP server. What happened was that the server closed the socket as soon as paramiko requested data beyond the file size. This is actually what happens in SFTPFile.read at the end of the while-loop. Curiously enough, this only happens with files larger than 100MB, but I guess this might be some configuration of the CrushFTP server. For smaller files I received EOF when reading beyond the file size (@bitprophet: I do not really understand why paramiko is doing this).

The following changes in the code solved the problem for me:
in SFTPFile.read (with complete_file_size being passed as an extra argument to the method):

    while len(self._rbuffer) < size:
        read_size = size - len(self._rbuffer)
        if self._flags & self.FLAG_BUFFERED:
            read_size = max(self._bufsize, read_size)
        try:
            new_data = self._read(read_size)
        except EOFError:
            new_data = None
        if (new_data is None) or (len(new_data) == 0):
            break
        self._rbuffer += new_data
        self._realpos += len(new_data)
        # NOTE: this is the break condition I added to check if I read the complete file
        if self._realpos >= complete_file_size:
            break

Furthermore, in SFTPClient.getfo, I added an extra break condition so that read is not called again:

        while True:
            data = fr.read(32768, file_size) # NOTE: passing the filesize
            fl.write(data)
            size += len(data)
            if callback is not None:
                callback(size, file_size)
            # NOTE: the callback could be used for this, but since I was patching the code anyway,
            # I added a second break so that fr.read is not called again after the last byte
            if size == file_size:
                break
            if len(data) == 0:
                break
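
The same idea can be approximated without patching paramiko itself. The sketch below uses only `stat`, `open` and `read` on the SFTP client and never asks for bytes past the size reported by `stat`. It assumes the file is opened with the default (unbuffered) mode, so each read requests no more than the amount asked for; the function name is hypothetical, and because it skips prefetching it will likely be slower than `sftp.get`.

```python
def download_bounded(sftp, remote_path, local_path, chunk_size=32768, callback=None):
    """Download remote_path without requesting bytes past the size from stat()."""
    file_size = sftp.stat(remote_path).st_size
    received = 0
    with sftp.open(remote_path, "rb") as remote, open(local_path, "wb") as local:
        while received < file_size:
            # Clamp the final request so it ends exactly at the reported size.
            data = remote.read(min(chunk_size, file_size - received))
            if not data:
                break  # the server delivered less than stat() promised
            local.write(data)
            received += len(data)
            if callback is not None:
                callback(received, file_size)
    if received != file_size:
        raise IOError(f"size mismatch: got {received} of {file_size} bytes")
    return received
```

If the file grows or shrinks on the server during the transfer, the size check at the end raises instead of silently returning a truncated file.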

@horida can you send them as a pull request so I can look into it and see if I can figure out why it's misbehaving? (That pull request probably won't be merged, but then I'll know where in the code the issues are.)

@lndbrg here is the pull request: https://github.com/paramiko/paramiko/pull/564
It is not really ready to merge (as you requested), but it shows the fix that solved the problem for me.
Don't hesitate to contact me if you have any questions.
Thanks for your efforts.

It would be great to see a review of pull request #564 and to create a full fix for this issue.

I am getting this issue communicating with OpenSSH-6.2 server btw, so the "Nonstandard platforms" tag isn't relevant to me.

debug1: Remote protocol version 2.0, remote software version OpenSSH_6.2

Has this been fixed? I'm getting "SSHException: Server connection dropped:" around 168MB into an upload to an SFTP server.

@rsheshadri did you find a good workaround (using python)?

This is becoming more and more a problem with duplicity and bigger backups. Luckily duplicity has a "legacy" ssh backend, that can be used successfully instead of the paramiko one:
http://duplicity.nongnu.org/duplicity.1.html#sect21

--ssh-backend pexpect

The proposed patch in #564 did not work for me. Talking to a wheezy openssh server.

FYI: I actually ended up implementing a combo of the normal sftp command with pexpect. Have not had issues since.

FYI, I abandoned Paramiko and am sending things successfully using the system scp executable.
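
For anyone taking the same route from Python, here is a minimal sketch of shelling out to the system `scp`. It assumes an OpenSSH `scp` on the PATH and key-based authentication (the `-B` batch flag stops it from prompting for a password); the function names and parameters are illustrative.

```python
import subprocess


def build_scp_command(host, remote_path, local_path, user=None, port=22):
    """Return the argv list for copying one remote file with the system scp."""
    source = f"{user}@{host}:{remote_path}" if user else f"{host}:{remote_path}"
    return ["scp", "-P", str(port), "-B", source, local_path]


def fetch_with_scp(host, remote_path, local_path, user=None, port=22):
    """Run scp and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(
        build_scp_command(host, remote_path, local_path, user, port),
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with unusual file names on the local side.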

Any solution to this problem yet? I am still facing this error when downloading files over 500MB.

I'm unaware of any solution to this. It's also a bit difficult to reproduce with any consistency. It would make it a lot easier to solve if someone was able to reproduce it consistently, and someone else attached to the ticket was able to confirm the method. In my case it was random, so it was difficult to pin down what was happening.
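
One small step toward a consistent reproduction is a test file of known size and checksum that anyone on the ticket can regenerate. This is a sketch using only the standard library; the function name is hypothetical, and random data is used on the assumption that it rules out compression effects.

```python
import hashlib
import os


def make_test_file(path, size, chunk_size=1024 * 1024):
    """Write `size` random bytes to `path` and return their SHA-256 hex digest."""
    digest = hashlib.sha256()
    remaining = size
    with open(path, "wb") as out:
        while remaining > 0:
            block = os.urandom(min(chunk_size, remaining))
            out.write(block)
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest()
```

Comparing the returned digest with the digest of the downloaded copy also catches transfers that "finish" after too few bytes.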

I ran into the same problem with 600MB files. Is there any proper fix for this?

        import paramiko

        # server, username, pw and dst (presumably a pathlib.Path) come from the surrounding script.
        with paramiko.Transport((server, 22)) as transport:
            # SFTP FIXES
            transport.default_window_size = paramiko.common.MAX_WINDOW_SIZE // 2
            # transport.default_max_packet_size = paramiko.common.MAX_WINDOW_SIZE
            # Rekey only after 2**40 bytes (1 TiB) or 2**40 packets.
            # This is a security degradation!
            transport.packetizer.REKEY_BYTES = pow(2, 40)
            transport.packetizer.REKEY_PACKETS = pow(2, 40)
            # / SFTP FIXES

            transport.connect(username=username, password=pw)
            with paramiko.SFTPClient.from_transport(transport) as sftp:
                channel = sftp.get_channel()
                channel.in_window_size = 2097152
                channel.out_window_size = 2097152
                channel.in_max_packet_size = 2097152
                channel.out_max_packet_size = 2097152
                files = [name for name in sftp.listdir() if name.endswith(".zip")]
                print(files)

                if len(files) > 2:
                    # Download everything first, then delete the remote copies.
                    for f in files:
                        target = str(dst / f)
                        print(f"Downloading {f} to {target}")
                        sftp.get(f, target)

                    for f in files:
                        sftp.remove(f)

This fixes it for me for files > 600MB

(not sure what exactly I did there but it works ¯_(ツ)_/¯)

    transport = paramiko.Transport((host, port))

    transport.default_window_size = paramiko.common.MAX_WINDOW_SIZE

    transport.packetizer.REKEY_BYTES = pow(2, 40)
    transport.packetizer.REKEY_PACKETS = pow(2, 40)

This fix works for us. Now we're able to download that stupid 3.2 GB file.

Where does one add this piece when using, for example, pysftp, which uses paramiko?
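
With plain paramiko, the snippets in this thread set these values after constructing the Transport and before calling `connect()`. The helper below is a sketch of that ordering; its name is hypothetical, and I can't confirm whether pysftp exposes a hook for reaching the underlying transport before it connects.

```python
def relax_transport_limits(transport, window_size, rekey_bytes=2**40, rekey_packets=2**40):
    """Apply the thread's workaround to a not-yet-connected Transport-like object."""
    transport.default_window_size = window_size
    # Raising these thresholds delays rekeying; the thread calls this
    # a security degradation.
    transport.packetizer.REKEY_BYTES = rekey_bytes
    transport.packetizer.REKEY_PACKETS = rekey_packets
    return transport


# Usage, assuming paramiko is installed and host, port, username and password
# are defined by the caller:
#   import paramiko
#   transport = relax_transport_limits(
#       paramiko.Transport((host, port)), paramiko.common.MAX_WINDOW_SIZE)
#   transport.connect(username=username, password=password)
#   sftp = paramiko.SFTPClient.from_transport(transport)
```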

From RFC 4253:

"It is RECOMMENDED that the keys be changed after each gigabyte of transmitted data or after each hour of connection time, whichever comes sooner."

It seems that the right fix here is perhaps to adjust (increase) the data and/or time limits for rekeying, but also fix whatever bug is causing the connection to drop when it happens?
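
To put rough numbers on this, the sketch below estimates how many byte-count-triggered rekeys a transfer would cross under different thresholds. It assumes paramiko's default `REKEY_BYTES` is 2**29 (worth checking against the installed version), reads "each gigabyte" as 10**9 bytes, and ignores protocol overhead as well as the packet-count and time triggers.

```python
PARAMIKO_REKEY_BYTES = 2**29  # assumed paramiko default (512 MiB); check your version
RFC_SUGGESTED_BYTES = 10**9   # "each gigabyte", read here as 10**9 bytes
WORKAROUND_BYTES = 2**40      # value used in the workaround posted earlier in the thread


def rekeys_during_transfer(file_size, rekey_bytes):
    """Estimate how many byte-count-triggered rekeys a transfer would cross."""
    return file_size // rekey_bytes


for label, size in [("900 MB", 900 * 10**6), ("3.2 GB", 32 * 10**8)]:
    print(
        label,
        rekeys_during_transfer(size, PARAMIKO_REKEY_BYTES),
        rekeys_during_transfer(size, RFC_SUGGESTED_BYTES),
        rekeys_during_transfer(size, WORKAROUND_BYTES),
    )
# prints:
# 900 MB 1 0 0
# 3.2 GB 5 3 0
```

Under these assumptions the 900 MB and 600 MB files from this thread would each cross one rekey with the assumed default and none with the 2**40 workaround, which is consistent with the workaround avoiding the rekey rather than fixing whatever goes wrong during it. It does not explain the failures reported at 100 MB and 168 MB.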
