Conan: appveyor & azure pipelines: error 10054 An existing connection was forcibly closed by the remote host

Created on 25 Jun 2019 · 6 Comments · Source: conan-io/conan

this issue happens only in Azure Cloud, right now in:

  • appveyor, Visual Studio 2019 image
  • azure-pipelines, windows-2019 and vs2017-win2016 images

example: https://ci.appveyor.com/project/ConanCIintegration/conan-boost/builds/25491683/job/bxj5bk8p8bdyadga
stack trace:

Traceback (most recent call last):
  File "build.py", line 33, in <module>
    builder.run()
  File "C:\Python27\lib\site-packages\cpt\packager.py", line 491, in run
    self.run_builds(base_profile_name=base_profile_name)
  File "C:\Python27\lib\site-packages\cpt\packager.py", line 574, in run_builds
    r.run()
  File "C:\Python27\lib\site-packages\cpt\runner.py", line 133, in run
    self._upload, package_id)
  File "C:\Python27\lib\site-packages\cpt\uploader.py", line 21, in upload_packages
    self._upload_artifacts(reference, upload, package_id)
  File "C:\Python27\lib\site-packages\cpt\uploader.py", line 36, in _upload_artifacts
    self.auth_manager.login(remote_name)
  File "C:\Python27\lib\site-packages\cpt\auth.py", line 106, in login
    self._conan_api.authenticate(user, password, remote_name)
  File "C:\Python27\lib\site-packages\conans\client\conan_api.py", line 76, in wrapper
    return f(*args, **kwargs)
  File "C:\Python27\lib\site-packages\conans\client\conan_api.py", line 787, in authenticate
    _, remote_name, prev_user, user = self._remote_manager.authenticate(remote, name, password)
  File "C:\Python27\lib\site-packages\conans\client\remote_manager.py", line 202, in authenticate
    return self._call_remote(remote, 'authenticate', name, password)
  File "C:\Python27\lib\site-packages\conans\client\remote_manager.py", line 242, in _call_remote
    % (str(exc), remote.name, remote.url))
conans.errors.ConanConnectionError: ('Connection aborted.', error(10054, 'An existing connection was forcibly closed by the remote host'))

original stack-trace:

Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Python37\lib\site-packages\urllib3\connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Python37\lib\site-packages\urllib3\util\retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Python37\lib\site-packages\urllib3\packages\six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "C:\Python37\lib\site-packages\urllib3\connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "C:\Python37\lib\site-packages\urllib3\connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "C:\Python37\lib\site-packages\urllib3\connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Python37\lib\http\client.py", line 1321, in getresponse
    response.begin()
  File "C:\Python37\lib\http\client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "C:\Python37\lib\http\client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Python37\lib\socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "C:\Python37\lib\ssl.py", line 1052, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Python37\lib\ssl.py", line 911, in read
    return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

some context:

  1. the issue happens for long builds, which take more than 5 minutes (e.g. OpenSSL, boost). fast builds like zlib or bzip2 don't fail.
  2. the actual build steps don't matter, only timing matters - e.g. you can comment out the source/build/etc. methods and put sleeps in their place.
  3. the problem is that conan/CPT creates a TCP session (requests.Session) at the beginning of the build and then doesn't use it for 5 minutes or more while the project is being built. once the build finishes, conan/CPT tries to talk to bintray over the existing TCP session, but by this time the connection has been forcibly closed by the remote host.
  4. the problem doesn't happen locally - it turns out that Azure Load Balancer is responsible for terminating the connection, not the bintray server. Azure Load Balancer uses a sort of NAT and DoS protection (information is below).
  5. python version doesn't matter - all python versions have the same problem. the requests version and the pyOpenSSL version are irrelevant as well.
  6. OS version where it fails (OSVERSIONINFO struct): 10 0 14393 2 0 0 400 3 0 and 10 0 17763 2 0 0 400 3 0

the issue can be reproduced even without conan, just by using requests or the plain socket interface from python (a plain TCP connection without SSL). I believe it's the same with the C API, but I didn't try it. a minimal python script which reproduces the issue (two ways: requests and socket):

from __future__ import print_function
import requests
from requests.auth import HTTPBasicAuth
import os
import time
import sys
import ssl
import platform
import ctypes
import socket

print("sys version", sys.version_info)
print("platform version", platform.python_version())
print("SSL version", ssl.OPENSSL_VERSION)


class _OSVERSIONINFOEXW(ctypes.Structure):
    _fields_ = [('dwOSVersionInfoSize', ctypes.c_ulong),
                ('dwMajorVersion', ctypes.c_ulong),
                ('dwMinorVersion', ctypes.c_ulong),
                ('dwBuildNumber', ctypes.c_ulong),
                ('dwPlatformId', ctypes.c_ulong),
                ('szCSDVersion', ctypes.c_wchar * 128),
                ('wServicePackMajor', ctypes.c_ushort),
                ('wServicePackMinor', ctypes.c_ushort),
                ('wSuiteMask', ctypes.c_ushort),
                ('wProductType', ctypes.c_byte),
                ('wReserved', ctypes.c_byte)]

os_version = _OSVERSIONINFOEXW()
os_version.dwOSVersionInfoSize = ctypes.sizeof(os_version)
if hasattr(ctypes, "windll"):
    retcode = ctypes.windll.Ntdll.RtlGetVersion(ctypes.byref(os_version))

    print(os_version.dwMajorVersion,
          os_version.dwMinorVersion,
          os_version.dwBuildNumber,
          os_version.dwPlatformId,
          os_version.szCSDVersion,
          os_version.wServicePackMajor,
          os_version.wServicePackMinor,
          os_version.wSuiteMask,
          os_version.wProductType,
          os_version.wReserved)

def do_sleep(count):
    for i in range(0, count):
        time.sleep(1)
        sys.stdout.write(".")
        sys.stdout.flush()

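# one shared session, so that both make() calls reuse the same TCP connection
s = requests.Session()

def make():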

    #url = "https://api.bintray.com/conan/conan-community/conan/v1/users/authenticate"
    #url = "https://google.com"
    url = "https://bintray.com"

    headers = {
        'X-Client-Anonymous-Id': '43eaa1e46ddfbeeb50abdcee05c580e260df073f',
        'X-Client-Id': 'conanbot',
        'User-Agent': 'Conan/1.17.0-dev (Python 3.7.3) python-requests/2.22.0'
        }
    auth = HTTPBasicAuth('conanbot', os.environ["CONAN_PASSWORD"])
    timeout = 60

    r = s.get(url, headers=headers, timeout=timeout, auth=auth)
    print(r.status_code)

hostname = 'bintray.com'
port = 80

request = b"GET / HTTP/1.1\r\nHost: %s\r\n\r\n" % hostname.encode()
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

sock.connect((hostname, port))

def make2():
    sock.send(request)
    result = sock.recv(10000)
    print(result)

make()
do_sleep(310)
make()

information about Azure Load Balancer:

  1. https://docs.microsoft.com/en-gb/azure/load-balancer/load-balancer-tcp-reset
  2. https://azure.microsoft.com/en-gb/blog/new-configurable-idle-timeout-for-azure-load-balancer/
  3. https://github.com/uglide/azure-content/blob/master/includes/guidance-tcp-session-timeout-include.md

most important part:

inbound through the Azure load balancer. This timeout defaults to 4 minutes, and can be adjusted up to 30 minutes.

To ensure connections are not lost beyond the timeout limit, you should make sure either your application keeps the session alive, or you can configure the underlying operating system to do so. The settings to be used are different for Linux and Windows systems, as shown below.

For Linux, you should change the kernel variables below.

net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 8

For Windows, you should change the registry values below.

KeepAliveInterval = 30
KeepAliveTime = 120
TcpMaxDataRetransmissions = 8

corresponding application setting is SIO_KEEPALIVE_VALS
alternatively, the following options together also work:

  1. TCP_KEEPIDLE
  2. TCP_KEEPINTVL
  3. TCP_KEEPCNT

but they are only available starting with Windows 10 versions 1703 and 1709
also, they aren't available in all python versions: https://bugs.python.org/issue32394

corresponding python code to set it:

# sock is an existing socket.socket; the tuple is (on/off, idle time in ms, probe interval in ms)
sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, 120 * 1000, 30 * 1000))

some general information on these parameters: https://blogs.technet.microsoft.com/nettracer/2010/06/03/things-that-you-may-want-to-know-about-tcp-keepalives/
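
A slightly fuller sketch of the same idea, for illustration only: the helper name enable_keepalive and its return values are made up here, and the timings simply mirror the 120 s / 30 s / 8 probes values quoted above. It uses SIO_KEEPALIVE_VALS where python exposes it (Windows) and falls back to the TCP_KEEP* options where those exist.

```python
import socket

def enable_keepalive(sock, idle_s=120, interval_s=30, probes=8):
    """Enable TCP keep-alive with explicit timings; returns which mechanism was used."""
    # Always switch keep-alive on; the calls below only tune its timing.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: tuple is (on/off, idle time in ms, probe interval in ms)
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
        return "ioctl"
    names = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
    if all(hasattr(socket, name) for name in names):
        # Linux and others: values are in seconds, plus a probe count
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
        return "setsockopt"
    # No fine-tuning available: system default keep-alive timings apply.
    return "default"

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(enable_keepalive(sock))
sock.close()
```

the first probe has to go out before the load balancer's idle timeout expires, which is why the idle time is kept well below 4 minutes.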

NOTE: the following socket options do not help (already checked them):

  1. SO_KEEPALIVE - it just enables/disables TCP keep-alive, but doesn't provide fine-tuning of its timeouts. python's requests module already enables it automatically, but the system default values aren't suitable for Azure Load Balancer.
  2. SO_RCVTIMEO & SO_SNDTIMEO (and the corresponding timeout parameter of the requests module) - this is an unrelated timeout for blocking recv / send calls; it has nothing to do with TCP keep-alive timeouts (which apply when you need to keep a connection alive without any network activity).

TL;DR (summary):
it's the application's responsibility to keep the connection alive. as the application is conan in our case (or CPT, which uses the conan API), conan should configure the python requests module to set SIO_KEEPALIVE_VALS.
alternative: try to set these values system-wide on CI; whether that's possible at all still needs to be checked.
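
one possible shape for this on the requests side, as a rough sketch rather than conan's actual code: a transport adapter that hands extra socket options to urllib3. the class name KeepAliveAdapter is hypothetical. urllib3 applies socket_options through setsockopt, so this sketch can only carry the TCP_KEEP* options; SIO_KEEPALIVE_VALS needs an ioctl call and would have to be applied to the socket some other way.

```python
import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection

class KeepAliveAdapter(HTTPAdapter):
    """Hypothetical adapter that opens connections with tuned TCP keep-alive."""

    def __init__(self, idle_s=120, interval_s=30, probes=8, **kwargs):
        # Start from urllib3's defaults so its own options are preserved.
        options = list(HTTPConnection.default_socket_options)
        options.append((socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1))
        names = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
        if all(hasattr(socket, name) for name in names):
            options += [
                (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s),
                (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s),
                (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes),
            ]
        # Must be set before HTTPAdapter.__init__, which calls init_poolmanager.
        self._socket_options = options
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        # Forwarded by the pool manager to every new connection it opens.
        kwargs["socket_options"] = self._socket_options
        super().init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount("http://", KeepAliveAdapter())
session.mount("https://", KeepAliveAdapter())
```

every connection the session opens afterwards gets these options; connections that already exist are not affected.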

To help us debug your issue please explain:

  • [ ] I've read the CONTRIBUTING guide.
  • [ ] I've specified the Conan version, operating system version and any tool that can be relevant.
  • [ ] I've explained the steps to reproduce the error or the motivation/use case of the question/suggestion.

/cc @uilianries @solvingj @Croydon @ericLemanissier @jgsogo

high CI medium queue feature

All 6 comments

I think I have found an easier way to fix it, based on https://github.com/kennethreitz/requests/issues/4506 - just use HTTPAdapter with max_retries
opening a PR
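
for readers following along, a minimal sketch of what "HTTPAdapter with max_retries" can look like. this is an illustration, not the code from the PR; the function name make_retrying_session and the retry count are assumptions. the idea is that a request which hits a connection closed while idle is re-sent on a fresh connection instead of failing.

```python
import requests
from requests.adapters import HTTPAdapter

def make_retrying_session(retries=2):
    """Return a requests.Session whose adapters retry failed requests."""
    session = requests.Session()
    # A non-default integer is converted by requests into a urllib3 Retry object.
    adapter = HTTPAdapter(max_retries=retries)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

session = make_retrying_session()
```

note that urllib3 by default only retries methods it treats as idempotent (such as GET), so this does not blindly re-send uploads.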

I've started a new CI job for OpenSSL/1.0.2s using your branch:

https://ci.appveyor.com/project/uilianries/conan-openssl/builds/25534770

@ericLemanissier if you can run some tests with Qt on Azure, it would be nice.
you may use this to install conan from branch:

pip.exe install conan_package_tools
pip.exe install https://github.com/SSE4/conan/archive/fix_connection_reset.zip

(NOTE: conan must be installed after CPT)

Thanks for the detailed investigation. Please report the results of your tests once you have them. Thank you!!

to test, it might be installed as:

pip.exe install git+https://github.com/SSE4/conan-package-tools.git@fix_connection_reset
pip.exe install git+https://github.com/SSE4/conan.git@fix_connection_reset

A build of the stable branch of qt is in progress at https://dev.azure.com/bincrafters/packages/_build/results?buildId=377
It has already gone further than most previous stable branch builds, which nearly always timed out during the "checking credentials" phase of the package upload.
The testing branch has no problem any more: https://dev.azure.com/bincrafters/packages/_build/results?buildId=376
The problem is also fixed on the Visual Studio 2019 image on appveyor: https://ci.appveyor.com/project/bincrafters/conan-qt/builds/25581651
I'd say you nailed it @SSE4. Thanks a lot!
