https://sentry.io/read-the-docs/readthedocs-org/issues/533022676/
timeout: timed out
(10 additional frame(s) were not displayed)
...
File "readthedocs/doc_builder/environments.py", line 472, in run
return super(BuildEnvironment, self).run(*cmd, **kwargs)
File "readthedocs/doc_builder/environments.py", line 307, in run
return self.run_command_class(cls=self.command_class, cmd=cmd, **kwargs)
File "readthedocs/doc_builder/environments.py", line 478, in run_command_class
return super(BuildEnvironment, self).run_command_class(*cmd, **kwargs)
File "readthedocs/doc_builder/environments.py", line 346, in run_command_class
build_cmd.run()
File "readthedocs/doc_builder/environments.py", line 234, in run
output = client.exec_start(exec_id=exec_cmd['Id'], stream=False)
(Build) [astropy:latest] timed out
Oops, yeah, that's a new problem with docker from a deploy last week. I'll close the issue here, as we've addressed the docker change in #3999
Note: I see that you closed the other issues, but the builds are still failing.
@SylvainCorlay yes, they are. The problem isn't solved yet; the team is working on it :)
gotcha, thanks!
I'm not sure why this is happening now.
We did a deploy with a newer version of the docker Python package (3.2.1), but nothing in the changelog mentions timeouts: http://docker-py.readthedocs.io/en/stable/change-log.html and https://github.com/docker/docker-py/milestone/50?closed=1
I've been building astropy on my local instance for more than 10 minutes and it's still going (it's still creating the conda env). No timeout has been reached.
There is nothing new related to timeouts; the only thing I've found is the timeout for API calls in the APIClient constructor (http://docker-py.readthedocs.io/en/stable/api.html?highlight=APIClient#docker.api.client.APIClient).
However, we are not setting it, and the default is 60 seconds. So, if it applied to the whole exec_start call, any build that takes more than 1 minute would fail.
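A socket timeout in Python limits each blocking call, not the total duration of a request. The minimal stdlib sketch below assumes (not confirmed in this thread) that the APIClient timeout ends up acting as a per-read socket timeout; it shows why a 60-second limit of that kind would not fail every build longer than a minute. A steady trickle of output keeps every recv() under the limit, while a single silent gap longer than the limit raises. The helpers feed and read_all and the scaled-down durations are hypothetical.

```python
import socket
import threading
import time


def feed(sock, gaps):
    """Hypothetical producer: send one byte after each delay in `gaps`, then close."""
    for gap in gaps:
        time.sleep(gap)
        sock.sendall(b"x")
    sock.close()


def read_all(gaps, timeout):
    """Read until EOF with a per-recv() timeout.

    Returns (bytes_received, elapsed_seconds, timed_out). The timeout is a
    scaled-down stand-in for a client-side read timeout (an assumption).
    """
    reader, writer = socket.socketpair()
    reader.settimeout(timeout)  # limits each recv() call, not the whole loop
    sender = threading.Thread(target=feed, args=(writer, gaps))
    sender.start()
    start = time.monotonic()
    received, timed_out = 0, False
    try:
        while True:
            chunk = reader.recv(1024)
            if not chunk:  # empty bytes: the producer closed its end (EOF)
                break
            received += len(chunk)
    except socket.timeout:  # one idle gap exceeded the per-call limit
        timed_out = True
    elapsed = time.monotonic() - start
    sender.join()  # let the producer finish before closing our end
    reader.close()
    return received, elapsed, timed_out
```

For example, read_all([0.05] * 12, timeout=0.5) runs longer than its timeout without failing, while read_all([0.6], timeout=0.2) times out on its first recv(). Under that assumption, a long build would only hit the limit if its output paused for longer than the timeout; the thread does not establish whether that is what happened here.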
I'm still a little confused. Will keep researching. Also, I was able to run an astropy build for 1031 seconds (it finally failed, apparently because the latest branch has a problem).
Also, the timeout is raised from socket.recv: https://github.com/docker/docker-py/blob/master/docker/utils/socket.py#L30
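The exception in the Sentry report, "timeout: timed out", matches the string form of Python's socket.timeout, which recv() raises when a socket with a timeout receives nothing within that window. A minimal stdlib sketch reproducing the message, where a local socket pair stands in for the Docker API connection (an assumption for illustration); silent_recv is a hypothetical helper. The class is named timeout in Python 2 and in Python 3 before 3.10; since 3.10, socket.timeout is an alias of TimeoutError, so the printed class name depends on the version.

```python
import socket


def silent_recv(timeout):
    """Block on recv() from a peer that never sends; return the raised exception."""
    reader, writer = socket.socketpair()
    reader.settimeout(timeout)  # same mechanism as a client-side read timeout
    try:
        reader.recv(1024)  # nothing is ever written, so this must time out
    except socket.timeout as exc:
        return exc
    finally:
        reader.close()
        writer.close()
    return None


exc = silent_recv(0.1)
print(type(exc).__name__, exc)  # class name varies by Python version
```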
We noticed that this problem is not present on docker==3.1.3, so we will probably downgrade this package. Also, this only happens on the servers; I wasn't able to reproduce it locally, even when building big projects.
Besides, I noticed that most (if not all) of the errors reported in Sentry are for projects that use conda, during the conda env create step.
There is another Sentry log for a project that fails at pip install: https://sentry.io/read-the-docs/readthedocs-org/issues/533186433/events/latest/
Problem seems to be fixed now for me, thanks!
see https://readthedocs.org/projects/easybuild/builds/7094389/
Problem seems to be fixed now for me, thanks!
cc @gouarin
It also works for me now.
Thanks!
Thanks for your feedback.
I downgraded the docker Python package to 3.1.3 as a temporary workaround. However, that's not the final solution, since we still don't know why this happened with 3.2.1 in the first place, and I wasn't able to reproduce it locally either.
So, for now, we are stuck on 3.1.3 while we research what's going on with docker :(
A new docker version was released today: https://github.com/docker/docker-py/releases/tag/3.3.0
It says it fixes an issue with the timeout for stop and restart. It _may_ be related to our case...
@humitos the docker client was updated in https://github.com/rtfd/readthedocs.org/pull/4124, we don't have this problem anymore, right?
@stsewd we don't know yet. That PR hasn't been deployed yet.
Just deployed and the issue is still present in 3.3.0 :(
I downgraded it to 3.1.3 again.
Not too much we can do here for now. Unassigning this. We will need to try in production with a newer version in the future :/
docker 3.7.0 has been released. We could upgrade and test the new version manually on one of the builders before merging and deploying.
I just manually upgraded docker in our build03 to version 3.7.2 and triggered a couple of builds: they passed. Also, Sentry does not report any problem on build03 at the moment.
I think we can test this for a few more days and then upgrade our requirements file to roll this change out to all of our builders.