One finding from the Beethoven sprint (@tisto) is a problem with ZODB 5.4.0 when running the Plone 5.2 tests. I spent a day or more tracking this down while discussing it with @lukasgraf and @jaroel. The hint that helped narrow it down was the different behaviour (frequency of occurrence) when running my laptop on battery versus on AC! :D
It happens during longer-running tests, and it can be made to happen more frequently by introducing time.sleep(2) or similar in test cases. (I think only functional tests using the z2.ZSERVER_FIXTURE are affected, but I'm not sure about this.)
In plone.restapi we have a specific test (the canary in the coal mine) that makes the problem happen in perhaps 2 out of 3 test runs. But as you can see on Jenkins (see the examples below), it also occurs on Plone 5.2 without plone.restapi, just less frequently.
The bug is reported here: https://github.com/zopefoundation/ZODB/issues/208
It can be seen once in a while in the Jenkins builds, for example here:
https://jenkins.plone.org/job/plone-5.2-python-2.7/1230/testReport/junit/plone.app/testing/layers_rst/
or here:
https://jenkins.plone.org/job/plone-5.2-python-2.7/1235/testReport/junit/plone.app/testing/layers_rst/
When I downgrade to ZODB 5.3.0 on the coredev 5.2 branch, the problem "goes away". My hypothesis is that the problem is still there but is masked by 5.3.0.
More info on the ZODB ticket.
I run the tests on the coredev 5.2 branch.
For easy reproduction, run the plip-2177-plone-restapi.cfg profile, or play with time.sleep(x).
I will debug the problem further as described in https://github.com/zopefoundation/ZODB/issues/208
@jensens @pbauer Maybe you have wondered about the Jenkins test failures that happen now and then?
Input appreciated.
Wow, good catch!
It can be reproduced outside of a testing environment; it is related to the Zope 4 shutdown. More on this here: https://github.com/zopefoundation/ZODB/issues/208
Latest Plone 5.2 build failed (again) because of this problem.
One possible temporary solution is to downgrade to ZODB 5.3.0 (but it might not work with the Python 3 builds?).
We really need to either fix this or go back to 5.3.0 until it is fixed. It would kill our momentum if that many Jenkins jobs fail during the sprint in Halle.
@pbauer
Yes, exactly my point in https://community.plone.org/t/zope-4-welcome-sprint-in-halle-germany-may-16th-to-18th-2018/6351/11
Actually a small patch to 5.4.0 would do the job. I can make one if you agree. But how do we apply it?
It should only be applied when testing, so that it doesn't sneak in permanently.
The bug has been there 'always', but it only surfaced after some cleanup code in ZODB.
We could simply catch and ignore the KeyErrors for now. They only happen when closing the connections during shutdown.
A race between:
MainThread: DB.Close -> get connections in pool and begin the iteration of all the DB connections
Dummy-1: connection.close()
Which of the two closes the connection first is random.
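The interleaving above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ZODB's code: ConnectionPool, close_one, close_all and race are hypothetical names, the pool is just a dict, and a threading.Event forces the unlucky ordering (the worker closes its connection after the main thread has taken its snapshot) so the outcome is deterministic rather than random. The ignore_missing flag mirrors the "catch and ignore the KeyErrors" workaround.

```python
import threading


class ConnectionPool:
    """Hypothetical stand-in for a DB's pool of open connections (not ZODB's class)."""

    def __init__(self, n):
        self.open = {i: f"conn-{i}" for i in range(n)}

    def close_one(self, key):
        # A worker thread closing its own connection removes it from the pool.
        del self.open[key]

    def close_all(self, on_snapshot=None, ignore_missing=False):
        # Shutdown path: snapshot the pool, then close every entry in the snapshot.
        keys = list(self.open)
        if on_snapshot is not None:
            on_snapshot()  # hook that lets another thread run at the worst moment
        closed = []
        for key in keys:
            try:
                del self.open[key]
            except KeyError:
                if not ignore_missing:
                    raise
                continue  # workaround: another thread already closed it
            closed.append(key)
        return closed


def race(ignore_missing):
    """Force the worker to close connection 0 between snapshot and iteration."""
    pool = ConnectionPool(3)
    go = threading.Event()

    def worker():
        go.wait()
        pool.close_one(0)

    thread = threading.Thread(target=worker)
    thread.start()

    def let_worker_run():
        go.set()
        thread.join()

    return pool.close_all(on_snapshot=let_worker_run, ignore_missing=ignore_missing)


if __name__ == "__main__":
    try:
        race(ignore_missing=False)
    except KeyError as exc:
        print("KeyError:", exc)  # KeyError: 0
    print(race(ignore_missing=True))  # [1, 2]
```

Ignoring the KeyError hides the symptom at shutdown, but the two close paths stay unsynchronized.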
A real fix would be to make the ZODB Connection thread safe.
See the comment by jmuchemb here:
https://github.com/zopefoundation/ZODB/issues/208
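For comparison, here is a minimal sketch of the thread-safe direction, assuming the same kind of hypothetical dict-based pool rather than ZODB's actual Connection or pool classes. Both close paths take one threading.Lock, so no worker can remove an entry while the shutdown path is between its snapshot and its loop, and each connection ends up closed exactly once. LockedConnectionPool and run_demo are illustrative names only.

```python
import threading


class LockedConnectionPool:
    """Hypothetical pool whose close paths serialize on one lock (not ZODB's code)."""

    def __init__(self, n):
        self._lock = threading.Lock()
        self.open = {i: f"conn-{i}" for i in range(n)}

    def close_one(self, key):
        # Worker closing its own connection; returns True if this call closed it.
        with self._lock:
            return self.open.pop(key, None) is not None

    def close_all(self):
        # Snapshot and removal happen under the same lock, so no other thread
        # can remove an entry in between and the del below cannot raise KeyError.
        with self._lock:
            closed = []
            for key in list(self.open):
                del self.open[key]
                closed.append(key)
            return closed


def run_demo(n=50):
    """Close every connection from n worker threads and the main thread at once."""
    pool = LockedConnectionPool(n)
    won = {}

    def worker(key):
        won[key] = pool.close_one(key)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n)]
    for t in threads:
        t.start()
    closed_by_main = pool.close_all()
    for t in threads:
        t.join()
    closed_by_workers = [k for k, ok in won.items() if ok]
    return closed_by_main, closed_by_workers


if __name__ == "__main__":
    main_closed, worker_closed = run_demo()
    # Every connection is closed exactly once, by whichever side got the lock first.
    assert sorted(main_closed + worker_closed) == list(range(50))
    print(len(main_closed), "closed by shutdown,", len(worker_closed), "by workers")
```

This only illustrates the general technique; the thread does not say what form the eventual ZODB fix took.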
I'd favor pinning the old version; everything else seems to be twice as much work. It's fine if you want to create a PR with the proposed fix, but my guess is that it might take a while until a new version is released, and we already have too many source checkouts.
Fine :)
@pbauer this is on the sprint agenda:
"Migrating ZODBs with Plone from Python 2 to Python 3"
This might be related to:
ZODB now uses pickle protocol 3 for both Python 2 and Python 3.
(Previously, protocol 2 was used for Python 2.)
The zodbpickle package provides a zodbpickle.binary string type that should be used in Python 2 to cause binary strings to be saved in a pickle binary format, so they can be loaded correctly in Python 3. Pickle protocol 3 is needed for this to work correctly.
Object identifiers in persistent references are saved as zodbpickle.binary strings in Python 2, so that they are loaded correctly in Python 3.
(https://github.com/zopefoundation/ZODB/blob/master/CHANGES.rst)
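Why protocol 3 matters can be seen with the standard library pickle module on Python 3 (an illustration only; it does not use zodbpickle or Python 2). Protocol 3 introduced dedicated opcodes for binary data, such as SHORT_BINBYTES, so a value is marked as bytes inside the pickle itself. The py2_style value below is a hand-written protocol 2 pickle using SHORT_BINSTRING, the opcode Python 2 uses for a short native str; it carries no binary marker, so Python 3 decodes it as ASCII text by default.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the pickle opcodes in `data`, in order."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


# Protocol 3 stores bytes with a dedicated opcode (SHORT_BINBYTES here).
as_protocol_3 = pickle.dumps(b"oid\x00\x01", protocol=3)

# Hand-written protocol 2 pickle: SHORT_BINSTRING ("U"), length 3, payload "abc".
py2_style = b"\x80\x02U\x03abc."

if __name__ == "__main__":
    print(opcode_names(as_protocol_3))
    print(repr(pickle.loads(as_protocol_3)))  # b'oid\x00\x01'
    # Python 3 decodes SHORT_BINSTRING as ASCII text by default...
    print(repr(pickle.loads(py2_style)))  # 'abc'
    # ...unless the caller explicitly asks for bytes.
    print(repr(pickle.loads(py2_style, encoding="bytes")))  # b'abc'
```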
But we can easily make a patch before the sprint - I'll get back to that.
With the release of ZODB 5.5.1 and transaction 2.4.0, this is done. It will be included in Plone 5.2a1.