Cryptography: Regression: Cannot use cyrptography in embedded wsgi

Created on 27 Aug 2015  Â·  45Comments  Â·  Source: pyca/cryptography

We have discovered at $work that the 1.0 cryptography release does not work with apache2+mod_wsgi. The error log spits out "Truncated or oversized response headers received from daemon process", which seems to happen if the wsgi process dies. We've tracked the issue down to an upgrade with cryptography, and something to do with threading and/or the Python GIL (a work around is in the wsgi config to set WSGIApplicationGroup to %{GLOBAL}).

I did a git bisect from 0.9.3 and 1.0, and found that the issue was introduced in commit b3d37a5d485bcd295d1933d638180e9cd5d23478.

Most helpful comment

I noticed a similar issue with to combination of apache, mod_wsgi and the Requests module.

When doing a request to a HTTPS site with Requests, the function call never returns.
The backend.activate_builtin_random() line of code mentioned in reply 4 also did the trick.

I eventually figured it out because I saw the following error in my apache error log:
extern "Python": function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern(). Returning 0.

For the code that was causing the issue, see: jobec/django-auth-adfs@28936fb232f8b527fa9867f6263ee4439e292a1e

--- UPDATE 2016/11/29 ---
I ran into this bug again (v1.4 of cryptography).
It appears this backend.activate_builtin_random() MUST be called before doing any HTTPS request with the requests module. If it's not called before it, it will hang the wsgi process.
First with an error like: Timeout when reading response headers from daemon process and after a while (the moment the daemon process gets cleaned up) with function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern()

All 45 comments

Wow, would not have expected it to come from that...

/cc @glyph

Me either, the commit itself doesn't look like it could cause something like this, I think that it may have been an earlier commit that introduced code paths that weren't called until the triggering commit.

Out of curiosity... Calling backend.activate_builtin_random() should disable this code path (you'll need to have a backend available in that scope of course. from cryptography.hazmat.backends.openssl.backend import backend will do it). Does this resolve your issue?

Yes, those two lines also resolve the issue.

OK, so here's a hypothesis:

This is a problem because sub-interpreters have an entirely different Python namespace (different modules, etc) but the same C namespace (same GIL, same C-level global variables, shared libraries, etc). I'm not sure _exactly_ why initializing the engine a second time causes a problem, but it's C-level runtime initialization stuff, so ¯_(ツ)_/¯. I'm sure that cryptography being surprised to discover an already-initialized OpenSSL causes all kinds of exciting issues.

@reaperhulk - you should like this one since it would also explain some of the other concurrency issues you're seeing at import time.

(The main hypothesis I guess I'm describing here is that subinterpreters are just a completely broken concept nobody should use, but maybe cryptography could be made to work around these issues.)

If we work on that assumption we can make engine addition idempotent at the C layer (see #2293) but since the random engine has callbacks in Python (that potentially point to functions not in the "current" sub-interpreter) it would still fail in the sub-interpreter...correct?

There's a pretty good explanation here - http://emptysqua.re/blog/python-c-extensions-and-mod-wsgi/#but-c-extensions-are-not-isolated - of how C extensions are treated by mod_wsgi.

I am pretty sure that you can call between different sub-interpreters just fine, since they share a GIL.

I flied a cffi bug to ask for a way to keep the GIL while calling C: https://bitbucket.org/cffi/cffi/issues/220/some-way-to-annotate-a-call-to-ask-it-not

One perhaps-obvious solution here would be to move all this library initialization to the module scope so that it's protected by the import lock, rather than trying to protect it with an ad-hoc lock. It seems counter to the way that cryptography is constructed, to avoid shared mutable state, but in this case there is just the _illusion_ of such avoidance: the mutable state does in fact exist and attempting to abstract it away is hard in a multi-interpreter environment.

Sub interpreters could definitely trigger a race in the engine setup code. #2293 proposes making this process idempotent which seems like a fairly reasonable solution to me.

@glyph I'm not opposed to that solution if it resolves our problem, but does it given the change in behavior in python 3.4?

Possibly not; I can't figure out at a glance how the per-module locks are initialized. @cyli suggests that cryptography could just take out a filesystem lock in /tmp ;-)

Okay, reviving this conversation. Proposal for the sake of discussion: To avoid issues with sub-interpreters we should do the following...

  • Move os random engine back to C
  • Make os random engine registration idempotent (we can hold a lock at the C layer which will prevent races with the sub-interpreters)
  • Move the locking callbacks to C (since they are presumably subject to the same issues)
  • Make locking callback registration idempotent (this part might be tricky if we want to detect locking callbacks already being registered by things like import _ssl)

Do people agree that this would resolve the issue? If not, why not? And if so, is there a way to do this without doing it all in C because obviously we'd prefer to not do that...

FWIW, I'm strictly opposed to moving this code back itno C.

On Tue, Sep 8, 2015 at 11:05 AM, Paul Kehrer [email protected]
wrote:

Okay, reviving this conversation. Proposal for the sake of discussion: To
avoid issues with sub-interpreters we should do the following:

  • Move os random engine back to C
  • Make os random engine registration idempotent (we can hold a lock at
    the C layer which will prevent races with the sub-interpreters)
  • Move the locking callbacks to C (since they are presumably subject
    to the same issues)
  • Make locking callback registration idempotent (this part might be
    tricky if we want to detect locking callbacks already being registered by
    things like import _ssl)

Do people agree that this would resolve the issue? If not, why not? And if
so, is there a way to do this without doing it all in C because obviously
we'd prefer to not do that...

—
Reply to this email directly or view it on GitHub
https://github.com/pyca/cryptography/issues/2299#issuecomment-138594505.

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

FWIW, I'm strictly opposed to moving this code back itno C.

Just because having more code in C is bad, or...?

Pretty much :-)

C is bad, and going back to our C code is a regression in functionality and
security.

On Tue, Sep 8, 2015 at 3:41 PM, Glyph [email protected] wrote:

FWIW, I'm strictly opposed to moving this code back itno C.

Just because having more code in C is bad, or...?

—
Reply to this email directly or view it on GitHub
https://github.com/pyca/cryptography/issues/2299#issuecomment-138679419.

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

Alex what regression in security do you see ?
Is that specific to random or does it include locking ?
Isn't handling low level locking in Python backwards ?

In order to properly initialize the random engine and locking callbacks, Cryptography needs to hold a lock so that these initializations are mutually exclusive between interpreters. I'm going to ignore the potential for concurrency among backends (import _ssl, M2Crypto, etc) because I am trying to hold on to a few shreds of sanity.

Here are some of the strategies I've discussed on IRC or in discussions with Cryptography maintainers:

Use the GIL

CFFI presently offers no capability of holding the GIL; we would really like at least a temporary solution before that is likely to land. Not to mention that every new-CFFI-version requirement means a new pypy requirement, practically speaking, which is pretty aggressive.

Plus, this only works on the C side of things. In Python, every bytecode executed might release the GIL, so it's no help there.

Allocate our own locks in Cryptography

This is what Cryptography is _trying_ to do now. The problem with acquiring such a lock is that there is nowhere to _share_ such a lock. And a lock you can't share isn't exactly a lock. Any lock allocated in Python code is going to be allocated in multiple interpreters.

If there were an API in C for allocating locks, then we could use that on the C side and then acquire that from the Python side. But there is no such API in C; POSIX locks and Windows locks are fairly different. We could of course use Python's API for this, but starting to rely on cpyext at this point in cryptography's history would seem like even more of a backslide than going back to C code.

Re-use the global import lock

As I mentioned above, Python has a global import lock. What I didn't mention above (although I did on another ticket) is that there is an API, imp.acquire_lock and imp.release_lock to acquire and release that lock.

The API is deprecated externally, but the very definitely _not_ deprecated importlib makes use of the same lock during its bootstrap initialization. So it should not be going away any time soon; it needs to be available in some form.

Conclusion

I think that the best way forward would be to use the import lock in the near term, and in the long term work with CFFI to have logic specifically dedicated to dealing with sub-interpreter concurrency by exposing some form of portable lock initialization, and then using that. Or maybe pursuade CPython that exposing the import lock is necessary for the long term. Or possibly even allowing Python code to run in initmodule somehow. For now, as a bonus, the import lock will probably protect loading C modules as well, since initmodule is called with the GIL held and I doubt many extension modules release it; I think this means we should be OK with _ssl for the time being, for example.

Isn't handling low level locking in Python backwards ?

Python provides APIs for portable locking; C doesn't. So it seems like a reasonable place to handle it, if we can manage it.

Using the import lock like that sounds like a reasonable safeguard to me, at least while it remains available in implementations we support.

It's not entirely obvious to me that the current approach to idempotency is in fact invalid but I guess that doesn't really matter if we have a real mutex we can use.

On a slightly different note, this discussion made me look at https://github.com/pyca/cryptography/blob/b5bb49d115e9f325d90bb96aebc15e6a1ef30f0d/src/cryptography/hazmat/bindings/openssl/binding.py#L158 and I suspect it might also be quite wrong in the presence of sub interpreters as each interpreter will get a totally isolated set of locks to use.

On a slightly different note, this discussion made me look at https://github.com/pyca/cryptography/blob/b5bb49d115e9f325d90bb96aebc15e6a1ef30f0d/src/cryptography/hazmat/bindings/openssl/binding.py#L158 and I suspect it might also be quite wrong in the presence of sub interpreters as each interpreter will get a totally isolated set of locks to use.

My reading of that code is that one subinterpreter will "win" and initialize the locks with its lock object, and then all other subinterpreters will get a reference to that via the pointers in openssl itself. So I think it's still correct, inasmuch as any code can be "correct" in the face of this mess…

@glyph I think you are right about how the callbacks works means that whichever one wins the race will be the one that the callbacks are executed in. Unfortunately I think there's still a problem since sub interpreters have independent lifetimes. If the one that won the race exits the other interpreter will no longer have a valid locking callback to use.

This same problem applies to osrandom too. It's not really very obvious to me that merely getting initial allocation of the callbacks safe solves the sub interpreter problem.

@public I don't think that tearing down a subinterpreter will automatically destroy all objects within that subinterpreter. It might do something like clearing module dictionaries, but it relies upon GC to free objects. I suppose someone could write a proof of concept C program :weary:

As @glyph notes, we actually still need the global import lock to protect the management of the per-module locks, so using that as at least a temporary workaround is reasonable.

For Python 3.5+, cffi would ideally support using multi-phase initialization which has been designed to improve extension module subinterpreter support: https://docs.python.org/3.5/c-api/module.html#multi-phase-initialization

That unfortunately won't be immediately helpful in cryptography's case, since the problem you have is one that is still an open question for multi-phase init: making it easy to handle external singleton libraries like OpenSSL in a subinterpreter compatible way (or at least in a way that fails noisily rather than introducing subtle data sharing problems - Cython opts for the latter approach by explicitly not supporting subinterpreters at all).

Further work on this is planned for 3.6, so if anyone from cryptography and/or cffi has time to participate, we'd love to hear from you on import-sig: https://mail.python.org/mailman/listinfo/import-sig

The "global" import lock is actually a C global:

https://hg.python.org/cpython/file/tip/Python/import.c#l157

...but that may change. While the motivation for the lock is thread races in the same interpreter, C extensions benefit (perhaps incidentally) in the case of subinterpreters. I'm glad this discussion came up because I expect at some point de-globalizing the lock _between_ interpreters may happen as we try to further isolate interpreters in the same process. It's clear that the C extension case will have to be satisfied safely. That may not be the 3.6 timeframe, but I could see it happening by 3.7 (as we make headway on isolation efforts).

Also...

...the C extension support in subinterpreters to which Nick refers should help. Currently C extension modules can define C-level state (instead of using C globals); see https://www.python.org/dev/peps/pep-3121/. Perhaps it would help to support some form of process-global per-module state.

@Naddiseo Would it be possible to test the change in #2336 in your environment?

@reaperhulk, I'll check on Monday when I get to work.

@reaperhulk, it appears #2336 does not fix my issue. I'm also having trouble reducing the project to a minimal test case: I can reproduce the bug on the full project, but not on a single file which imports cryptography.

@Naddiseo d'oh :( Is the project something where you could build a docker container we could use to test potential solutions or is it something proprietary?

Subinterpreters are very frustrating...

It is proprietary, so anything other than a minimal test case is a no-go. I'll continue to try and make a test case in my spare cycles. In the mean time, if you do have other changes you'd like to test, you can point me to a branch/commit and I'll give it a test.

@Naddiseo if you get a chance could you test #2293? I made some changes to that approach.

@Naddiseo if you get a chance could you test #2293? I made some changes to that approach.

@reaperhulk I have a similiar problem (running on py27, cryptography 1.0 in a apache2 wsgi env and getting RuntimeError: osrandom engine already registered) and the patch you proposed in https://github.com/pyca/cryptography/pull/2293 fixes this problem.

@toabctl that's great -- any chance you can provide a docker container that replicates the bug? We haven't been willing to land #2293 thus far because it relies on locking callbacks that are registered in Python. I think #2293 is really just narrowing the window for the race condition and not fixing the underlying issue, but I'd like to experiment with an environment that triggers the initial bug.

any chance you can provide a docker container that replicates the bug?

No. Sorry for that. I can paste the wsgi file and the apache config if that helps.

@reaperhulk, no joy with #2293

I noticed a similar issue with to combination of apache, mod_wsgi and the Requests module.

When doing a request to a HTTPS site with Requests, the function call never returns.
The backend.activate_builtin_random() line of code mentioned in reply 4 also did the trick.

I eventually figured it out because I saw the following error in my apache error log:
extern "Python": function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern(). Returning 0.

For the code that was causing the issue, see: jobec/django-auth-adfs@28936fb232f8b527fa9867f6263ee4439e292a1e

--- UPDATE 2016/11/29 ---
I ran into this bug again (v1.4 of cryptography).
It appears this backend.activate_builtin_random() MUST be called before doing any HTTPS request with the requests module. If it's not called before it, it will hang the wsgi process.
First with an error like: Timeout when reading response headers from daemon process and after a while (the moment the daemon process gets cleaned up) with function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern()

I'm going to close this as our current solution for mod_wsgi is the WSGIApplicationGroup %{GLOBAL} fix and we've recently documented it in our FAQ.

Hopefully at some point in the future this won't be required, but for now it's the best we can do.

Should the FAQ also mention https://github.com/pyca/cryptography/pull/2537 since that is the (eventual) path forward on this issue? Or am I confused?

Are there any downsides in using backend.activate_builtin_random() ? Or is it also a safe workaround?

@glyph until we merge it I'm going to be somewhat quiet about it (other than to perhaps point people that direction).

@jobec The random initialization issue appears to be mostly handled by some code we put in the 1.2 release, so it shouldn't be necessary to separately call activate_builtin_random (although doing so shouldn't cause problems). We've been receiving sporadic reports of other issues though, so that's why we've stated that the official way to handle this is the ApplicationGroup directive.

Adding a note here as I ran into this problem and the documented workaround - setting WSGIProgressGroup %{global} - didn't work for me. I hope someone else will find this useful.

From what I can glean, setting %{GLOBAL} doesn't assign the reference (e.g. WSGISCriptAlias) to the first sub-interpreter, but simply to the first process group. Sub-interpreters are created purely by the number of different wsgi applications that are in that process group, with "different application" is roughly equivalent to a different wsgi.py file. So, assuming that diagnosis true, the documented solution doesn't work at all and is in fact completely wrong.

I added some code to the wsgi.py file to syslog out id(os) and os.getpid(), as a way of detecting additional sub-interpreters being generated. Using id(builtins) didn't seem to be useful.

The fix was to ensure each application runs in a different process group, I.e., a different WSGIDaemonProcess for every different wsgi.py file you have. When I did this, this problem, along with the 'isinstance' problem the WSGI documentation referred to, completely went away.

(As an aside, I only have a single wsgi.py file, but I'm referring to them from two different VirtualServers. I think that whatever code detects references to the same wsgi.py file breaks across the virtualserver boundary. I have multiple references to the wsgi.py file inside one of the VirtualServers, and that doesn't appear to cause additional sub-interpreters.)

I hope this helps!

I ran into this bug again (v1.4 of cryptography).
It appears backend.activate_builtin_random() MUST be called before doing any HTTPS request with the requests module. If it's not called before it, it will hang the wsgi process.

First with an error like: Timeout when reading response headers from daemon process and after a while (the moment the daemon process gets cleaned up) with function Cryptography_rand_bytes() called, but no code was attached to it yet with @ffi.def_extern()

This issue will be resolved with the next cryptography release. See #3229

Was this page helpful?
0 / 5 - 0 ratings

Related issues

clarius-deploy picture clarius-deploy  Â·  22Comments

reaperhulk picture reaperhulk  Â·  42Comments

cromefire picture cromefire  Â·  23Comments

abhishek-ram picture abhishek-ram  Â·  37Comments

freedge picture freedge  Â·  22Comments