Server: Avoid "CSRF failed" pages

Created on 13 Nov 2017  Â·  20Comments  Â·  Source: nextcloud/server

  • create a password protected share
  • open the link and get a password prompt
  • send your machine to sleep and open it on the next day
  • enter the password
  • press the "access link" button
  • expected: I get access
  • actual: "Access forbidden. CSRF check failed"

We should make the access forbidden page a bit less technical and make the error inline, so that the user just get an "Login failed - please try again" instead of the pure error page.

cc @nextcloud/security @nextcloud/designers @rullzer Does that make sense?

3. to review design enhancement help wanted papercut security

Most helpful comment

Apparently we've got API endpoints that return HTTP 200 - even for errors:

bildschirmfoto von 2018-02-27 17-02-37

Because why not. Who needs HTTP statuses anyway.

All 20 comments

Ah the CSRF joys.

So a few things:

  1. Yeah probably good to have a less technical error. Even if it means lying to the user about the reason ;)
  2. We discussed in Austria with @LukasReschke and @ChristophWurst, that maybe we could abuse the heartbeat mechanism to update the CSRF token?
  3. Or we would use some dedicated storage (just drop it also in the distributed memcache?) so it survives the default session timeout a bit better.

1 is not to hard to to. 2. Already a bit trickier. And 3 will need serious design and implementation effort (as you don't want to mess it up).

will need serious design and implementation effort

Whatever path we take, would be great to have a solid solution. The way we currently handle it is fragile. Let's see how other platforms/frameworks handle this.

As far as I understood the problem we need

  • A mechanism on the client that catches outdated CSRF tokens and updates them
  • A mechanism on the server-side to retrieve a new CSRF token

I could look into this for NC14.

I guess the same thing also happens for apps. For calendar and tasks I always get a

<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
  <s:exception>Sabre\DAV\Exception\NotAuthenticated</s:exception>
  <s:message>CSRF check not passed.</s:message>
</d:error>

if the apps were not used for a while.

For best UX it would of course be best to not have an error message of any kind, but just have it work. ;) (So I guess the heartbeat update, or what Christoph mentioned?)

Not sure what happens after suspend though. Because when the heartbeat didn't run for a long time and your session expired there is nothing we can do.

If the session is still valid and the heart beat continues, that might fix it.

Not sure what happens after suspend though. Because when the heartbeat didn't run for a long time and your session expired there is nothing we can do.

There is: fetch a new now token and replace the currently used one.

The underlying issue for the page during logout is maybe fixed with https://github.com/nextcloud/server/pull/6544 in master

See also #1075

For best UX it would of course be best to not have an error message of any kind, but just have it work. ;) (So I guess the heartbeat update, or what Christoph mentioned?)

How about having a production mode which removes these errors from the front-end, redirecting the user to the login or Files app?
admins would still get notified via the logs.

A mechanism on the server-side to retrieve a new CSRF token

I've given this a quick test: https://github.com/nextcloud/server/blob/68fa0d4bb2b7f6cf101871d3b7a4faf70dad8308/core/Controller/CSRFTokenController.php#L48-L62

… and that seems to be usable on the client-side. As far as I can tell and have tested this (as a non-expert in security) other sites are unable to access this route because it's not a CORS route. @nextcloud/security is there anything I've forgot to add?

Of course, the client side is still missing. And this approach will only work for requests sent from JS and not for static HTML forms.

A controller like that is fine as far as I can tell.

What is the plan for the clientside? A heartbeat like request (can't we just hijack that btw?)?

What is the plan for the clientside? A heartbeat like request (can't we just hijack that btw?)?

My approach would be to register a global error handler for jquery ajax calls, try to detect an expired token and fetch a new one if necessary. That way we're complete independent of the heartbeat and just refresh the token if really necessary.

That sounds very elegant!

Couldn't we use a websocket? ping/pong _0x9/0xA_ are optionnal as the websocket keep the connexion open :)

I was writting what's above and changed my mind (leaving it for reference only), a heartbeat seems the easiest and is probably our best shot. It's very light.

My approach would be to register a global error handler for jquery ajax calls, try to detect an expired token and fetch a new one if necessary. That way we're complete independent of the heartbeat and just refresh the token if really necessary.

Almost there: https://github.com/nextcloud/server/compare/feature/csrf-token-controller#diff-f43b97e40f998e07bc6487d0901592c3

For some reason the original $.ajax error handler is triggered on the first failing request.

Apparently we've got API endpoints that return HTTP 200 - even for errors:

bildschirmfoto von 2018-02-27 17-02-37

Because why not. Who needs HTTP statuses anyway.

So this is a bit frustrating. First, I've not found a way to tell jQuery that a request shall be considered successful inside the error handler. I'm able to recover from the expired token error but the client code sending the requests still gets an error although the request is resent and successful. There seems no way to express that in the global ajaxError handler, unfortunately. Second, we obviously have to implement custom logic for our various API types because they are all far from what you'd expect from an HTTP API.

Therefore I thought about the heartbeat approach mentioned by @skjnldsv. On one hand this approach would update the CSRF token even before it expires. But on the other hand you end in race conditions for the case of either lost connectivity or putting the machine to standby: when the connection can be established again, the validity of the CSRF token depends on the timing of the sent requests. If the heartbeat is the first and finishes before any other requests are sent, it will work. But in any other case (and that's how browsers behave in that scenario AFAIK) requests will be sent concurrently, which means that those will all have outdated CSRF tokens and fail. Any code that sends such requests would have to implement a retry logic for those cases.

I'm open for any other ideas/approaches for this.

@ChristophWurst how about a hybrid approach.

I would say a hybrid approach is probably best here. Some kind of hearbeat that (I would not opt for websockets as that requires webserver config) updates the token live is fine by me and should cover the 90% case I guess. We can then try see what the best approach is for the remaining 10% later.

Also what endpoint did you query that a failed CSRF request resulted in a 200? Our middleware should catch that... and if not it is a bug.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

williambargent picture williambargent  Â·  3Comments

mama21mama picture mama21mama  Â·  3Comments

MorrisJobke picture MorrisJobke  Â·  3Comments

ThomasLeister picture ThomasLeister  Â·  3Comments

georgehrke picture georgehrke  Â·  3Comments