Dataverse: Enable geo-location for MDC when behind a load balancer

Created on 8 Jun 2020  路  10Comments  路  Source: IQSS/dataverse

@landreev recently discovered that when Dataverse is run behind load balancers (specifically on AWS), it picks up the IP address of the load balancer internally (in DataverseRequest.java) and that is eventually what's written in the MDC count log, which then breaks MDC's geo-location capability (based on using Geomind's IP database to assign the access/download to a give country/state).
Fortunately, it looks like AWS send an additional x-forwarded-for header by default that includes the user's IP address. There are other headers that sometimes can be used to send the IP address, i.e.
"X-Forwarded-For",
"Proxy-Client-IP",
"WL-Proxy-Client-IP",
"HTTP_X_FORWARDED_FOR",
"HTTP_X_FORWARDED",
"HTTP_X_CLUSTER_CLIENT_IP",
"HTTP_CLIENT_IP",
"HTTP_FORWARDED_FOR",
"HTTP_FORWARDED",
"HTTP_VIA",
"REMOTE_ADDR"

Since this may be different for different installations, we've discussed making a configurable option to select which header to check to get the IP address.
We've done a quick test at QDR picking up the x-forwarded-for header using AWS LBs and have verified that the mdc logs get the user ip. I'm working to submit a PR that would add configurability.

Note: By changing the logic in DataverseRequest.java, this will affect anyplace else the requestor's IP is used - seems like that shouldn't be the LB address in any case (i.e. it's not just an MDC issue).

Also note: AFAIK, the IP address is only used in MDC to get the geo-location - it's other functionality to filter robots, stop duplicates, etc. should not be affected/will work with logs prior to this PR even when behind LBs.

Most helpful comment

@poikilotherm
I for one completely missed #6456. Thank you.
But it sounds like it's being implemented pretty much the way you requested - except it's going to be possible to specify another header, different form X-Forwarded-For; since a different proxy may be using one of the headers @qqmyers listed above.

All 10 comments

Correct, this is not an MDC-specific issue. We've known that this issue existed behind the AWS ELB currently in our prod. It was not a problem for us, because we are not using IP groups (this may be the only purpose for which the IP address in the DataverseRequest may currently be used. but potentially there may be others in the future). We were basically waiting for somebody to complain about it before addressing it (or for us to have a need to start using IP groups again).

"addressing it again", strictly speaking, as there were some attempts in the past to fix this issue behind a different proxy some time ago.

There's a clear security concern: we absolutely don't want to have code that always checks this X-Forwarded-For header and automatically trusts the IP supplied in it! Because otherwise, with a Dataverse that is NOT behind an address-masking proxy, adding this header to the request becomes a ridiculously easy way to cheat and pretend to be coming from localhost, or an otherwise privileged subnet!
So we discussed this and agreed that in this PR it will be addressed by making this header check optional, and the header configurable. And adding some warning language to the guide emphasizing that the option should only be used if the admin is 100% positive that the header is automatically set by their proxy on all incoming requests.

Just adding this here to address it for anyone who may recognize this issue from way back.

Sorry to nitpick, but maybe change it from "... when behind a load balancer" to "... when behind a proxy that masks the incoming ip address".

This is (at least) related to #6456, if not being kind of a duplicate to that issue, adding another use case to it.

From the discussion there it never took off, as this seemed to cause security headaches as outlined above by @landreev. I would be really grateful to be included in any further, maybe non-SLOPI discussion of this matter. Thx!

@poikilotherm
I for one completely missed #6456. Thank you.
But it sounds like it's being implemented pretty much the way you requested - except it's going to be possible to specify another header, different form X-Forwarded-For; since a different proxy may be using one of the headers @qqmyers listed above.

@poikilotherm To clarify, I wasn't objecting to implementing it. Was just reminding everybody that we had once run into a problem with this, with somebody hard-coding that header lookup. As long as this is optional, and configurable, and there is an appropriate warning in the guide - it should be perfectly fine.

@poikilotherm - thanks for the pointer - I hadn't realized the connection at all. Looking through the comments, it seems like one reasonable addition would be to not accept localhost (127.0.0.1) from a header. Does that make sense? It wouldn't stop spoofing overall (hence @landreev 's warnings should still get added) but it would stop spoofing to get to admin functions. FWIW: I'm hoping to post code today.

@qqmyers I'm glad the security risks are on your mind. Here's where we removed the "X-Forwarded-for" header: df200fc (discussion in #2826). As you say, we absolutely don't want people elevating their privileges to be able to make themselves superusers, delete accounts, etc. by pretending to be localhost or 127.0.0.1.

@qqmyers Yes, absolutely! There's no good reason ever to accept "localhost" in that situation.
(I should've mentioned that it's not just IP groups, of course - spoofing localhost is far worse!).
The scary language in the warning should mention that the admin must be 100% positive that their proxy automatically sets, and overwrites, the header on all incoming requests... AND there is no way to access their server directly, bypassing the proxy either! The second part is equally important - because that would not always be the case.

@qqmyers
P.S. I can volunteer to write the "scary language" for the warning in the guide for the PR. Since I have kinda drafted it already, above.

@landreev - all yours - if you want to add a release note as well, go for it.

Was this page helpful?
0 / 5 - 0 ratings