FreshRSS: Per-feed option to strip query string (arguments) from article links

Created on 8 Apr 2018 · 31Comments · Source: FreshRSS/FreshRSS

Another privacy related issue.

Some annoying RSS feeds add a nasty "user ID" thingy to the article links they provide, which they try to use to "deanonymize" users of the RSS feed.

I can think of dozens of ways this identification could be used:

- draw a relation between the FreshRSS server and its users
- any kind of user tracking ("what does the user read", etc.)
- ...

While investigating this privacy issue, I found that it often suffices to simply strip the entire query string from the URL; the page access then "looks" like a normal visit rather than one coming from an RSS reader. It would be helpful if the feed settings provided a method to strip the query string.

I could even imagine a whitelist and/or blacklist mechanism to allow advanced users to specify the query string arguments that need to remain or can safely be removed. But that's probably overkill. For my personal feeds, the entire query string can be stripped.
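To make the proposal concrete, here is a minimal Python sketch of the three behaviours described above: strip everything, keep only whitelisted arguments, or drop blacklisted ones. The function name `strip_query` and its `keep`/`drop` parameters are made up for illustration; this is not FreshRSS code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query(url, keep=None, drop=None):
    """Return `url` with query-string arguments removed.

    keep: if given, only these argument names survive (whitelist).
    drop: if given (and keep is not), these argument names are removed (blacklist).
    With neither, the entire query string is stripped.
    """
    parts = urlsplit(url)
    if keep is None and drop is None:
        return urlunsplit(parts._replace(query=""))
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if keep is not None:
        pairs = [(name, value) for name, value in pairs if name in keep]
    else:
        pairs = [(name, value) for name, value in pairs if name not in drop]
    # Note: urlencode re-encodes the surviving values, so unusual escaping
    # in the original link may be normalized.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical link, only for demonstration.
link = "https://example.com/article.html?id=7&wt_mc=rss.ho.beitrag.atom"
print(strip_query(link))                  # whole query string removed
print(strip_query(link, keep={"id"}))     # whitelist
print(strip_query(link, drop={"wt_mc"}))  # blacklist
```

The path and fragment are left untouched, so this only helps against tracking data carried in the query string.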

Extension I18n

TheAssassin

Most helpful comment

Currently, FreshRSS randomizes the users before update. Within a single user, the feed refresh order is based on the least recently updated, but not all feeds are fetched every time due to different per-feed refresh rates, and the fact that some feeds may already have been fetched for previous users. So it is already the case that feed order is not static.

Alkarex on 24 Apr 2018

👍2

All 31 comments

That's interesting. Could you provide an example of such a feed?

aledeg on 8 Apr 2018

I don't have that specific "tracking user ID" example at hand at the moment, but the popular German IT blog heise.de has a similar anti-feature:

https://www.heise.de/newsticker/meldung/WTF-US-Cyber-Buergerwehr-droht-Russland-und-Iran-mit-ASCII-Flaggen-4012911.html?wt_mc=rss.ho.beitrag.atom

TheAssassin on 8 Apr 2018

This Web site gives me https://www.heise.de/newsticker/heise-atom.xml as RSS feed, in which I cannot immediately spot the problem

Alkarex on 8 Apr 2018

@Alkarex the part ?wt_mc=rss.ho.beitrag.atom is what tells the website that you're coming from an RSS reader, which isn't really necessary for the functionality, is it?

TheAssassin on 8 Apr 2018

Ah indeed, you are right @TheAssassin

Alkarex on 8 Apr 2018

I re-checked a couple of smaller regional newspaper feeds in Germany that I found in some chat log about this topic, and found one which uses a similar pattern:

http://www.nordbayern.de/cmlink/15.423?cid=2.244

They append ?rssPage=<base64-encoded blob>. I never checked what the blob really means before, and it appears that they base64-encode the "department" which was responsible for an article. Pretty silly way of doing it, but well... They don't even send the right MIME type for the RSS page...

In both cases, the query string arguments are not at all necessary for the functionality; they can safely be removed. When using a browser that doesn't send an annoying Referer, there is no way for them to know whether it was an RSS feed reader that sent you to their page or not.

I can't find that "unique ID" example again, but I am pretty certain I've seen it before. Even if there is no such feed yet, I could well imagine that some feeds would secretly add a UID to the URLs they send in the feed.

@Alkarex why did you flag it as "Extension" issue? Shouldn't be too hard to implement in the core.

TheAssassin on 8 Apr 2018

Yes, it is not difficult to implement, it is more a matter of finding a good balance between core and extensions. Right now, it could already be addressed with an extension.

But I am considering adding a number of per feed options (after https://github.com/FreshRSS/FreshRSS/pull/1838 ) such as regex replace, which could help there, without extension.

Alkarex on 8 Apr 2018

Regex replace might be a bit too complex for the average user, though. A non-dev normally doesn't know what regular expressions are, how they work, or how to write them. Some more "simple" string manipulation tools like suggested in this issue would help those users.

TheAssassin on 8 Apr 2018

There are no examples here of unique user ID tracking, only source tracking.

da2x on 24 Apr 2018

@da2x right. What's your point?

TheAssassin on 24 Apr 2018

My point is that this really is a non-issue.

It’s a good thing for the whole RSS ecosystem to allow some minimal tracking like this as it helps publishers see that it’s worth their time, money, and investment to maintain RSS support. An individual’s privacy isn’t violated by someone being able to identify that you clicked through to a website from an RSS feed. This information is only valuable in huge aggregate numbers.

It would be a privacy issue if they were somehow assigning people unique identifiers and tracking them across the web. However, I don’t really see any privacy issue whatsoever in tracking that only records “person A clicked through to our site from an RSS reader”. I’ve even made a tool to help WordPress site owners track clicks from RSS. I’ve built standard privacy controls into this tool, so if you send a standard DoNotTrack opt-out HTTP header (DNT: 1), the tool removes tracking by default.

The suggested solution of introducing an option to drop queries just adds complexity for very little gain. The most common tracking technique for feeds is to track clicks through a redirect (e.g., feedproxy.feedburner.com, go.example.com/rss/article-id). You can’t really get around those redirects without following the redirect to see where it leads so you can’t really get away from tracking. A determined publisher/tracker could even map your IP as you retrieve the RSS feed and automatically assume that anyone without an HTTP referral header from that IP for the next hour came from a feed reader. …

However, a handful of feed readers (including the popular Liferea client for Linux) support sending the DoNotTrack header. It’s only supported by a handful of websites, but at least that’s something, and it doesn’t require a lot of effort to implement and maintain, unlike a complicated URL rewriting tool.

da2x on 24 Apr 2018

👍2

Now that the cat's out of the bag I might as well add that I also gladly send my RSS origin to Der Spiegel and others to "help publishers see that it’s worth their time, money, and investment to maintain RSS support." It is, however, somewhat annoying to have the added cruft when you want to share the link with someone else.

Frenzie on 24 Apr 2018

@da2x I disagree.

As @Frenzie pointed out, this form of "tracking" is broken by design. The services try to track users across different systems, which can only be done with a parameter that the client's browser is guaranteed to send without the service having to control the page the link is embedded in (as would be required for, e.g., POST parameters). This bogus way of determining the user count of an RSS feed is not reliable enough to support a precise statement; at most it tells you whether the feed is used at all.
And, as @Frenzie pointed out, sharing links that carry these parameters with other users is not only annoying but _will_ give the service a wrong impression. Imagine some celebrity regularly shares links with such a parameter: the stats will show peaks for these articles. If I had time, I could give you many more examples of when the tracking becomes especially unreliable.

The argument that there is much complexity needed to cut off query string parameters, well, even Python has a URL parser where you can alter the query string by changing a dict, and I bet a language like PHP must have something similar, at least in the common frameworks. As said before, I would be fine with a switch that removes them completely.

By the way, Do-Not-Track is a "well-intentioned" measure against tracking, but isn't respected by most tracking systems. For example, if I send a Do-Not-Track header to the RSS feeds mentioned in here, the query string parameters are still in place. As it's a form of tracking (even though it's questionably broken), Do-Not-Track is IMO violated. Just because _your_ specific tool respects it (and that's really a good thing, don't get me wrong), it doesn't mean every other system does the same, unfortunately.

> assume that anyone without an HTTP referral header from that IP for the next hour came from a feed reader.

Well, if you try hard enough, you can read the weirdest information out of seemingly unrelated data... I personally use browser plugins (or browsers) that do prevent sending a Referer (with a whitelist for services like FreshRSS which mis-use them as a bogus security feature... right, I wanted to create an issue for that... Referer is not meant for security purposes). Mozilla & Co. nowadays discuss turning off the Referer without the need for extensions. So, your assumption that everybody without a Referer is using a feed reader is as wrong as assuming that everybody having a URL with a query string parameter is using one.

> You can’t really get around those redirects without following the redirect to see where it leads so you can’t really get away from tracking. A determined publisher/tracker could even map your IP as you retrieve the RSS feed and automatically assume that anyone without an HTTP referral header from that IP for the next hour came from a feed reader. …

That's fairly simple, actually. If you don't want client-side tracking, you should have your feed reader follow the redirections while fetching the feed and present the final URLs to the user instead of the original links. As the feed provider already knows that your reader is reading the feed, offloading this to the feed reader doesn't leak any additional information, but should prevent the client from being tracked.
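A rough Python sketch of that idea follows: the reader walks the redirect chain itself and shows the user only the final URL. All names (`head_location`, `resolve_redirects`, `max_hops`) are invented for this illustration, and it assumes the redirector answers HEAD requests with an ordinary 3xx status and a Location header, which not every service does.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def head_location(url, timeout=10):
    """Return the redirect target of `url`, or None if it does not redirect."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout):
            return None  # 2xx answer: no redirect
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            location = err.headers.get("Location")
            return urljoin(url, location) if location else None
        return None


def resolve_redirects(url, get_location=head_location, max_hops=5):
    """Follow up to `max_hops` redirects server-side and return the last URL."""
    seen = {url}
    for _ in range(max_hops):
        target = get_location(url)
        if target is None or target in seen:  # end of chain, or a loop
            break
        seen.add(target)
        url = target
    return url
```

`get_location` is injectable so the chain-walking logic can be exercised without network access, for example by passing the `.get` method of a dict that maps redirector URLs to targets.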

To sum up, every service who really tries to determine feed reader usage by using query string parameters should re-think this concept. Adding such an option isn't adding much more complexity than any other feature proposal. Just because you develop a plugin that depends on this information it doesn't mean you should try to lobby against a privacy-enhancing feature. Especially when it's an optional "expert" feature that isn't enabled by default. That's pretty cheap, IMO.
And just because we haven't found any "per-user" tracking _yet_ doesn't mean there won't be in the future, given the madness of the ideas of the tracking companies. Being prepared seems like a good idea, at least to me.

TheAssassin on 24 Apr 2018

> I wanted to create an issue for that... Referer is not meant for security purposes

This here?

https://github.com/FreshRSS/FreshRSS/blob/f0fd273199682881b805e968ca36df4ccdbfa7a1/app/FreshRSS.php#L58-L73

That does seem a bit pointless. Spoofing headers is incredibly simple. The code was introduced in relation to #634, so pinging @marienfressinaud in case he remembers the rationale.

> Mozilla & Co. nowadays discuss turning off the Referer without the need for extensions.

It boggles the mind that turning off stuff like scripts and referer should require extensions in the "Mozilla & Co." world. :-)

Frenzie on 24 Apr 2018

I think it was @Alkarex who added this; I can't remember the reason :/ (btw, I didn't take time to read all the arguments, but I'm pretty impressed by the discussion :+1:)

marienfressinaud on 24 Apr 2018

@Frenzie exactly, thanks for bringing this up (although a separate issue is probably better, but I'd better leave that to you).

OWASP is publishing contradictory documents regarding the Referer header. It _can_ be used to prevent a majority of XSRF attacks (see the CSRF cheat sheet) with unmodified browsers. It doesn't always work as intended (especially on shared hosts, or when you share a domain with other web software), but it can serve as a first line of defense against CSRF attacks.

However, the Referer _always_ leaks information, especially to third parties, when linking to external pages. This becomes a problem when sensitive information is contained in the URL (especially the query string). See this page for example. As you can't predict what kind of information may be useful for further attacks, it is advisable to disable sending the Referer in the browser. However, some pages still think they need to rely on the Referer, therefore all plugins for the purpose (e.g., Referer Control) provide a way of whitelisting origins for which the Referer is sent. This is how I got FreshRSS and a couple of other web services I host myself to work with my privacy enhanced browser.

I always recommend implementing better-suited and more secure alternatives to prevent XSRF attacks. An easy way is to implement CSRF tokens. This method doesn't depend on browser features other than the ability to use hidden form inputs.
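For readers unfamiliar with the token approach, a bare-bones Python sketch could look like this. The `session` dict stands in for whatever server-side session storage the application uses, and the function names and the `csrf_token` field name are illustrative assumptions, not taken from FreshRSS.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Create a random token, remember it in the session, and return it."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def hidden_input(token):
    """Render the hidden form field that carries the token back to the server."""
    return f'<input type="hidden" name="csrf_token" value="{token}" />'


def is_valid_csrf_token(session, submitted):
    """Accept a state-changing request only if the submitted token matches."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Constant-time comparison; bytes avoid a TypeError on non-ASCII input.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}
token = issue_csrf_token(session)
print(hidden_input(token))
print(is_valid_csrf_token(session, token))      # matching token
print(is_valid_csrf_token(session, "guessed"))  # forged request
```

A page on another origin cannot read the token, so it cannot build a form that passes the check, regardless of whether the browser sends a Referer.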

> Spoofing headers is incredibly simple.

May be true, but for the browser, it's wrong. The Referer is a functionality built into the browser that can't be altered by using JavaScript code. See this guy telling about his experience with the Referer and JavaScript.

TheAssassin on 24 Apr 2018

@Frenzie Changing the Referer of someone else is not easy. This is another layer to avoid XSRF and similar attacks, and is not related to what you can do with your own browser / client.
https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)_Prevention_Cheat_Sheet#Checking_the_Referer_Header
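As an illustration of the kind of check being discussed, a same-origin Referer test might be sketched in Python as below. This is not the FreshRSS implementation; the function name, the `allow_missing` switch, and the default-port handling are assumptions made for the example. Whether a request without any Referer is accepted is a policy choice, so it is exposed as a parameter.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url):
    """Reduce a URL to (scheme, host, port), filling in the default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def referer_is_same_origin(referer, own_url, allow_missing=True):
    """Return True if `referer` points at the same origin as `own_url`."""
    if not referer:
        return allow_missing  # header absent: decided by policy
    try:
        return _origin(referer) == _origin(own_url)
    except ValueError:  # malformed port in one of the URLs
        return False


# Hypothetical instance URL, only for demonstration.
own = "https://rss.example.org/i/"
print(referer_is_same_origin("https://rss.example.org/i/?c=feed", own))
print(referer_is_same_origin("https://evil.example.net/page", own))
print(referer_is_same_origin("", own, allow_missing=False))
```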

Alkarex on 24 Apr 2018

@TheAssassin FreshRSS uses a few attributes to instruct the browser NOT to send Referer information to other sites. Do not confuse the advantages/drawbacks of Referer for the same domain vs. third-party domains.

Alkarex on 24 Apr 2018

@TheAssassin

> May be true, but for the browser, it's wrong. The Referer is a functionality built into the browser that can't be altered by using JavaScript code. See this guy telling about his experience with the Referer and JavaScript.

I wasn't talking about JS. As one of the comments in your link puts it, "but no one can stop a CURL request ;)"

@Alkarex
Alright, thanks.

Frenzie on 24 Apr 2018

For reference, FreshRSS uses both an explicit rel="noreferrer" on each external link and a global <meta name="referrer" content="never" /> in the HTML headers (which might have to be updated to another keyword or keywords once browsers stabilize).
https://github.com/FreshRSS/FreshRSS/issues/955
https://github.com/FreshRSS/FreshRSS/commit/8a776f146182bc6870702cfeb87041e3af66b24b
But those things are not really related to this thread :-)

Alkarex on 24 Apr 2018

@Alkarex I'm not saying that FreshRSS is leaking information or doesn't try to prevent leaking.

The problem is that the Referer is _always_ leaking at _least_ your previous location, allowing e.g. cross-site tracking, and in the worst case leaking information that enables attacks. To prevent these issues, the browser should be configured not to send a Referer header at all.

This has been a known issue for years, and browser vendors have slowly but steadily started to discuss whether the Referer should be removed entirely.

In my time at the university, I chose an elective about web app security, and we discussed the whole CSRF attack issue there. We came to the consensus that the Referer is not suitable for preventing CSRF attacks. The alternatives, mainly the CSRF token approach, are far more suitable, as they don't have any side effects and don't depend on functionality that may be removed anyway.

The information in the OWASP wiki is often outdated, by the way. I've met some OWASP contributors, who confirmed this. It's a wiki after all; everyone can edit it, but wiki'd information isn't guaranteed to be up to date at all times.

Please note that the Referer header is _optional_, and web applications depending on a Referer header are broken by design, at least to some extent. Please see RFC 2616, section 14.36.

I also noticed that the _entire_ leak-of-information topic is discussed in the Security Considerations (section 15) of the standard. Most relevant for this issue are 15.1.2 Transfer of Sensitive Information and 15.1.3 Encoding Sensitive Information in URI's [sic], which both discuss the Referer header's drawbacks. If you don't believe _me_, then please trust the standard.

@Alkarex I can open a separate issue if you want to.

TheAssassin on 24 Apr 2018

> I can open a separate issue if you want to

Only if you believe there is actually an issue. As far as I can see, everything is as it should be, at least for the time being. FreshRSS works fine if you do not send a Referer at all, which is an acceptable strategy. What I do not think is good is to send random values in the Referer. Even without extensions, browsers such as Firefox have for some time already had some finer options for Referer, e.g. not sending the Referer cross-domain, or sending a trimmed version of URLs: https://wiki.mozilla.org/Security/Referrer

https://github.com/FreshRSS/FreshRSS/blob/dfc638dd9856e5507e482583c4e7339fcd2bb915/lib/lib_rss.php#L388-L391

Alkarex on 24 Apr 2018

Incidentally, RFC 2616 says it's obsoleted by (among others) RFC 7231: https://tools.ietf.org/html/rfc7231#section-5.5.2

That also includes a discussion on some security considerations:

> Some intermediaries have been known to indiscriminately remove
> Referer header fields from outgoing requests. This has the
> unfortunate side effect of interfering with protection against CSRF
> attacks, which can be far more harmful to their users.
> Intermediaries and user agent extensions that wish to limit
> information disclosure in Referer ought to restrict their changes to
> specific edits, such as replacing internal domain names with
> pseudonyms or truncating the query and/or path components. An
> intermediary SHOULD NOT modify or delete the Referer header field
> when the field value shares the same scheme and host as the request
> target.

Frenzie on 24 Apr 2018

SHOULD NOT is not MUST NOT, though. And the header is still optional, _technically_.

By the way, at the moment, CSRF protection isn't broken when removing the Referer header, as FreshRSS will stop functioning as intended.

And unfortunately, on shared hosts the entire Referer-based CSRF protection approach is broken when you share a domain with an attacker. That is also something that can't affect a CSRF token's security.

> FreshRSS works fine if you do not send a Referer at all, which is an acceptable strategy.

@Alkarex I will retry later by removing the whitelist entry in my plugin.

TheAssassin on 24 Apr 2018

This discussion just made me realize there is a pretty reliable way to fingerprint RSS client users across networks that doesn’t involve cookies, URL parameters, or anything of the sort. I’ll do some testing. Details to follow.

da2x on 24 Apr 2018

👍2

I'll risk a guess: you encode some user ID in the URL "path"? Anyway, details would be appreciated, in order to develop counter measures.

TheAssassin on 24 Apr 2018

Wouldn't that be "of the sort"? :-)

Frenzie on 24 Apr 2018

Sure. It's the exact same method, technically. The URL is the only data that is forwarded by the feed reader to the user, therefore the tracking information must be encoded in it. It's just making removal of the information harder. Call it a further obfuscation of the tracking information.

https://github.com/FreshRSS/FreshRSS/pull/1838 might introduce suitable counter measures.

TheAssassin on 24 Apr 2018

This only works for Google (blogspot), WordPress.com, Medium, Cloudflare and large infrastructure companies. Luckily the web isn’t all that centralized and these companies aren’t interested in our data. Oooh, wait…

Feed reader fingerprinting: All feed readers check their subscription list from first to last. It’s an ordered list, either ordered by time of subscription or in a user-set order. The list of feeds may be unique, but the order of the list most probably is unique. All you need to do as a large cloud provider is to look at the update order of the feeds coming from the same IP (possibly also User-Agent) within a short time interval. If you see the same sequence of feed updates from another IP (in the same geographic area), then you’ve just linked two IP addresses to the same user. You don’t need to be able to track all the user’s feeds; just a handful is enough for this to be possible. The accuracy of this method can be greatly improved by involving cache-revalidation headers (evercookies).

Counter-measure: Randomize the scheduled list of feeds on updates. Round If-Modified-Since (Last-Modified) up to the nearest 15-minute time block and drop If-None-Match (ETags).
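The counter-measure can be sketched in a few lines of Python. This is only an illustration of the three steps named above, with made-up function names; it assumes the stored Last-Modified value is a timezone-aware datetime. Note that rounding the timestamp up, as proposed, makes the server see a later date than the reader actually has, so an update published inside that window could be reported as "not modified" until the feed changes again.

```python
import random
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

BLOCK = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shuffled_feeds(feeds, rng=random):
    """Return the feeds in a fresh random order for this update run."""
    order = list(feeds)
    rng.shuffle(order)
    return order


def round_up_to_block(moment, block=BLOCK):
    """Round a timezone-aware datetime up to the next `block` boundary (UTC)."""
    elapsed = moment - EPOCH
    blocks = -(-elapsed // block)  # ceiling division on timedeltas
    return EPOCH + blocks * block


def conditional_headers(last_modified):
    """Build revalidation headers: coarse If-Modified-Since, no If-None-Match."""
    if last_modified is None:
        return {}
    rounded = round_up_to_block(last_modified)
    return {"If-Modified-Since": format_datetime(rounded, usegmt=True)}


# Hypothetical feed URLs, only for demonstration.
feeds = ["https://a.example/feed", "https://b.example/feed", "https://c.example/feed"]
print(shuffled_feeds(feeds))
print(conditional_headers(datetime(2018, 4, 24, 10, 7, 30, tzinfo=timezone.utc)))
```

Because the ETag is never sent back, the server cannot use it as a per-client identifier, and the coarse timestamp is shared by every reader that last fetched the feed within the same 15-minute block.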

da2x on 24 Apr 2018

@da2x why isn't it enough to check how many IPs are recurringly downloading the feed?

@Frenzie randomizing the list of checks might be a good idea. I already think about using Tor...

TheAssassin on 24 Apr 2018

Alkarex on 24 Apr 2018

👍2
