News: NextCloud-News generates unreasonable amounts of traffic.

Created on 10 Dec 2019  ·  6 Comments  ·  Source: nextcloud/news

Dear NextCloud News developers,

I'm one of the contributors to solar.lowtechmagazine.com, a website that runs on a minimal server powered by off-grid solar energy. The website is a way to talk about the energy use of ICT infrastructure.

We've been running for one year now and are compiling the statistics of that year to see which lessons we can draw. Looking at our visitor statistics, we noticed that feed readers account for 95% of our data traffic while representing only 25% of our visitors. Zooming in on this, we noticed that NextCloud News in particular accounts for about 60% of that data traffic. In concrete numbers: in one year, 7304 unique visitors using NextCloud used 6.63TB of data. Over that same year we measured ~830,000 unique visitors for a total of 11.16TB.
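To put those figures side by side, here is a small arithmetic sketch in Python. It only uses the numbers quoted above; the decimal unit conversion (1 TB = 10^12 bytes) and the rounding of "~830,000" to exactly 830,000 are assumptions, and the share is computed against total traffic rather than feed-reader traffic alone.

```python
# Figures quoted in the post (one year of traffic).
nextcloud_visitors = 7304
nextcloud_tb = 6.63
total_visitors = 830_000  # the post says "~830,000", so this is approximate
total_tb = 11.16

# Share of all traffic and of all visitors attributable to NextCloud clients.
traffic_share = nextcloud_tb / total_tb
visitor_share = nextcloud_visitors / total_visitors

# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
mb_per_visitor = nextcloud_tb * 10**12 / nextcloud_visitors / 10**6

print(f"traffic share: {traffic_share:.1%}")
print(f"visitor share: {visitor_share:.2%}")
print(f"per NextCloud visitor: {mb_per_visitor:.0f} MB/year")
```

Under these assumptions, fewer than 1% of visitors account for roughly 59% of all traffic, which works out to around 900 MB per NextCloud visitor per year.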

It seems NextCloud News polls our feed every couple of minutes and also ignores the HTTP Last-Modified and ETag headers. The polling frequency itself also seems to fluctuate wildly.

See the graph below:

In case it is helpful, here are a few hours' worth of anonymized web server logs.

If you need any more info let me know!

Labels: 1. to develop  ·  bug  ·  regression

All 6 comments

As far as I know it respects last-modified. Nextcloud-news isn't centralized though, anyone can run it. So while I'd love to help you I don't see any way to reduce data traffic.

> As far as I know it respects last-modified. Nextcloud-news isn't centralized though, anyone can run it

How does it do that though? As far as I understand it, doing so would require first sending a HEAD request to see whether the content has actually been modified compared to what you already have locally, and only then doing a GET request.

In the logs attached to the original post I only see GET requests.

I would expect to see something like this though:

 [09/Dec/2018:21:19:11 +0100] "HEAD /feeds/all.rss.xml HTTP/1.1" 200 0 "-" " QuiteRSS/0.18.4 Safari/538.1"

 [09/Dec/2018:21:19:13 +0100] "GET /feeds/all.rss.xml HTTP/1.1" 200 859131 "-" "QuiteRSS/0.18.4 Safari/538.1"

This RSS reader first checks whether the content is modified using a HEAD request during which no content is transmitted. It then follows up with a GET request and the content is transmitted.
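The HEAD-then-GET pattern described above could look roughly like the following Python sketch. It is only an illustration using the standard library, not QuiteRSS's or NextCloud News's actual code; the function names `is_newer` and `fetch_if_changed` are hypothetical, and comparing `Last-Modified` timestamps is an assumed way of deciding whether the content changed.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def is_newer(remote_last_modified, cached_last_modified):
    """Return True when the feed should be downloaded again."""
    if not remote_last_modified or not cached_last_modified:
        return True  # nothing to compare against, so download to be safe
    try:
        remote = parsedate_to_datetime(remote_last_modified)
        cached = parsedate_to_datetime(cached_last_modified)
        return remote > cached
    except (TypeError, ValueError):
        return True  # unparseable or incomparable dates: download to be safe


def fetch_if_changed(url, cached_last_modified=None):
    """HEAD first; only GET the body when Last-Modified moved forward."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=30) as response:
        remote_last_modified = response.headers.get("Last-Modified")
    if not is_newer(remote_last_modified, cached_last_modified):
        return None, cached_last_modified  # unchanged: no body transferred
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read(), response.headers.get("Last-Modified")
```

A caller would store the returned `Last-Modified` value and pass it back in on the next poll. Note that this costs two round trips whenever the feed did change.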

I understand NextCloud-News is used in a decentralized way and people are free to tweak it, but it is developed centrally, so setting good defaults would have a lot of impact.

That's not actually something in the code of Nextcloud News though; it's in a library that News uses. https://github.com/alexdebril/feed-io/blob/master/src/FeedIo/Adapter/Guzzle/Client.php has the exact implementation.

Ok thanks I will open an issue with them as well!

Thanks for reporting and cool project :+1:

> How does it do that though? As far as I understand it, doing so would require first sending a HEAD request to see if the content is actually modified compared to what you already have locally and only then doing a GET request.

We use a different mechanism, the If-Modified-Since header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since. With each request, the last-modified timestamp we have stored is sent to your server. The server returns 304 (if nothing is new) or 200 (if the document changed). A 304 response usually has a length of 0. Unfortunately, that is currently broken. It should be fixed by #594.
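As an illustration of the conditional request described above, here is a minimal Python sketch using only the standard library. It is not the feed-io or Guzzle code that NextCloud News actually uses; the function names `conditional_headers` and `fetch_feed` are hypothetical, and sending `If-None-Match` alongside `If-Modified-Since` is an assumption added here because the original report also mentions ETag.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build validator headers from values saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Single GET; returns (body, etag, last_modified), body is None on 304."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep the cached copy and the old validators.
            return None, etag, last_modified
        raise
```

A poller would keep the returned ETag and Last-Modified values and pass them to the next `fetch_feed` call. Compared with a separate HEAD request, this needs only one round trip per poll, and an unchanged feed costs just the response headers.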

Found this by accident when I was looking for the feed url. Not related to this issue.

image

The <link> element does not need a closing tag.

Thanks for looking into it and patching! I'll have a look at the RSS feed!
