News: Full text feature broken

Created on 14 Mar 2019 · 37 Comments · Source: nextcloud/news

[x] I have read the CONTRIBUTING.md and followed the provided tips

Explain the Problem

Full text feature broken. Only short articles are shown. Full article feature does not have any effect.

Steps to Reproduce

Explain what you did to encounter the issue

1. Disable full text article in feed if already activated
2. Enable full text article in feed
3. Reload feed
4. Same content is shown (no full text)

System Information

News app version: 13.1.1
Nextcloud version: 15.0.5
PHP version: 7.0
Database and version: MariaDB 5.6
Browser and version: Latest Chrome, Latest Firefox
Distribution and version:

No related errors in logs

1. to develop regression

Source

mistermtu

👍17

All 37 comments

It shows

Internal server error! Please check your data/nextcloud.log file for additional information!

when I try to use the full text feature

The log contains the following:

matdb on 14 Mar 2019

@matdb it'll just be ignored. Your error is from a feed returning an invalid date as far as I can tell.

SMillerDev on 14 Mar 2019

+1 Many of my feeds are no longer displaying full text.

{"reqId":"eS0R1LFno8L54wOB73vM","level":3,"time":"2019-03-15T09:14:33+00:00","remoteAddr":"*******","user":"*********","app":"index","method":"PATCH","url":"\/index.php\/apps\/news\/feeds\/70","message":{"Exception":"Exception","Message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character","Code":0,"Trace":[{"file":"\/var\/www\/html\/apps\/news\/lib\/Fetcher\/FeedFetcher.php","line":75,"function":"__construct","class":"DateTime","type":"->","args":["0"]},{"file":"\/var\/www\/html\/apps\/news\/lib\/Fetcher\/Fetcher.php","line":68,"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":["https:\/\/www.theregister.co.uk\/headlines.atom",false,"0",null,null]},{"file":"\/var\/www\/html\/apps\/news\/lib\/Service\/FeedService.php","line":228,"function":"fetch","class":"OCA\\News\\Fetcher\\Fetcher","type":"->","args":["https:\/\/www.theregister.co.uk\/headlines.atom",false,"0",null,null]},{"file":"\/var\/www\/html\/apps\/news\/lib\/Service\/FeedService.php","line":490,"function":"update","class":"OCA\\News\\Service\\FeedService","type":"->","args":[70,"********",true]},{"file":"\/var\/www\/html\/apps\/news\/lib\/Controller\/FeedController.php","line":322,"function":"patch","class":"OCA\\News\\Service\\FeedService","type":"->","args":[70,"*********",{"fullTextEnabled":true}]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":166,"function":"patch","class":"OCA\\News\\Controller\\FeedController","type":"->","args":[70,null,true,null,null,null,null]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":99,"function":"executeController","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"patch"]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/App.php","line":118,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News
\\Controller\\FeedController"},"patch"]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Routing\/RouteActionHandler.php","line":47,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OCA\\News\\Controller\\FeedController","patch",{"__class__":"OC\\AppFramework\\DependencyInjection\\DIContainer"},{"feedId":"70","_route":"news.feed.patch"}]},{"function":"__invoke","class":"OC\\AppFramework\\Routing\\RouteActionHandler","type":"->","args":[{"feedId":"70","_route":"news.feed.patch"}]},{"file":"\/var\/www\/html\/lib\/private\/Route\/Router.php","line":297,"function":"call_user_func","args":[{"__class__":"OC\\AppFramework\\Routing\\RouteActionHandler"},{"feedId":"70","_route":"news.feed.patch"}]},{"file":"\/var\/www\/html\/lib\/base.php","line":987,"function":"match","class":"OC\\Route\\Router","type":"->","args":["\/apps\/news\/feeds\/70"]},{"file":"\/var\/www\/html\/index.php","line":42,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"\/var\/www\/html\/apps\/news\/lib\/Fetcher\/FeedFetcher.php","Line":75,"CustomMessage":"--"},"userAgent":"Mozilla\/5.0 (X11; Linux x86_64; rv:60.0) Gecko\/20100101 Firefox\/60.0","version":"15.0.5.3"}

megamaced on 15 Mar 2019
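
The trace above shows `DateTime::__construct()` receiving the string "0", which is also the third argument passed to `fetch`. One defensive approach is to treat such placeholder values as "no previous modification date" instead of trying to parse them. The sketch below illustrates the idea in Python rather than PHP; `parse_last_modified` is a hypothetical helper and not code from the News app.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_last_modified(value: Optional[str]) -> Optional[datetime]:
    """Return a datetime for a stored last-modified value, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    # Placeholder values such as "" or "0" carry no date information.
    if value in ("", "0"):
        return None
    try:
        # HTTP-style dates, e.g. "Fri, 15 Mar 2019 09:14:33 GMT"
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        pass
    try:
        # ISO 8601 dates, e.g. "2019-03-15T09:14:33+00:00"
        return datetime.fromisoformat(value)
    except ValueError:
        # Anything else is treated as "unknown" rather than raising.
        return None


print(parse_last_modified("0"))
print(parse_last_modified("Fri, 15 Mar 2019 09:14:33 GMT"))
```

A caller that gets `None` back could simply fetch the feed without a conditional "modified since" check.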

I can confirm that: Same issue with the following two error messages:

{"reqId":"XIycF38AAAEAAEs-cuwAAAAA","level":3,"time":"2019-03-16T06:47:52+00:00","remoteAddr":"","user":"","app":"PHP","method":"PATCH","url":"/index.php/apps/news/feeds/6","message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character at /var/www/nextcloud/apps/news/lib/Fetcher/FeedFetcher.php#75","userAgent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Firefox/60.0","version":"15.0.5.3","id":"5c8c9ce8da42b"}

{"reqId":"XIycF38AAAEAAEs-cuwAAAAA","level":3,"time":"2019-03-16T06:47:52+00:00","remoteAddr":"","user":"","app":"index","method":"PATCH","url":"/index.php/apps/news/feeds/6","message":{"Exception":"Exception","Message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/news/lib/Fetcher/FeedFetcher.php","line":75,"function":"__construct","class":"DateTime","type":"->","args":["0"]},{"file":"/var/www/nextcloud/apps/news/lib/Fetcher/Fetcher.php","line":68,"function":"fetch","class":"OCA\News\Fetcher\FeedFetcher","type":"->","args":["http://feeds2.feedburner.com/stadt-bremerhaven/dqXM",false,"0",null,null]},{"file":"/var/www/nextcloud/apps/news/lib/Service/FeedService.php","line":228,"function":"fetch","class":"OCA\News\Fetcher\Fetcher","type":"->","args":["http://feeds2.feedburner.com/stadt-bremerhaven/dqXM",false,"0",null,null]},{"file":"/var/www/nextcloud/apps/news/lib/Service/FeedService.php","line":490,"function":"update","class":"OCA\News\Service\FeedService","type":"->","args":[6,"markus",true]},{"file":"/var/www/nextcloud/apps/news/lib/Controller/FeedController.php","line":322,"function":"patch","class":"OCA\News\Service\FeedService","type":"->","args":[6,"markus",{"fullTextEnabled":true}]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","line":166,"function":"patch","class":"OCA\News\Controller\FeedController","type":"->","args":[6,null,true,null,null,null,null]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","line":99,"function":"executeController","class":"OC\AppFramework\Http\Dispatcher","type":"->","args":[{"__class__":"OCA\News\Controller\FeedController"},"patch"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/App.php","line":118,"function":"dispatch","class":"OC\AppFramework\Http\Dispatcher","type":"->","args":[{"__class__":"OCA\News\Controller\FeedController"},"patch"]},{"file":"/var/www/nextcloud
/lib/private/AppFramework/Routing/RouteActionHandler.php","line":47,"function":"main","class":"OC\AppFramework\App","type":"::","args":["OCA\News\Controller\FeedController","patch",{"__class__":"OC\AppFramework\DependencyInjection\DIContainer"},{"feedId":"6","_route":"news.feed.patch"}]},{"function":"__invoke","class":"OC\AppFramework\Routing\RouteActionHandler","type":"->","args":[{"feedId":"6","_route":"news.feed.patch"}]},{"file":"/var/www/nextcloud/lib/private/Route/Router.php","line":297,"function":"call_user_func","args":[{"__class__":"OC\AppFramework\Routing\RouteActionHandler"},{"feedId":"6","_route":"news.feed.patch"}]},{"file":"/var/www/nextcloud/lib/base.php","line":987,"function":"match","class":"OC\Route\Router","type":"->","args":["/apps/news/feeds/6"]},{"file":"/var/www/nextcloud/index.php","line":42,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"/var/www/nextcloud/apps/news/lib/Fetcher/FeedFetcher.php","Line":75,"CustomMessage":"--"},"userAgent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Firefox/60.0","version":"15.0.5.3","id":"5c8c9ce8daac4"}

wwwindisch on 16 Mar 2019
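
The log entries quoted above are single-line JSON objects whose `message` field is either a plain string or a nested object with `Message` and `Trace`. A minimal sketch for reducing such entries to the time, URL and message, which is usually enough to identify the offending feed, might look like this. The function name and the sample entry are illustrative assumptions; the sample is a shortened entry modelled on the logs in this thread.

```python
import json
from typing import Iterable, Iterator, Tuple


def summarize_log(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (time, url, message) for each parseable log line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or wrapped lines
        if not isinstance(entry, dict):
            continue
        message = entry.get("message", "")
        # Exceptions are logged as a nested object, plain errors as a string.
        if isinstance(message, dict):
            message = message.get("Message", "")
        yield entry.get("time", ""), entry.get("url", ""), message


# Shortened sample entry modelled on the logs above (not a real log line).
sample = (
    '{"time":"2019-03-16T06:47:52+00:00",'
    '"url":"/index.php/apps/news/feeds/6",'
    '"message":{"Exception":"Exception",'
    '"Message":"DateTime::__construct(): Failed to parse time string (0) '
    'at position 0 (0): Unexpected character"}}'
)

for time, url, message in summarize_log([sample]):
    print(time, url, message)
```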

That's more than enough confirmation, thanks everyone. The most useful thing to do if you have the same issue is making a pull request. Otherwise you can point out the offending line or feed. If you just want people to know you have the same issue, give it a thumbs up but please don't post the same error log 25 times saying you can confirm the issue. It only makes it increasingly difficult to find the right information.

SMillerDev on 16 Mar 2019

👍2

Not really the same error - or maybe same but different location: on manual update of the feed with:

$ sudo -u nginx php occ news:updater:update-feed 124 user
Could not update feed with id 124 and user user: DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character

The issue cannot be fixed by manually disabling the fulltext column in the database entry; it persists. But before I clicked on the fulltext feature the feed worked well, and with another user the same feed also works until that user clicks the fulltext feature.

Feeds that do this are:

https://www.heise.de/rss/heise-atom.xml
https://www.computerbase.de/rss/news.xml

I'm not into PHP so no PR from me :(

enaut on 18 Mar 2019

Does the latest 13.1.4 release fix this?

Nextcloud news is a killer app for me, I use it all the time and I had to roll back to 13.0.3. Is 13.1.4 good to update?

Keep up the awesome work!

megamaced on 23 Mar 2019

I can confirm that, at least for me, the feeds that did not work with previous versions do work now, so it might be worth a shot to test.
However, I do not see the full text of the article. It's still just the summary. I do not know if I should see the full article. I never really understood the full text feature.

enaut on 23 Mar 2019

Full text isn't implemented yet. Scraping feeds is difficult so we're trying to get just reading them right first. Afterwards I'll evaluate whether I want to remove fulltext or try writing a scraper.

SMillerDev on 23 Mar 2019

What do you mean by scraping? Do you mean scraping the whole RSS content, or the extra feature other readers have that scrapes the whole article from the main website? I don't miss the extra feature that you would need for feeds like heise.de or spiegel.de.
If I compare what the News app shows with the original feed, it only shows the content of the description tag. The content tag of feeds that put their whole article in the RSS is not 'scraped'. For example deskmodder.de:
https://www.deskmodder.de/blog/feed/

qezzo on 23 Mar 2019

The fulltext feature means that it used a scraper to fetch the page the item referred to.

SMillerDev on 23 Mar 2019

So not showing the full article of the feed is another bug?

qezzo on 23 Mar 2019

There isn't actually anything in the rss standard about having the full article in the feed.

SMillerDev on 23 Mar 2019

But an Atom feed can. According to https://validator.w3.org/feed/docs/atom.html#contentElement the content element

"either contains, or links to, the complete content of the entry."

qezzo on 23 Mar 2019
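
As a rough illustration of that Atom rule, a reader could prefer the `content` element of an entry and fall back to `summary` only when `content` is missing. The sketch below uses Python's standard `xml.etree.ElementTree`; `entry_body` and the sample entry are hypothetical, and this is not how the News app or its parser is implemented.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entry_body(entry: ET.Element) -> str:
    """Return the full content of an Atom entry, a link to it, or its summary."""
    content = entry.find(ATOM + "content")
    if content is not None:
        # Out-of-line content: the element only links to the full text.
        src = content.get("src")
        if src:
            return src
        # Inline content; itertext() also collects text nested in child elements.
        text = "".join(content.itertext()).strip()
        if text:
            return text
    summary = entry.find(ATOM + "summary")
    if summary is not None:
        return "".join(summary.itertext()).strip()
    return ""


# Made-up entry for illustration only.
sample = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <summary>Short teaser.</summary>
  <content type="html">&lt;p&gt;The complete article.&lt;/p&gt;</content>
</entry>"""

print(entry_body(ET.fromstring(sample)))
```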

"There isn't actually anything in the rss standard about having the full article in the feed."

But full text works great in 13.0.3 and any other RSS reader since the days of Google Reader.

megamaced on 23 Mar 2019

👍3

I generally much prefer being able to read the whole article without opening a browser tab. This is especially nice when reading from the Android app.

Maybe this can be made a configuration option. Thoughts?

nachoparker on 24 Mar 2019

👍4

I get why people like the feature. But it's also technically very complex. And in my view there's a reason the provider doesn't put the whole content in the feed: they want you to visit their site.

SMillerDev on 24 Mar 2019

Since version 13.1 the news app does not show the full article even if the whole content is part of the feed. It only shows the part in the description element.

qezzo on 24 Mar 2019

Previous versions didn't read that element either. They just scraped the linked website.

SMillerDev on 24 Mar 2019

Is there an easy way to downgrade? I did make a backup for the Nextcloud 15 upgrade, but I did not make one for a minor News app update (I assumed it was minor, but it looks like it made major changes).

The News database is not important; I can create a backup of the feed URLs.

nille02 on 24 Mar 2019

Can you share a link to the commit that changes this behavior? I understand this happened as part of getting rid of picofeed.

nachoparker on 24 Mar 2019

"Previous versions didn't read that element either. They just scraped the linked website."

Is it possible to have News read that element rather than scraping, so we get the full text if it is provided in the feed?

megamaced on 25 Mar 2019

@nachoparker https://github.com/nextcloud/news/commit/a3246a927de542e1b3ab403359bfd3c08705b6a7 and the PR https://github.com/nextcloud/news/pull/282

Grotax on 25 Mar 2019

👍1

@megamaced if the documentation is right, the content field is evaluated when you call getDescription on an Atom feed. The feed you provided seems to be a mix of RSS and Atom.

Grotax on 25 Mar 2019

It seems that WordPress likes to do that. The feed contains a description element, which is defined by RSS and contains a short description of the item, and a content element, defined by Atom, that has the full text.

Grotax on 25 Mar 2019

@Grotax if that is the case, it sounds like it should be possible to save the content field.

Is that going to be implemented?

nachoparker on 26 Mar 2019

Depends on feed-io and probably only with the switch to 4.x

Grotax on 26 Mar 2019

"Depends on feed-io and probably only with the switch to 4.x"

Sorry, I am a bit lost about the context here. Can you explain the situation in two lines? Are we switching to a new version of feed-io? Does the current version support full text? Does 4.x?

Thanks!

nachoparker on 26 Mar 2019

feed-io is the feed parser, and currently it has no support for handling both content and description tags. My guess is that the parser chooses the description tag and ignores content. We are currently using the legacy version 3.0 because it's the last version that supports PHP 7.0.

New features for feed-io will probably only land in 4.x because of limited resources.

General fulltext support is another thing, and it would require us to scrape the website content.

Grotax on 26 Mar 2019

Thanks for explaining. Maybe then we should ask in their issue tracker to know if this is in the roadmap or at least make them aware of it?

nachoparker on 26 Mar 2019

👍1

Don't worry, it's on my internal todo list.

Grotax on 26 Mar 2019

👍2 🎉1

After some more investigation: the content tag is actually an extension. Extensions are called modules in the RSS 1.0 specification. http://web.resource.org/rss/1.0/modules/content/

Grotax on 26 Mar 2019
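
Feeds that use this content module typically carry the full text in a namespaced `content:encoded` element next to the plain `description`. A minimal sketch of "prefer the content element, fall back to the description" is shown below in Python with the standard `xml.etree.ElementTree`. The helper name `item_body` and the sample item are assumptions for illustration; this is not feed-io's or the News app's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS content module.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def item_body(item: ET.Element) -> str:
    """Return content:encoded if present and non-empty, else the description."""
    encoded = item.find("{%s}encoded" % CONTENT_NS)
    if encoded is not None and encoded.text and encoded.text.strip():
        return encoded.text.strip()
    description = item.find("description")
    if description is not None and description.text:
        return description.text.strip()
    return ""


# Made-up item for illustration only.
sample = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Example</title>
      <description>Short teaser.</description>
      <content:encoded><![CDATA[<p>The complete article.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

for item in ET.fromstring(sample).iter("item"):
    print(item_body(item))
```

Reading the field this way avoids fetching the linked website at all, which is the difference from the scraping approach discussed in this thread.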

"After some more investigation: the content tag is actually an extension. Extensions are called modules in the RSS 1.0 specification. http://web.resource.org/rss/1.0/modules/content/"

That's not actually part of the spec. "This section is a draft and has not yet been approved by the WG."
I'd be fine with attempting to read the field (I'd prefer it over scraping). But it's hardly a standard.

SMillerDev on 26 Mar 2019

I didn't actually want to close this but I guess part of it is fixed now.

The actual "scrape websites for full text"-feature is not implemented yet. You may create a new issue if interested or even better start a PR ;)

Grotax on 28 Mar 2019

Thank you guys for all the fixes! Just two quick questions:

As the full text feature is not implemented, does the option have any effect? If I want to see the content of the content tag, should I enable or disable the "Full text" option?
Also slightly OT: The OP of this issue writes "3. Reload feed". How do I do that? I can't find an option like that in the context menu of the feeds. If I do a reload, which posts are fetched/scraped again? Only new ones, or all that are still in the freshly downloaded RSS document?

maralorn on 28 Mar 2019

No, feeds that contain a description and a content tag will always show the content tag; it's basically replacing the description.
I think if you toggle the full-text feature your unread items will be renewed. Normally existing feed items don't get re-parsed/fetched on an app update.

Grotax on 28 Mar 2019

👍1

Thank you for fixing it. Even without Full Text, the content is now the same as with the older feed reader again. Images are also back now.

nille02 on 29 Mar 2019

👍3
