News: Full text feature broken

Created on 14 Mar 2019  ·  37 Comments  ·  Source: nextcloud/news

Explain the Problem

The full-text feature is broken. Only short article summaries are shown, and enabling the full-article option has no effect.

Steps to Reproduce

Explain what you did to encounter the issue

  1. Disable full text article in feed if already activated
  2. Enable full text article in feed
  3. Reload feed
  4. Same content is shown (no full text)

System Information

  • News app version: 13.1.1
  • Nextcloud version: 15.0.5
  • PHP version: 7.0
  • Database and version: MariaDB 5.6
  • Browser and version: Latest Chrome, Latest Firefox
  • Distribution and version:

No related errors in logs

Labels: 1. to develop, regression

All 37 comments

It shows

Internal server error! Please check your data/nextcloud.log file for additional information!

when I try to use the full text feature

The log contains the following:

{"reqId":"SLmus4uTDJYntUURtZhp","level":3,"time":"2019-03-14T08:16:07+00:00","remoteAddr":"**********","user":"********","app":"PHP","method":"PATCH","url":"\/index.php\/feeds\/49","message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character at \/var\/www\/html\/custom_apps\/news\/lib\/Fetcher\/FeedFetcher.php#75","userAgent":"Mozilla\/5.0 (X11; Linux x86_64; rv:65.0) Gecko\/20100101 Firefox\/65.0","version":"15.0.5.3"}

{"reqId":"e3lG8VlfxonktLpPwU6D","level":3,"time":"2019-03-14T08:16:15+00:00","remoteAddr":"*******","user":"******","app":"index","method":"PATCH","url":"\/index.php\/feeds\/99","message":{"Exception":"Exception","Message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character","Code":0,"Trace":[{"file":"\/var\/www\/html\/custom_apps\/news\/lib\/Fetcher\/FeedFetcher.php","line":75,"function":"__construct","class":"DateTime","type":"->","args":["0"]},{"file":"\/var\/www\/html\/custom_apps\/news\/lib\/Fetcher\/Fetcher.php","line":68,"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":["https:\/\/sciencebasedmedicine.org\/feed\/",false,"0",null,null]},{"file":"\/var\/www\/html\/custom_apps\/news\/lib\/Service\/FeedService.php","line":228,"function":"fetch","class":"OCA\\News\\Fetcher\\Fetcher","type":"->","args":["https:\/\/sciencebasedmedicine.org\/feed\/",false,"0",null,null]},{"file":"\/var\/www\/html\/custom_apps\/news\/lib\/Service\/FeedService.php","line":490,"function":"update","class":"OCA\\News\\Service\\FeedService","type":"->","args":[99,"*******",true]},{"file":"\/var\/www\/html\/custom_apps\/news\/lib\/Controller\/FeedController.php","line":322,"function":"patch","class":"OCA\\News\\Service\\FeedService","type":"->","args":[99,"*******",{"fullTextEnabled":true}]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":166,"function":"patch","class":"OCA\\News\\Controller\\FeedController","type":"->","args":[99,null,true,null,null,null,null]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":99,"function":"executeController","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"patch"]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/App.php","line":118,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"patch"]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Routing\/RouteActionHandler.php","line":47,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OCA\\News\\Controller\\FeedController","patch",{"__class__":"OC\\AppFramework\\DependencyInjection\\DIContainer"},{"feedId":"99","_route":"news.feed.patch"}]},{"function":"__invoke","class":"OC\\AppFramework\\Routing\\RouteActionHandler","type":"->","args":[{"feedId":"99","_route":"news.feed.patch"}]},{"file":"\/var\/www\/html\/lib\/private\/Route\/Router.php","line":297,"function":"call_user_func","args":[{"__class__":"OC\\AppFramework\\Routing\\RouteActionHandler"},{"feedId":"99","_route":"news.feed.patch"}]},{"file":"\/var\/www\/html\/lib\/base.php","line":987,"function":"match","class":"OC\\Route\\Router","type":"->","args":["\/apps\/news\/feeds\/99"]},{"file":"\/var\/www\/html\/index.php","line":42,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"\/var\/www\/html\/custom_apps\/news\/lib\/Fetcher\/FeedFetcher.php","Line":75,"CustomMessage":"--"},"userAgent":"Mozilla\/5.0 (X11; Linux x86_64; rv:65.0) Gecko\/20100101 Firefox\/65.0","version":"15.0.5.3"}

{"reqId":"e3lG8VlfxonktLpPwU6D","level":3,"time":"2019-03-14T08:16:15+00:00","remoteAddr":"*******","user":"*******","app":"PHP","method":"PATCH","url":"\/index.php\/feeds\/99","message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character at \/var\/www\/html\/custom_apps\/news\/lib\/Fetcher\/FeedFetcher.php#75","userAgent":"Mozilla\/5.0 (X11; Linux x86_64; rv:65.0) Gecko\/20100101 Firefox\/65.0","version":"15.0.5.3"}

@matdb it'll just be ignored. Your error is from a feed returning an invalid date as far as I can tell.
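
For illustration, a minimal sketch of the kind of guard that avoids this class of error. In the trace above, the string "0" reaches the DateTime constructor. The sketch is in Python rather than the app's PHP, and `parse_last_modified` is a hypothetical helper, not code from the News app: it treats a missing or unparseable value as "no date" instead of raising.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_last_modified(value: Optional[str]) -> Optional[datetime]:
    """Return an aware datetime, or None if the value is missing or unparseable.

    Hypothetical helper for illustration only; assumes RFC 2822 style dates
    such as those used in HTTP Last-Modified headers.
    """
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Values such as "0" are not dates; report "unknown" instead of failing.
        return None
    if parsed.tzinfo is None:
        # No usable zone information in the input: assume UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

With a guard like this, a caller could fall back to an unconditional fetch whenever the result is `None`.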

+1 Many of my feeds are no longer displaying full text.

{"reqId":"eS0R1LFno8L54wOB73vM","level":3,"time":"2019-03-15T09:14:33+00:00","remoteAddr":"*******","user":"*********","app":"index","method":"PATCH","url":"\/index.php\/apps\/news\/feeds\/70","message":{"Exception":"Exception","Message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character","Code":0,"Trace":[{"file":"\/var\/www\/html\/apps\/news\/lib\/Fetcher\/FeedFetcher.php","line":75,"function":"__construct","class":"DateTime","type":"->","args":["0"]},{"file":"\/var\/www\/html\/apps\/news\/lib\/Fetcher\/Fetcher.php","line":68,"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":["https:\/\/www.theregister.co.uk\/headlines.atom",false,"0",null,null]},{"file":"\/var\/www\/html\/apps\/news\/lib\/Service\/FeedService.php","line":228,"function":"fetch","class":"OCA\\News\\Fetcher\\Fetcher","type":"->","args":["https:\/\/www.theregister.co.uk\/headlines.atom",false,"0",null,null]},{"file":"\/var\/www\/html\/apps\/news\/lib\/Service\/FeedService.php","line":490,"function":"update","class":"OCA\\News\\Service\\FeedService","type":"->","args":[70,"********",true]},{"file":"\/var\/www\/html\/apps\/news\/lib\/Controller\/FeedController.php","line":322,"function":"patch","class":"OCA\\News\\Service\\FeedService","type":"->","args":[70,"*********",{"fullTextEnabled":true}]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":166,"function":"patch","class":"OCA\\News\\Controller\\FeedController","type":"->","args":[70,null,true,null,null,null,null]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":99,"function":"executeController","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"patch"]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/App.php","line":118,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"patch"]},{"file":"\/var\/www\/html\/lib\/private\/AppFramework\/Routing\/RouteActionHandler.php","line":47,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OCA\\News\\Controller\\FeedController","patch",{"__class__":"OC\\AppFramework\\DependencyInjection\\DIContainer"},{"feedId":"70","_route":"news.feed.patch"}]},{"function":"__invoke","class":"OC\\AppFramework\\Routing\\RouteActionHandler","type":"->","args":[{"feedId":"70","_route":"news.feed.patch"}]},{"file":"\/var\/www\/html\/lib\/private\/Route\/Router.php","line":297,"function":"call_user_func","args":[{"__class__":"OC\\AppFramework\\Routing\\RouteActionHandler"},{"feedId":"70","_route":"news.feed.patch"}]},{"file":"\/var\/www\/html\/lib\/base.php","line":987,"function":"match","class":"OC\\Route\\Router","type":"->","args":["\/apps\/news\/feeds\/70"]},{"file":"\/var\/www\/html\/index.php","line":42,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"\/var\/www\/html\/apps\/news\/lib\/Fetcher\/FeedFetcher.php","Line":75,"CustomMessage":"--"},"userAgent":"Mozilla\/5.0 (X11; Linux x86_64; rv:60.0) Gecko\/20100101 Firefox\/60.0","version":"15.0.5.3"}

I can confirm that: Same issue with the following two error messages:

{"reqId":"XIycF38AAAEAAEs-cuwAAAAA","level":3,"time":"2019-03-16T06:47:52+00:00","remoteAddr":"","user":"","app":"PHP","method":"PATCH","url":"/index.php/apps/news/feeds/6","message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character at /var/www/nextcloud/apps/news/lib/Fetcher/FeedFetcher.php#75","userAgent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Firefox/60.0","version":"15.0.5.3","id":"5c8c9ce8da42b"}

{"reqId":"XIycF38AAAEAAEs-cuwAAAAA","level":3,"time":"2019-03-16T06:47:52+00:00","remoteAddr":"","user":"","app":"index","method":"PATCH","url":"/index.php/apps/news/feeds/6","message":{"Exception":"Exception","Message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/news/lib/Fetcher/FeedFetcher.php","line":75,"function":"__construct","class":"DateTime","type":"->","args":["0"]},{"file":"/var/www/nextcloud/apps/news/lib/Fetcher/Fetcher.php","line":68,"function":"fetch","class":"OCA\News\Fetcher\FeedFetcher","type":"->","args":["http://feeds2.feedburner.com/stadt-bremerhaven/dqXM",false,"0",null,null]},{"file":"/var/www/nextcloud/apps/news/lib/Service/FeedService.php","line":228,"function":"fetch","class":"OCA\News\Fetcher\Fetcher","type":"->","args":["http://feeds2.feedburner.com/stadt-bremerhaven/dqXM",false,"0",null,null]},{"file":"/var/www/nextcloud/apps/news/lib/Service/FeedService.php","line":490,"function":"update","class":"OCA\News\Service\FeedService","type":"->","args":[6,"markus",true]},{"file":"/var/www/nextcloud/apps/news/lib/Controller/FeedController.php","line":322,"function":"patch","class":"OCA\News\Service\FeedService","type":"->","args":[6,"markus",{"fullTextEnabled":true}]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","line":166,"function":"patch","class":"OCA\News\Controller\FeedController","type":"->","args":[6,null,true,null,null,null,null]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","line":99,"function":"executeController","class":"OC\AppFramework\Http\Dispatcher","type":"->","args":[{"__class__":"OCA\News\Controller\FeedController"},"patch"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/App.php","line":118,"function":"dispatch","class":"OC\AppFramework\Http\Dispatcher","type":"->","args":[{"__class__":"OCA\News\Controller\FeedController"},"patch"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Routing/RouteActionHandler.php","line":47,"function":"main","class":"OC\AppFramework\App","type":"::","args":["OCA\News\Controller\FeedController","patch",{"__class__":"OC\AppFramework\DependencyInjection\DIContainer"},{"feedId":"6","_route":"news.feed.patch"}]},{"function":"__invoke","class":"OC\AppFramework\Routing\RouteActionHandler","type":"->","args":[{"feedId":"6","_route":"news.feed.patch"}]},{"file":"/var/www/nextcloud/lib/private/Route/Router.php","line":297,"function":"call_user_func","args":[{"__class__":"OC\AppFramework\Routing\RouteActionHandler"},{"feedId":"6","_route":"news.feed.patch"}]},{"file":"/var/www/nextcloud/lib/base.php","line":987,"function":"match","class":"OC\Route\Router","type":"->","args":["/apps/news/feeds/6"]},{"file":"/var/www/nextcloud/index.php","line":42,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"/var/www/nextcloud/apps/news/lib/Fetcher/FeedFetcher.php","Line":75,"CustomMessage":"--"},"userAgent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Firefox/60.0","version":"15.0.5.3","id":"5c8c9ce8daac4"}

That's more than enough confirmation, thanks everyone. The most useful thing to do if you have the same issue is making a pull request. Otherwise you can point out the offending line or feed. If you just want people to know you have the same issue, give it a thumbs up but please don't post the same error log 25 times saying you can confirm the issue. It only makes it increasingly difficult to find the right information.
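
As a rough aid for pointing out the offending line or feed without pasting whole log entries, here is a small sketch that pulls the feed URL, file, and line out of exception entries. It assumes one JSON object per line with the `message.Trace[].args` layout seen in the entries posted above; `offending_feeds` and `SAMPLE` are names made up for this example, and `SAMPLE` is a trimmed copy of one posted entry.

```python
import json
from typing import Iterable, Set, Tuple


def offending_feeds(log_lines: Iterable[str]) -> Set[Tuple[str, str, int]]:
    """Collect (feed URL, file, line) from exception entries in the log."""
    found = set()
    for raw in log_lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        message = entry.get("message")
        if not isinstance(message, dict):
            continue  # plain-string messages carry no trace
        for frame in message.get("Trace", []):
            args = frame.get("args") or []
            # Assumption: the first argument of a "fetch" frame is the feed URL.
            if frame.get("function") == "fetch" and args and isinstance(args[0], str):
                found.add((args[0], message.get("File", ""), message.get("Line", 0)))
    return found


# Trimmed copy of one entry from this thread, kept as a raw string so the
# JSON escapes stay intact.
SAMPLE = r'{"app":"index","message":{"Exception":"Exception","Trace":[{"function":"__construct","class":"DateTime","args":["0"]},{"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","args":["https:\/\/www.theregister.co.uk\/headlines.atom",false,"0",null,null]}],"File":"\/var\/www\/html\/apps\/news\/lib\/Fetcher\/FeedFetcher.php","Line":75}}'

for url, path, line in sorted(offending_feeds([SAMPLE])):
    print(f"{url} -> {path}#{line}")
```

On a real installation the lines would come from the log file itself, for example `offending_feeds(open("data/nextcloud.log"))`, with the path adjusted to the local setup.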

Not really the same error - or maybe same but different location: on manual update of the feed with:

$ sudo -u nginx php occ news:updater:update-feed 124 user
Could not update feed with id 124 and user user: DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character

The issue cannot be fixed by manually disabling the full-text column in the database entry; it persists. Before I clicked the full-text option the feed worked well, and for another user the same feed keeps working until that user enables full text.

Feeds that do this are:

https://www.heise.de/rss/heise-atom.xml
https://www.computerbase.de/rss/news.xml

I'm not into PHP, so no PR from me :(

Does the latest 13.1.4 release fix this?

Nextcloud news is a killer app for me, I use it all the time and I had to roll back to 13.0.3. Is 13.1.4 good to update?

Keep up the awesome work!

I can confirm that at least for me the feeds that did not work with previous versions do work now... so it might be worth a shot to test...
However, I do not see the full text of the article; it's still just the summary. I do not know whether I should see the full article. I never really understood the full-text feature.

Full text isn't implemented yet. Scraping feeds is difficult, so we're trying to get just reading them right first. Afterwards I'll evaluate whether I want to remove full text or try writing a scraper.

What do you mean by scraping? Do you mean reading the whole RSS content, or the extra feature other readers have that scrapes the whole article from the main website? I don't miss that extra feature, which you would need for feeds like heise.de or spiegel.de.
If I compare what the News app shows with the original feed, it only shows the content of the description tag. The content tag of feeds that put their whole article in the RSS is not 'scraped'. For example, deskmodder.de:
https://www.deskmodder.de/blog/feed/

The fulltext feature means that it used a scraper to fetch the page the item referred to.
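
To make "scraping" concrete, here is a deliberately naive sketch of the idea: take the HTML of the page an item links to and keep only the text inside `<article>` elements. This is not the scraper earlier versions used. `ArticleText` and `extract_article` are hypothetical names, and relying on an `<article>` tag is an assumption that many real sites will not satisfy, which is part of why the feature is hard to get right.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collect the text found inside <article> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0    # number of <article> elements currently open
        self.chunks = []  # text fragments found inside them

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_article(html: str) -> str:
    """Return the text inside <article>, or an empty string if there is none."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = "<html><body><nav>Menu</nav><article><h1>Title</h1><p>First.</p></article></body></html>"
print(extract_article(page))
```

In a real reader the HTML would first have to be downloaded from the item's link, and the extraction rules would need to cope with each site's markup.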

So not showing the full article from the feed is another bug?

There isn't actually anything in the RSS standard about having the full article in the feed.

But an Atom feed can. According to https://validator.w3.org/feed/docs/atom.html#contentElement, the content element "either contains, or links to, the complete content of the entry."

There isn't actually anything in the RSS standard about having the full article in the feed.

But full text works great in 13.0.3 and any other RSS reader since the days of Google Reader.

I generally much prefer being able to read the whole article without opening a browser tab. This is especially nice when reading from the Android app.

Maybe this can be made a configuration option. Thoughts?

I get why people like the feature, but it's also technically very complex. And in my view there's a reason the provider doesn't put the whole content in the feed: they want you to visit their site.

Since version 13.1 the news app does not show the full article even if the whole content is part of the feed. It only shows the part in the description element.

Previous versions didn't read that element either. They just scraped the linked website.

Is there an easy way to downgrade? I made a backup for the Nextcloud 15 upgrade, but not for what I assumed was a minor News app update (it looks like it made major changes).

The News database is not important; I can back up the feed URLs.

Can you share a link to the commit that changes this behavior? I understand this happened as part of getting rid of picofeed

Previous versions didn't read that element either. They just scraped the linked website.

Is it possible to have News read that element rather than scraping, so we get the full text if it is provided in the feed?

@megamaced If the documentation is right, the content field is evaluated when you call getDescription on an Atom feed. The feed you provided seems to be a mix of RSS and Atom.

It seems that WordPress likes to do that. The feed contains a description element, which is defined by RSS and holds a short description of the item, and a content element, defined by Atom, that has the full text.

@Grotax if that is the case, it sounds like it should be possible to save the content field.

Is that going to be implemented?

Depends on feed-io and probably only with the switch to 4.x

Depends on feed-io and probably only with the switch to 4.x

Sorry, I am a bit lost about the context here. Can you explain the situation in two lines? Are we switching to a new version of feed-io? Does the current version support full text? Does 4.x?

Thanks!

feed-io is the feed parser, and currently there is no support for content and description tags. My guess is that the parser chooses the description tag and ignores content. We are currently using the legacy version 3.0 because it's the last version that supports PHP 7.0.

New features for feed-io will probably only land in 4.x because of limited resources.

General full-text support is another matter; it would require us to scrape the website content.

Thanks for explaining. Maybe then we should ask in their issue tracker to know if this is in the roadmap or at least make them aware of it?

Don't worry, it's on my internal to-do list.

After some more investigation: the content tag is actually an extension; extensions are called modules in the RSS 1.0 specification. http://web.resource.org/rss/1.0/modules/content/

After some more investigation: the content tag is actually an extension; extensions are called modules in the RSS 1.0 specification. http://web.resource.org/rss/1.0/modules/content/

That's not actually part of the spec. "This section is a draft and has not yet been approved by the WG."
I'd be fine with attempting to read the field (I'd prefer it over scraping). But it's hardly a standard.
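
A minimal sketch of what "attempting to read the field" could look like, assuming the full text sits in `content:encoded` from the RSS content module linked above. It is written in Python for illustration and is not how feed-io or the News app does it; `item_body` and `SAMPLE_FEED` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS content module (content:encoded).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def item_body(item: ET.Element) -> str:
    """Prefer content:encoded when present, otherwise fall back to description."""
    encoded = item.findtext(f"{{{CONTENT_NS}}}encoded")
    if encoded and encoded.strip():
        return encoded.strip()
    return (item.findtext("description") or "").strip()


SAMPLE_FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Full text in feed</title>
      <description>Short teaser.</description>
      <content:encoded><![CDATA[<p>Full article text.</p>]]></content:encoded>
    </item>
    <item>
      <title>Teaser only</title>
      <description>Only a summary.</description>
    </item>
  </channel>
</rss>"""

bodies = [item_body(item) for item in ET.fromstring(SAMPLE_FEED).iter("item")]
print(bodies)
```

Items without the extension fall back to the description, so feeds that only ship a summary keep working unchanged.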

I didn't actually want to close this but I guess part of it is fixed now.

The actual "scrape websites for full text"-feature is not implemented yet. You may create a new issue if interested or even better start a PR ;)

Thank you guys for all the fixes! Just two quick questions:

  • As the full text feature is not implemented, does the option have any effect? If I want to see the content of the content tag, should I enable or disable the "Full text" option?
  • Also slightly OT: The OP of this issue writes "3. Reload feed". How do I do that? I can't find an option like that in the context menu of the feeds. If I do a reload, which posts are fetched/scraped anew? Only new ones, or all that are still in the freshly downloaded RSS document?

  • No, feeds that contain both a description and a content tag will always show the content tag; it basically replaces the description.
  • I think if you toggle the full-text feature, your unread items will be renewed. Normally, existing feed items don't get re-parsed/fetched on an app update.

Thank you for fixing it. Even without full text, the content is now shown again as with the older feed reader. Images are also back now.
