FreshRSS: full article content

Created on 5 Feb 2015 · 10 comments · Source: FreshRSS/FreshRSS

Hi,
I find it a great application, but does it support full article content extraction?

Thanks

Documentation help wanted

Fshamri

👀1 👍1

All 10 comments

Hi,

Yes, it does! The process is described in https://github.com/FreshRSS/FreshRSS/issues/199#issuecomment-26408922 but it is in French; it is about time someone translated it!

Until someone does, you can play with the setting "Articles CSS path on original website" at the bottom of a feed's configuration. With some knowledge of CSS you might be able to get it working :)

Alwaysin on 5 Feb 2015

There is another option, which is to use the rss-bridge project. This way, you can separate your concerns.
That's what I do for some feeds.
I think it is more flexible.

aledeg on 9 Feb 2015

👍2

Or also http://code.fivefilters.org/full-text-rss

Alwaysin on 9 Feb 2015

Running that comment through _Google Translate_ yields that it's actually a CSS selector that's used to extract the content from the page that the article links to.

I've taken a little look at it, but it appears changing it only works on new articles - so I'll have to wait and see if I'm right :P

Edit: Yep! It does appear to work as I've described above :D
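
To make the idea concrete, here is a minimal sketch of what "extract the content with a CSS selector" means. It is not FreshRSS's actual implementation (FreshRSS is written in PHP), and it only understands very simple selectors such as `div`, `.post-content`, `#main` or `div.post-content`. The names `SelectorExtractor` and `extract_by_selector` are made up for this illustration, and the sketch assumes reasonably well-formed HTML with matching closing tags.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def parse_simple_selector(selector):
    """Split 'tag', '#id', '.class', 'tag.class' or 'tag#id' into its parts."""
    selector = selector.strip()
    m = re.fullmatch(r"([a-zA-Z][\w-]*)?(?:#([\w-]+))?((?:\.[\w-]+)*)", selector)
    if not selector or not m:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, elem_id, classes = m.groups()
    return (tag.lower() if tag else None, elem_id,
            set(filter(None, classes.split("."))))


class SelectorExtractor(HTMLParser):
    """Collects the inner HTML of the first element matching the selector."""

    def __init__(self, selector):
        super().__init__(convert_charrefs=False)
        self.tag, self.elem_id, self.classes = parse_simple_selector(selector)
        self.depth = 0      # nesting depth inside the match; 0 means outside
        self.done = False   # True once the matched element has been closed
        self.parts = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        if self.tag and tag != self.tag:
            return False
        if self.elem_id and attrs.get("id") != self.elem_id:
            return False
        return self.classes <= set((attrs.get("class") or "").split())

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.done and tag not in VOID_TAGS and self._matches(tag, attrs):
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        if self.depth:  # self-closing tag such as <br/>
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_by_selector(html, selector):
    """Return the inner HTML of the first match, or None if nothing matches."""
    parser = SelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.done else None


page = """<html><body>
<div id="nav">Menu</div>
<div class="post-content main"><p>Full <b>article</b> text &amp; more.</p><img src="a.png"></div>
<footer>Footer</footer>
</body></html>"""

print(extract_by_selector(page, "div.post-content"))
```

In the real feature, the page being searched is the one the article links to, and the selector is whatever you enter in the feed's configuration.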

sbrl on 14 Nov 2018

🎉1 👍1

Flym is one of the better-known RSS readers on Android, and I noticed it does a pretty good job of fetching full articles without any extra configuration. Per-feed configuration would be pretty near impossible anyway for a source such as Google News, which links to all sorts of publications. The logic is in these two pieces of code, which basically apply a whole bunch of regexes:
https://github.com/FredJul/Flym/blob/master/app/src/main/java/net/frju/flym/service/FetcherService.kt (function mobilizeAllEntries()) and
https://github.com/FredJul/Flym/blob/master/app/src/main/java/net/frju/flym/utils/HtmlUtils.kt (function improveHtmlContent()).
Not pretty but fairly effective. I might try porting that, what do you think?
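
For anyone wondering what such a regex pass looks like, here is a rough sketch. The rules below are illustrative assumptions, not the patterns Flym actually uses, and `CLEANUP_RULES` and `improve_html` are names invented for this example.

```python
import re

# Each rule is (compiled pattern, replacement). These are illustrative only.
CLEANUP_RULES = [
    # Drop script, style and iframe elements together with their contents.
    (re.compile(r"<(script|style|iframe)\b[^>]*>.*?</\1\s*>", re.I | re.S), ""),
    # Drop HTML comments.
    (re.compile(r"<!--.*?-->", re.S), ""),
    # Drop paragraphs that are empty or hold only &nbsp; or a single <br>.
    (re.compile(r"<p>\s*(?:&nbsp;|<br\s*/?>)?\s*</p>", re.I), ""),
    # Collapse runs of two or more <br> tags into one.
    (re.compile(r"(?:<br\s*/?>\s*){2,}", re.I), "<br>"),
]


def improve_html(html):
    """Apply every clean-up rule in order and return the trimmed result."""
    for pattern, replacement in CLEANUP_RULES:
        html = pattern.sub(replacement, html)
    return html.strip()


raw = "<p>Hi</p><script>alert(1)</script><!-- ad --><p> </p><br><br/><br>"
print(improve_html(raw))
```

Regexes like these are brittle on unusual markup, which is presumably why the approach is "not pretty", but each rule is cheap and easy to add or remove.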

mbe-financial-com on 27 Apr 2020

👍1

At a glance, that first one doesn't really seem to do anything itself; it just passes the content on to Readability4JExtended()? An equivalent library will doubtless be somewhere in Wallabag, though possibly too tightly integrated, as well as here (andreskrey/readability.php).

Frenzie on 27 Apr 2020

@mbe-financial-com Our current way to fetch full content is pretty obsolete, so new possibilities would be very welcome. As @Frenzie writes, we need to clarify whether those methods depend on third-party services or not, and we could even offer more than one option.

Alkarex on 27 Apr 2020

Sounds like a readability thing you're talking about, @mbe-financial-com. Firefox has a reader view built in. Perhaps the code there could be used as a base? Or maybe someone has already done this and published it on Packagist?

sbrl on 30 Apr 2020

I am migrating from Tiny Tiny RSS and was missing this feature. I built an extension that uses Mercury Parser. You'll need to set up their server for this, since the extension simply uses the Mercury Parser API.

You can find it here, if you're interested: https://github.com/simon-wessel/freshrss-mercury-parser
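
Roughly, the call to the parser server looks like the sketch below. It is in Python rather than the extension's PHP and is not the extension's code. It assumes a self-hosted Mercury Parser API that answers `GET /parser?url=...` with JSON containing `title` and `content` fields; check your own server if it differs. The base URL and the function names are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_parser_url(base_url, article_url):
    """Build the request URL (assumes a /parser endpoint with a url parameter)."""
    return f"{base_url.rstrip('/')}/parser?{urlencode({'url': article_url})}"


def extract_content(response_body):
    """Pull the title and the full article HTML out of the JSON response."""
    data = json.loads(response_body)
    return data.get("title"), data.get("content")


def fetch_full_article(base_url, article_url, timeout=10):
    """Ask the parser server for the full text of one article."""
    with urlopen(build_parser_url(base_url, article_url), timeout=timeout) as resp:
        return extract_content(resp.read().decode("utf-8"))


# Placeholder server address; nothing is requested until fetch_full_article() is called.
print(build_parser_url("http://localhost:3000/", "https://example.com/a?b=1"))
```

The feed reader would then replace the summary from the feed with the returned `content` for each new article.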