FreshRSS: full article content

Created on 5 Feb 2015 · 10 Comments · Source: FreshRSS/FreshRSS

Hi,
I found it to be a great application, but does it support full article content extraction?

Thanks

Documentation help wanted

All 10 comments

Hi,

Yes it does! The process is described in https://github.com/FreshRSS/FreshRSS/issues/199#issuecomment-26408922 but it is in French; it is about time to translate it!

Until someone does, you can play with the setting "Articles CSS path on original website" at the bottom of a feed's configuration. With some knowledge of CSS selectors you might be able to get it working :)

There is another option, which is to use the rss-bridge project. That way, you can separate your concerns.
That's what I do for some feeds.
I think it is more flexible.

Running that comment through _Google Translate_ yields that it's actually a CSS selector that's used to extract the content from the page that the article links to.

I've taken a little look at it, but it appears changing it only works on new articles - so I'll have to wait and see if I'm right :P

Edit: Yep! It does appear to work as I've described above :D
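
To illustrate the idea of pulling article content out of a page with a CSS selector, here is a minimal sketch in Python. It is not FreshRSS's implementation; the class `SelectorExtractor` and the function `extract_text` are hypothetical names, and the sketch only understands one simple selector of the form `tag#id.class` (no descendant or attribute selectors). It also assumes reasonably well-formed HTML in which non-void tags are closed.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SelectorExtractor(HTMLParser):
    """Collects the text inside elements matching a simple 'tag#id.class' selector."""

    def __init__(self, selector):
        super().__init__(convert_charrefs=True)
        selector = selector.strip()
        match = re.fullmatch(r"([a-zA-Z][\w-]*)?((?:[#.][\w-]+)*)", selector)
        if not selector or match is None:
            raise ValueError(f"unsupported selector: {selector!r}")
        self.wanted_tag = (match.group(1) or "").lower()
        parts = re.findall(r"([#.])([\w-]+)", match.group(2))
        self.wanted_id = next((v for kind, v in parts if kind == "#"), None)
        self.wanted_classes = {v for kind, v in parts if kind == "."}
        self.depth = 0      # > 0 while we are inside a matched element
        self.chunks = []    # text fragments collected so far

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        if self.wanted_tag and tag != self.wanted_tag:
            return False
        if self.wanted_id is not None and attrs.get("id") != self.wanted_id:
            return False
        classes = set((attrs.get("class") or "").split())
        return self.wanted_classes <= classes

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif self._matches(tag, attrs):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html, selector):
    """Return the whitespace-normalised text of all elements matching selector."""
    parser = SelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


page = ('<html><body><nav>Menu</nav>'
        '<div class="article-body main"><p>Full <b>text</b> here.</p><br></div>'
        '<footer>Footer</footer></body></html>')
print(extract_text(page, "div.article-body"))
```

In practice the page would first be downloaded from the article's link, and the selector would be whatever identifies the main content container on that particular site.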

Flym is one of the better-known RSS readers on Android, and I noticed it does a pretty good job of fetching full articles without any extra configuration, which would be pretty near impossible anyway for a source such as Google News that links to all sorts of publications. It's in these two pieces of code that basically apply a whole bunch of regexes:
https://github.com/FredJul/Flym/blob/master/app/src/main/java/net/frju/flym/service/FetcherService.kt (function mobilizeAllEntries()) and
https://github.com/FredJul/Flym/blob/master/app/src/main/java/net/frju/flym/utils/HtmlUtils.kt (function improveHtmlContent()).
Not pretty but fairly effective. I might try porting that, what do you think?
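
As a rough illustration of what a "bunch of regexes" approach to cleaning fetched HTML can look like, here is a small Python sketch. The rules below are illustrative assumptions and are not copied from Flym's `improveHtmlContent()`; the names `CLEANUP_RULES` and `improve_html` are hypothetical.

```python
import re

# Each rule is (compiled pattern, replacement). These patterns are
# illustrative assumptions, not the ones Flym actually uses.
CLEANUP_RULES = [
    # Drop script and style blocks entirely.
    (re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.I | re.S), ""),
    # Drop HTML comments.
    (re.compile(r"<!--.*?-->", re.S), ""),
    # Drop paragraphs that contain only whitespace, &nbsp; or line breaks.
    (re.compile(r"<p>(?:&nbsp;|<br\s*/?>|\s)*</p>", re.I), ""),
    # Collapse runs of three or more line breaks into two.
    (re.compile(r"(?:<br\s*/?>\s*){3,}", re.I), "<br><br>"),
]


def improve_html(html):
    """Apply every cleanup rule in order and return the trimmed result."""
    for pattern, replacement in CLEANUP_RULES:
        html = pattern.sub(replacement, html)
    return html.strip()


raw = ('<p>Hi</p><script>alert(1)</script><p> &nbsp; </p>'
       '<br><br/><br>\n<br><p>Bye</p>')
print(improve_html(raw))
```

Regexes like these are brittle on malformed markup, which is one reason a readability-style library may be the more robust choice.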

That first one doesn't really seem to do anything at a glance, it just passes it on to Readability4JExtended()? An equivalent library will doubtless be somewhere in Wallabag, possibly too tightly integrated, as well as here (andreskrey/readability.php).

@mbe-financial-com Our current way to fetch full content is pretty obsolete, so new possibilities would be very welcome. As @Frenzie writes, we need to clarify whether those methods depend on third-party services or not, and we could even offer more than one option.

Sounds like a readability thing you're talking about, @mbe-financial-com. Firefox has a reader view built in; perhaps the code there could be used as a base? Or maybe someone has already done this and published it on Packagist?

I am migrating from Tiny Tiny RSS and was missing this feature. I built an extension that uses Mercury Parser. You'll need to set up their server for this, since the extension simply uses the Mercury Parser API.

You can find it here, if you're interested: https://github.com/simon-wessel/freshrss-mercury-parser

@Alkarex You've added the "Documentation" tag to this issue. What needs to be done so we can close it?
