Full text scraping doesn't work anymore since the switch from picoFeed to feed-io.
With picoFeed, if the option "Enable full text" was activated for a specific feed, I could read the whole article even if the full content was not included in the feed. Since the switch to feed-io (I think; I'm not sure exactly when I first noticed it), full-content scraping no longer works.
Maybe this is related: https://github.com/alexdebril/feed-io/issues/211
The full text feature requires parsing of the actual website which is quite complicated.
Full text was only re-enabled with 13.1.5 for feeds that actually contain the whole text.
Maybe the wallabag parser could help? Developed by @j0k3r
https://github.com/j0k3r/graby
I once looked for ways to do this for the bookmarks app as well. I abandoned the idea because I thought the news app already does this quite well and it might only be needed for read-it-later style bookmarking; now I'm reconsidering. In any case, these were the libraries I was considering ;)
https://github.com/nextcloud/bookmarks/issues/438#issuecomment-364756929
@marcelklehr Why do you think that graby is of questionable quality?
> https://github.com/j0k3r/graby (do-it-all grab bag of questionable quality)
I use it with wallabag and I'm very satisfied :)
@Grotax Do you think it's doable with one of the libraries mentioned by @marcelklehr ? I'm available to test if needed (no coding skills unfortunately).
And by the way, I like your new feature of grouping the news by original website: I guess it's also related to the switch to feed-io?
I haven't checked it yet. Even though I'm the official maintainer in the app store and have made some changes, I'm not a PHP expert, so I haven't decided whether I'm going to implement this.
But I'm definitely interested :)
Showing full text requires site-specific parsing strings. Luckily, full-text-rss (AGPL v3) does precisely that, and @fivefilters even offers an abundance of site-specific config files: ftr-site-config (public domain).
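To illustrate what "site-specific parsing strings" means in practice, here is a minimal Python sketch (not the format used by full-text-rss or ftr-site-config). The hostnames, the `SITE_RULES` table, and `extract_full_text` are all hypothetical, and the sketch assumes well-formed markup so that the standard library parser can be used; real pages would need a proper HTML parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical per-site rules: hostname -> path expression locating the article body.
SITE_RULES = {
    "blog.example.org": ".//div[@class='post-body']",
    "news.example.com": ".//article",
}


def extract_full_text(url, page):
    """Return the article text for a known site, or None if nothing matches."""
    rule = SITE_RULES.get(urlparse(url).hostname)
    if rule is None:
        return None  # no rule for this site, so there is nothing to extract
    root = ET.fromstring(page)  # assumption: the page is well-formed XML/XHTML
    node = root.find(rule)
    if node is None:
        return None  # the site changed its layout and the rule is outdated
    # Join all nested text and collapse runs of whitespace.
    return " ".join("".join(node.itertext()).split())


PAGE = (
    "<html><body><div class='nav'>Menu</div>"
    "<div class='post-body'><p>Hello <b>full</b> text.</p></div>"
    "</body></html>"
)
print(extract_full_text("https://blog.example.org/post/1", PAGE))
```

The second `None` branch is the maintenance cost discussed in this thread: every rule breaks silently whenever the site changes its markup.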
I don't want to implement something that's always outdated unless you pay, though, so that won't be the one we use.
I actually did start working on this but wasn't satisfied with the current lib versions, so I guess this will take more time.
For the record, I am pretty satisfied with how most of the content makes it to the app. There are a few blogs that don't give you the full text without scraping, and then there's the likes of Reddit, where you always need to click on the link. Only for those would it be nice to have this feature, but it's not a dealbreaker, as we already get the full text for most sites.
To add my opinion: of the 9 sites I follow via RSS, only 3 are okay without this functionality.
The others only show a short text :/
You should complain to the authors then. This is, after all, a news reader, not a news scraper. We shouldn't work around restrictions imposed by sources but convince them to be less restrictive for the benefit of all users.
I mean, would it be awesome to have? Yes.
Is it a fundamental part of an RSS reader? No.
It would be great, but we have to be aware of the limitations of small open source products, and this is not trivial to implement.
I just realized that "Enable full text" doesn't do anything right now. What is it supposed to change? Should we switch between the description and content sections of the feed?
Right now nothing changes. In the past this enabled the scraper, but at this moment there's nothing to scrape, so it won't. There's also nothing we can display differently.
IMO the "full text" option should be removed. If and when a scraper becomes available, I think it should just always use full text, and we simplify the UI.
> I don't want to implement something that's always outdated unless you pay, though, so that won't be the one we use.
Wallabag uses https://github.com/j0k3r/graby, which is kept up to date. It has been around for years, and the results are very satisfying. It is based on the fivefilters work mentioned by @danielrheinbay and improves on it.
I might check graby again after the 2.0 release.
Current version seems outdated.
I tried the current state of the 2.0 version today and it worked perfectly, all feeds were added completely to the database.
I would go ahead and tidy up the code and make use of the full-text option to allow enabling it for different feeds. Otherwise I would suggest that it tries to fetch if there is no other text supplied by the feed directly.
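The two behaviours suggested above (a per-feed opt-in, or fetching only when the feed supplies no text) can be combined in one small decision function. This is a minimal Python sketch, not the code from the branch: `Item`, `resolve_body`, and the `scrape` callback are hypothetical names, and the scraper is assumed to return `None` when it fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Item:
    url: str
    body: str  # text delivered by the feed itself, possibly empty


def resolve_body(
    item: Item,
    full_text_enabled: bool,
    scrape: Callable[[str], Optional[str]],
) -> str:
    """Pick the text to store for an item.

    Scrape when the per-feed option is on, or when the feed supplied no text.
    If scraping yields nothing, fall back to whatever the feed delivered.
    """
    if not full_text_enabled and item.body.strip():
        return item.body  # feed text is present and scraping was not requested
    scraped = scrape(item.url)
    return scraped if scraped else item.body


def fake_scrape(url: str) -> Optional[str]:
    # Stand-in for a real scraper, used only to exercise the logic above.
    return "full article" if url.endswith("/ok") else None


teaser = Item(url="https://example.org/ok", body="short teaser")
print(resolve_body(teaser, full_text_enabled=True, scrape=fake_scrape))
```

Keeping the feed text as a fallback means a broken scraper degrades to the current behaviour instead of producing empty articles.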
I'm currently testing 2.0 heavily to ensure it runs as smoothly as the 1.x releases.
Anybody who is interested can check out my branch: here.
I already use it in my setup because most of my feeds require me to click again and open something in a browser, which is really annoying for me (apparently I'm not the only one).
Please let me know how to proceed to get this merged into the app at some point.
Wouldn't it be better to tell the author of the feed that its current practice is annoying?
> Wouldn't it be better to tell the author of the feed that its current practice is annoying?
In an ideal world, yes, but realistically: good luck with that. ;)
The diff from @powerpaul17 seems quite reasonable: https://github.com/nextcloud/news/compare/nextcloud:master...powerpaul17:full_text_scraping
Full text scraping was a very nice feature; it would be sad to see it thrown away.
> Wouldn't it be better to tell the author of the feed that its current practice is annoying?
Wouldn't it be annoying to contact every feed's author instead of having a piece of code that can do (even partially) the job? :)
It would be, but you'd be fixing it for everyone and not just yourself. So you get to feel good about it.
Either way, whoever implements it gets to fix it for everyone every time it's broken. I'm not maintaining it.
Please don't take this the wrong way, but I really do not understand what the problem is with being able to activate or deactivate full text fetching on a per-feed basis. You just don't have to enable it if you don't want it.
I simply want to read my news feeds anywhere even without my internet connection and without having to open a browser every time I switch to the next article.
> Either way, whoever implements it gets to fix it for everyone every time it's broken. I'm not maintaining it.
So finally I understand the real reason for your strong resistance against this feature. ;) Anyway, I just find it kind of sad, because devs are always saying "if you want that feature, why don't you go and implement it yourself, we don't have time", and then if someone does, it is not welcome, which leaves no other way than a fork (which I have running on my system) instead of joining forces to make an awesome project.
I'm fine with anyone contributing to the project. I'm fine with this feature being re-added by someone else into the project. Just a heads up that any issues that arise from that will not be picked up by me because I think it's the wrong thing to do.
Aaand I'm going to lock this conversation. I think everyone was able to express their opinion.
You can always create a PR and it will be considered.
Keep in mind that maintaining this app is not a full time job and therefore has a low priority.