Newsboat: Can't use `sed` in urls file

Created on 9 May 2020 · 4 Comments · Source: newsboat/newsboat

Here is the content of a urls file:

"exec:curl --silent https://feeds.metaebene.me/raumzeit/m4a  | sed 's#\(</guid>\|</id>\)#-M4A&#'"    "~Raumzeit [M4A]"
"exec:curl --silent https://feeds.metaebene.me/raumzeit/mp3  | sed 's#\(</guid>\|</id>\)#-MP3&#'"    "~Raumzeit [MP3]"
"exec:curl --silent https://feeds.metaebene.me/raumzeit/oga  | sed 's#\(</guid>\|</id>\)#-Vorbis&#'" "~Raumzeit [Vorbis]"
"exec:curl --silent https://feeds.metaebene.me/raumzeit/opus | sed 's#\(</guid>\|</id>\)#-Opus&#'"   "~Raumzeit [Opus]"

As far as I know, RSS 2.0 uses <guid> as its unique identifier and Atom uses <id>. The sed command covers both. It simply inserts a string between the element's content and the respective end tag.
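To illustrate what the substitution does, here is a minimal sketch run on two hand-written sample lines instead of a real feed. The function name `append_suffix` and the sample values are made up for this example. It uses two plain substitutions rather than the `\|` alternation (a GNU sed extension in basic regular expressions), which should behave roughly the same for feeds like these.

```shell
#!/bin/sh
# Hypothetical helper: insert "-<suffix>" just before a closing </guid> or </id> tag.
# Assumes the suffix contains no '#', '&' or backslash, since those are
# special in this sed expression.
append_suffix() {
    # '&' in the replacement stands for the matched end tag.
    sed -e "s#</guid>#-$1&#" -e "s#</id>#-$1&#"
}

# Sample input standing in for the curl output.
printf '%s\n' '<guid>episode-1</guid>' '<id>episode-2</id>' | append_suffix M4A
# prints:
# <guid>episode-1-M4A</guid>
# <id>episode-2-M4A</id>
```

Running the filter by hand like this, outside Newsboat, is a quick way to tell a sed problem apart from a urls parsing problem.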

_Originally posted by @mglh in https://github.com/newsboat/newsboat/issues/898#issuecomment-626187889_

@Minoru reproduced on current HEAD:

newsboat r2.19-262-gb188 - https://newsboat.org/
Copyright (C) 2006-2015 Andreas Krennmair
Copyright (C) 2015-2020 Alexander Batischev
Copyright (C) 2006-2017 Newsbeuter contributors
Copyright (C) 2017-2020 Newsboat contributors

Newsboat is free software licensed under the MIT License. (Type `./newsboat -vv' to see the full text.)
It bundles:

newsboat r2.19-262-gb188
System: Linux 5.6.0-1-amd64 (x86_64)
Compiler: g++ 9.3.0
ncurses: ncurses 6.2.20200212 (compiled with 6.2)
libcurl: libcurl/7.68.0 GnuTLS/3.6.13 zlib/1.2.11 brotli/1.0.7 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.8.0 nghttp2/1.40.0 librtmp/2.3 (compiled with 7.68.0)
SQLite: 3.31.1 (compiled with 3.31.1)
libxml2: compiled with 2.9.10

bug

All 4 comments

If I put this into 923-script.sh:

#!/bin/sh
url="$1"
suffix="$2"
curl --silent "${url}"  | sed "s#\\(</guid>\|</id>\\)#-${suffix}&#"

Then make the script executable and edit urls file like so:

"exec:./923-script.sh https://feeds.metaebene.me/raumzeit/m4a M4A"    "~Raumzeit [M4A]"
"exec:./923-script.sh https://feeds.metaebene.me/raumzeit/mp3 MP3"    "~Raumzeit [MP3]"
"exec:./923-script.sh https://feeds.metaebene.me/raumzeit/oga Vorbis" "~Raumzeit [Vorbis]"
"exec:./923-script.sh https://feeds.metaebene.me/raumzeit/opus Opus"   "~Raumzeit [Opus]"

Feeds update just fine, and I see each one contains 60 items even after I restart Newsboat.

Apparently Newsboat botches the URLs as it reads them from the urls file?

I have the suspicion that sed itself is not the problem here, instead that there may be a problem with "escaping" when urls is parsed.

This urls file also works, for the purposes of this issue and #898:

"exec:curl --silent https://feeds.metaebene.me/raumzeit/m4a  | sed 's#\\(<\\\\/guid>\\|<\\\\/id>\\)#-M4A&#'"    "~Raumzeit [M4A]"
"exec:curl --silent https://feeds.metaebene.me/raumzeit/mp3  | sed 's#\\(<\\\\/guid>\\|<\\\\/id>\\)#-MP3&#'"    "~Raumzeit [MP3]"
"exec:curl --silent https://feeds.metaebene.me/raumzeit/oga  | sed 's#\\(<\\\\/guid>\\|<\\\\/id>\\)#-Vorbis&#'" "~Raumzeit [Vorbis]"
"exec:curl --silent https://feeds.metaebene.me/raumzeit/opus | sed 's#\\(<\\\\/guid>\\|<\\\\/id>\\)#-Opus&#'"   "~Raumzeit [Opus]"

It seems that, for this to work, four backslashes (instead of two) are needed, every second time an escaping takes place.

(The sample urls I posted first could not work anyway, since I hadn't used any escaping, which seems to be necessary when using exec: in urls. If this is not yet mentioned in the docs, perhaps it could be added.)

> I have the suspicion that sed itself is not the problem here, instead that there may be a problem with "escaping" when urls is parsed.

Yup. utils::tokenize_quoted() is buggy. Working on it…

The quoting is necessary not because of exec, but because of double quotes: backslash is used as an escape character there, so it has to be escaped itself when one wants a literal backslash in there. I'll check if docs mention this.
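As a minimal sketch of that rule, the helper below doubles every backslash so that a command can be placed inside a double-quoted urls field. `escape_backslashes` is a hypothetical name and not part of Newsboat. The sketch assumes the parser handles backslash escapes exactly as described above; on a version affected by the bug in this issue, the doubled form may still be misread.

```shell
#!/bin/sh
# Hypothetical helper: double each backslash read from stdin, so that a
# double-quoted urls field would yield the original text after unescaping.
escape_backslashes() {
    # Regex '\\' matches one literal backslash; replacement '\\\\' emits two.
    sed 's/\\/\\\\/g'
}

# The quoted here-document keeps the shell from touching the backslashes.
escape_backslashes <<'EOF'
sed 's#\(</guid>\|</id>\)#-M4A&#'
EOF
# prints: sed 's#\\(</guid>\\|</id>\\)#-M4A&#'
```

The wrapper-script approach shown earlier in the thread avoids the question entirely, because the sed expression never passes through the urls parser.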

Now that I think about it, utils::tokenize_quoted() also processes \n, \r, \t, and \v. I don't think it's useful in urls file, but I also wonder what I'll break if I take this mis-feature out.

> Yup. utils::tokenize_quoted() is buggy.

> four backslashes (instead of two) are needed, every second time an escaping takes place.

https://github.com/newsboat/newsboat/issues/642#issuecomment-529648918 might be related.
