Newsboat version
System: Darwin 18.7.0 (x86_64)
Compiler: g++ 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)
ncurses: ncurses 5.7.20081102 (compiled with 5.7)
libcurl: libcurl/7.54.0 LibreSSL/2.6.5 zlib/1.2.11 nghttp2/1.24.1 (compiled with 7.54.0)
SQLite: 3.24.0 (compiled with 3.24.0)
libxml2: compiled with 2.9.4
I've been trying to configure ignoring/highlighting articles based on regular expressions without success.
For example /\bgit\b/i should match any appearance of the word "git", but not when it's part of a bigger word like digital.
But neither of these works, and I can't figure out why :/
ignore-article "*" "title =~ \"/\bfacebook\b/i\""
highlight-article "title =~ \"/\bgit\b/i\"" white blue bold
Could you either help me with the regular expression or point me to any resource that I can follow to get it right? Thanks!
Hi! Yeah, regexes in Newsboat are under-documented. These should work:
ignore-article "*" "title =~ \"\\bfacebook\\b\""
highlight-article "title =~ \"\\bgit\\b\"" white blue bold
Specifically:
The =~ operator always expects a regex.
Note that ignore-article won't do anything to the articles you already fetched unless you change ignore-mode to display.
Does that answer your question?
Thanks for your quick response 😃!
I'm ashamed I didn't realize the escaping char was \ instead of / 🤦‍♂️, great catch!
Still, I tried those, but the \b word-boundary matcher doesn't seem to be interpreted: neither ignoring nor highlighting articles works 😞.
I used the config you mentioned:
highlight-article "title =~ \"\\bgit\\b\"" white blue bold
I wrote a test feed with some positive/negative scenarios. You can use the gist's raw URL in your urls file.
https://gist.githubusercontent.com/bertocq/2829d9ac0c519cfd4fe7ed8a06b60e5b/raw/b619837a65420fbca41a7e70dfc419be4c06448d/hightlight-article_title_words_text.xml
Matching titles (should be highlighted)
"git" is cool
Is .git, not .gut
Git.js
.git
Learn git
Some git tricks
Git
Git got digitalized
Not matching titles (should not be highlighted)
Digit
Digital
Gittern
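The expected behavior in the two lists above can be sanity-checked outside Newsboat. The following is only a sketch: it uses Python's `re` module as a stand-in (an assumption, since Newsboat does not use Python's regex engine), and the names `WORD_GIT` and `is_highlighted` are hypothetical. It shows what a case-insensitive `\bgit\b` is expected to match, not how Newsboat evaluates the filter.

```python
import re

# Stand-in for the intended filter: the word "git", case-insensitive,
# bounded by word boundaries. Python's `re` is an assumption here;
# it is not the regex engine Newsboat uses.
WORD_GIT = re.compile(r"\bgit\b", re.IGNORECASE)

# Titles taken from the test feed described above.
SHOULD_MATCH = [
    '"git" is cool',
    "Is .git, not .gut",
    "Git.js",
    ".git",
    "Learn git",
    "Some git tricks",
    "Git",
    "Git got digitalized",
]
SHOULD_NOT_MATCH = ["Digit", "Digital", "Gittern"]


def is_highlighted(title: str) -> bool:
    """Return True if the title contains "git" as a standalone word."""
    return WORD_GIT.search(title) is not None


if __name__ == "__main__":
    for title in SHOULD_MATCH + SHOULD_NOT_MATCH:
        print(f"{title!r}: {is_highlighted(title)}")
```

If every title in the first list matches and none in the second does, the pattern itself is sound, which suggests any remaining problem lies in how the config line is escaped or parsed.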
Thinking that the solution could be again obvious, I've tried to research a bit and learned:
\b seems to be the right matcher for a word boundary, based on the regex 1.3.1 docs. I've tested it with echo "Learn git" | grep -e "\bgit\b" and echo "Digital" | grep -e "\bgit\b".
I also tried \< and \>, like highlight-article "title =~ \"\\<git\\>\"" white blue bold, but it didn't work either.
I ran Newsboat (current master, 2e696502ba5d6109b6257ac33c24cd777eaab7f4) with debug logging (--log-file=newsboat.log --log-level=6) and found that \\bgit\\b is understood as \bgitb (instead of \bgit\b). To make it work properly, I had to double-escape the second backslash:
highlight-article "title =~ \"\\bgit\\\\b\"" white blue bold
This reminds me of https://github.com/newsboat/newsboat/issues/536; I wonder if those issues are connected.
So you're doing everything right, @bertocq, it's just Newsboat being a bit broken =\ Re-tagging this as a bug. Thank you for the report!