Newsboat: How to use regular expression to highlight/ignore

Created on 7 Sep 2019 · 3 Comments · Source: newsboat/newsboat

Newsboat version

System: Darwin 18.7.0 (x86_64)
Compiler: g++ 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)
ncurses: ncurses 5.7.20081102 (compiled with 5.7)
libcurl: libcurl/7.54.0 LibreSSL/2.6.5 zlib/1.2.11 nghttp2/1.24.1 (compiled with 7.54.0)
SQLite: 3.24.0 (compiled with 3.24.0)
libxml2: compiled with 2.9.4

I've been trying to configure ignoring/highlighting articles based on regular expressions without success.

For example, /\bgit\b/i should match any occurrence of the word "git", but not when it's part of a larger word such as "digital".

But neither of these works, and I can't figure out why :/

ignore-article "*" "title =~ \"/\bfacebook\b/i\""
highlight-article "title =~ \"/\bgit\b/i\"" white blue bold

Could you either help me with the regular expression or point me to any resource that I can follow to get it right? Thanks!

bug

All 3 comments

Hi! Yeah, regexes in Newsboat are under-documented. These should work:

ignore-article "*" "title =~ \"\\bfacebook\\b\""
highlight-article "title =~ \"\\bgit\\b\"" white blue bold

Specifically:

  • there is no need to put regexes between slashes: they're already inside quotes, and the =~ operator always expects a regex;
  • regexes are always matched case-insensitively in Newsboat, so "i" at the end is unnecessary;
  • since a regex is a string within a string, backslashes need to be escaped just like quotes.
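The three rules above can be sketched as a small Python helper that turns a slash-delimited literal like /\bgit\b/i into a highlight-article line. This is a hypothetical illustration, not part of Newsboat: the function names are made up, and it assumes the filter language passes the regex through verbatim (as the suggested lines imply), so only one round of escaping, for the config-file string, is applied. Regexes containing double quotes are not handled.

```python
import re


def strip_js_style(literal: str) -> str:
    """Turn a '/pattern/flags' literal into a bare pattern (hypothetical helper).

    Newsboat's =~ already expects a regex and always matches
    case-insensitively, so the slashes and an 'i' flag are dropped.
    """
    m = re.fullmatch(r"/(.*)/([a-z]*)", literal)
    if m is None:
        return literal  # already a bare pattern
    pattern, flags = m.groups()
    if set(flags) - {"i"}:
        raise ValueError(f"unsupported flags: {flags}")
    return pattern


def config_quote(text: str) -> str:
    """Quote text as one config-file argument, escaping backslashes and double quotes."""
    # Backslashes first, so the backslashes added for quotes are not doubled again.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def highlight_line(field: str, regex: str, *colors: str) -> str:
    """Build a highlight-article line for `field =~ regex` (hypothetical helper)."""
    pattern = strip_js_style(regex)
    if '"' in pattern:
        raise ValueError("double quotes inside the regex are not handled here")
    expr = f'{field} =~ "{pattern}"'  # the inner string literal of the filter
    return " ".join(["highlight-article", config_quote(expr), *colors])


print(highlight_line("title", r"/\bgit\b/i", "white", "blue", "bold"))
# prints: highlight-article "title =~ \"\\bgit\\b\"" white blue bold
```

Running it reproduces the highlight-article line suggested above. Escaping the whole filter expression once, rather than escaping the regex and the quotes separately, is what makes the backslashes come out doubled while the quotes come out as \".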

Note that ignore-article won't affect articles you have already fetched unless you change ignore-mode to display.

Does that answer your question?

Thanks for your quick response 😃!

I'm ashamed I didn't realize the escape character was \ instead of / 🤦‍♂, great catch!

Still, I tried those, but the \b word-boundary matcher doesn't seem to be interpreted, since neither article ignoring nor highlighting works 😞.

How to reproduce it:

Config

I used the config you mentioned:

highlight-article "title =~ \"\\bgit\\b\"" white blue bold

Feed

I wrote a test feed with some positive and negative scenarios. You can add the gist's raw URL to your urls file.

https://gist.githubusercontent.com/bertocq/2829d9ac0c519cfd4fe7ed8a06b60e5b/raw/b619837a65420fbca41a7e70dfc419be4c06448d/hightlight-article_title_words_text.xml

Matching titles (should be highlighted)

"git" is cool
Is .git, not .gut
Git.js
.git
Learn git
Some git tricks
Git
Git got digitalized

Not matching titles (should not be highlighted)

Digit
Digital
Gittern
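The expected results above can be checked outside Newsboat with a short Python script. This is only an approximation: it uses Python's re module rather than the POSIX extended regex engine Newsboat uses, with re.IGNORECASE standing in for Newsboat's always-case-insensitive matching, and the names SHOULD_MATCH, SHOULD_NOT_MATCH and check are made up for this sketch.

```python
import re

# Regex from the issue; re.IGNORECASE mimics Newsboat's case-insensitive matching.
WORD_GIT = re.compile(r"\bgit\b", re.IGNORECASE)

SHOULD_MATCH = [
    '"git" is cool',
    "Is .git, not .gut",
    "Git.js",
    ".git",
    "Learn git",
    "Some git tricks",
    "Git",
    "Git got digitalized",
]
SHOULD_NOT_MATCH = ["Digit", "Digital", "Gittern"]


def check(pattern: re.Pattern) -> list[str]:
    """Return the titles whose match result differs from the expectation."""
    wrong = [t for t in SHOULD_MATCH if not pattern.search(t)]
    wrong += [t for t in SHOULD_NOT_MATCH if pattern.search(t)]
    return wrong


print(check(WORD_GIT))  # prints []
```

An empty list means the pattern itself classifies every test title correctly, which points the investigation at how Newsboat receives the pattern rather than at the regex.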

Findings

Suspecting the solution might again be obvious, I did a bit of research and learned:

  • Newsboat uses "POSIX extended regex", according to this issue
  • \b seems to be the right matcher for a word boundary, based on the regex 1.3.1 docs. I've tested it with echo "Learn git" | grep -e "\bgit\b" and echo "Digital" | grep -e "\bgit\b"
  • Just in case, I also tried using \< and \>, as in highlight-article "title =~ \"\\<git\\>\"" white blue bold, but it didn't work either.
  • I've found out about scripts, so in the meantime I'll try my luck writing one 👍

I ran Newsboat (current master, 2e696502ba5d6109b6257ac33c24cd777eaab7f4) with debug logging (--log-file=newsboat.log --log-level=6) and found that \\bgit\\b is understood as \bgitb (instead of \bgit\b). To make it work properly, I had to double-escape the second backslash:

highlight-article "title =~ \"\\bgit\\\\b\"" white blue bold
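The debug-log finding explains why nothing was highlighted at all: the garbled pattern \bgitb only matches the literal text "gitb" after a word boundary, which none of the test titles contain. The Python sketch below compares the intended and garbled patterns on a few titles from the test feed; as before, Python's re module is only a stand-in for Newsboat's regex engine.

```python
import re

TITLES = ["Learn git", "Git.js", "Digital"]

# What the config line was meant to produce vs. what the debug log showed.
intended = re.compile(r"\bgit\b", re.IGNORECASE)
garbled = re.compile(r"\bgitb", re.IGNORECASE)

for title in TITLES:
    print(f"{title!r}: intended={bool(intended.search(title))}, "
          f"garbled={bool(garbled.search(title))}")
```

The intended pattern matches "Learn git" and "Git.js" but not "Digital", while the garbled one matches none of them, which fits both the ignore and highlight rules silently doing nothing.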

This reminds me of https://github.com/newsboat/newsboat/issues/536 ; I wonder if those issues are connected.

So you're doing everything right, @bertocq, it's just Newsboat being a bit broken =\ Re-tagging this as a bug. Thank you for the report!
