The lyrics plugin cannot get the lyric text on some pages of genius.com.
Example:
https://genius.com/Ed-sheeran-nothing-on-you-lyrics
The plugin expects this:
<div class="lyrics">
When running
$ beet -vv lyrics
the above page will result in:
Traceback (most recent call last):
  [...]
  File "/usr/lib/python3.7/site-packages/beetsplug/lyrics.py", line 375, in lyrics_from_song_api_path
    lyrics = html.find("div", class_="lyrics").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
The reason is that genius.com seems to have several pages where, instead of a single div, the
text is stored in a list of divs whose class name starts with Lyrics__Container:
<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">...</div>
<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">...</div>
<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">...</div>
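A minimal sketch of the mismatch, assuming BeautifulSoup (bs4) is installed. The NEW_LAYOUT fragment and its text are made up for illustration and only mimic the markup quoted above; this is not the plugin's actual code.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page fragment modelled on the markup quoted above.
NEW_LAYOUT = """
<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">first verse</div>
<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">chorus</div>
<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">second verse</div>
"""

html = BeautifulSoup(NEW_LAYOUT, "html.parser")

# The lookup from the traceback: no div has the class "lyrics", so find()
# returns None and chaining .get_text() onto it raises AttributeError.
old_style = html.find("div", class_="lyrics")

# A compiled regex passed as class_ is tested against each CSS class of a
# tag, so the generated suffix after "Lyrics__Container" does not matter.
new_style = html.find_all("div", class_=re.compile("^Lyrics__Container"))

print(old_style)
print([tag.get_text() for tag in new_style])
```

Matching on the prefix rather than the full class name is a guess that the suffix (sc-1ynbvzw-2 here) is generated and may change between pages.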
To reproduce, enable the lyrics plugin, set genius as the only source, and try to
fetch lyrics for a fairly large music collection.
lyrics:
    bing_lang_from: []
    auto: yes
    sources: genius
    bing_client_secret: REDACTED
    bing_lang_to:
    google_API_key: REDACTED
    google_engine_ID: REDACTED
    genius_api_key: REDACTED
    fallback:
    force: no
    local: no
directory: REDACTED/music
ignore_hidden: yes
asciify_paths: yes
import:
    move: no
    write: yes
    incremental: no
    resume: no
plugins: lyrics
Here is a patch that worked for me:
--- lyrics.py.orig 2020-08-12 20:10:01.000000000 +0200
+++ lyrics.py 2020-08-12 20:10:01.000000000 +0200
@@ -370,11 +370,21 @@
         # Remove script tags that they put in the middle of the lyrics.
         [h.extract() for h in html('script')]
 
-        # At least Genius is nice and has a tag called 'lyrics'!
-        # Updated css where the lyrics are based in HTML.
-        lyrics = html.find("div", class_="lyrics").get_text()
-
-        return lyrics
+        # Genius has the lyrics either in multiple divs with class attributes
+        # beginning with "Lyrics__Container", or in a single div with class
+        # attribute "lyrics"
+        lyric_tag = html.find("div", class_="lyrics")
+        if lyric_tag is None:
+            class_matcher = re.compile("^Lyrics__Container")
+            lyric_tags = html.find_all("div", class_=class_matcher)
+            if not lyric_tags:
+                self._log.debug(u'Genius page {0} has no lyric tags', page_url)
+                return None
+            lyrics = u'\n\n'.join(tag.get_text() for tag in lyric_tags)
+        else:
+            lyrics = lyric_tag.get_text()
+        # remove leading and trailing whitespace
+        return lyrics.strip()
 
     def fetch(self, artist, title):
         search_url = self.base_url + "/search"
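For anyone who wants to try the fallback logic outside of beets, the patch can be sketched as a standalone function. The function name extract_genius_lyrics and the constant LYRICS_CONTAINER are hypothetical and not part of beets; the sketch assumes BeautifulSoup (bs4) is installed and takes the page HTML as a string instead of fetching it.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical name: matches any CSS class starting with "Lyrics__Container".
LYRICS_CONTAINER = re.compile("^Lyrics__Container")


def extract_genius_lyrics(page_html):
    """Return the lyric text of a Genius page, or None if none is found."""
    html = BeautifulSoup(page_html, "html.parser")

    # Remove script tags embedded in the middle of the lyrics.
    for script in html("script"):
        script.extract()

    # Old layout: a single div with the class "lyrics".
    lyric_tag = html.find("div", class_="lyrics")
    if lyric_tag is not None:
        return lyric_tag.get_text().strip()

    # New layout: several divs whose class starts with "Lyrics__Container".
    lyric_tags = html.find_all("div", class_=LYRICS_CONTAINER)
    if not lyric_tags:
        return None
    return "\n\n".join(tag.get_text() for tag in lyric_tags).strip()
```

Unlike the patch, this sketch returns None silently instead of logging, since there is no plugin logger available outside of beets.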
Seems great! Would you mind transforming this patch into a pull request?
Hey, could I pick this issue up, and leverage the patch @wummel provided in order to have a sure shot solution?
That would be awesome!
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think it is still not solved.
I believe that's true. If anybody has the bandwidth to take the above patch and open a quick PR with it, we can get the process of fixing things started!