Beets: lyrics: Filter out licensing warning from LyricWiki

Created on 16 Mar 2016 · 17 Comments · Source: beetbox/beets

For certain songs (for example, http://lyrics.wikia.com/wiki/R.E.M.:Pop_Song_89), LyricWiki will return a standard disclaimer that the site is not licensed to display lyrics for the track:
Unfortunately, we are not licensed to display the full lyrics for this song at the moment. Hopefully we will be able to in the future. Until then... how about a random page?

This issue is being opened (as requested) as a potential enhancement to filter out this message.

needinfo

All 17 comments

lame, they should be returning a 404 :(

Thanks! We should probably just match on that exact text and ignore it.
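
A minimal sketch of that idea, assuming the scraped lyrics are already plain text. `filter_disclaimer` and `DISCLAIMER_PREFIX` are hypothetical names used for illustration, not part of beets:

```python
# Hypothetical helper, not a beets API: reject LyricWiki's licensing notice.
DISCLAIMER_PREFIX = "Unfortunately, we are not licensed"


def filter_disclaimer(lyrics):
    """Return the lyrics unchanged, or None if they are empty or the notice."""
    if not lyrics:
        return None
    if DISCLAIMER_PREFIX in lyrics:
        return None
    return lyrics
```

Matching on the opening words rather than the whole sentence keeps the check working if the tail of the message ("how about a random page?") is reworded.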

Is it just me, or is this already implemented? It looks like it has been part of beets for over a year now (added by @brunal in commit 63041736e3a18a0ca7fe4efd1278796434f24aa4 on Jan 6, 2015).

Though I don't know Python, it sure looks like the logic is there. But I am seeing the disclaimer show up in the lyrics tag(s) after performing an import. I checked again while troubleshooting issue #1903 (id3v23 problem), after completely uninstalling beets and installing the latest source (v1.3.18), and the disclaimer was still showing up. I'm hoping that might rule out some sort of environmental problem with my computer (file corruption, etc.), but I'd be interested to know if someone else can reproduce the issue.

Weird! I tried reproducing this:

>>> from beetsplug import lyrics
>>> import logging
>>> l = lyrics.LyricsWiki(None, logging.getLogger())
>>> l.fetch('R.E.M.', 'Pop Song 89 Lyrics')
>>>

and the plugin (correctly) returned no lyrics for that song.

Maybe this was left by an earlier version of beets? You could try removing the lyrics with beet modify lyrics= QUERY and then fetching them again with beet lyrics.

Something even weirder here:

>>> from beetsplug import lyrics
>>> import logging
>>> l = lyrics.LyricsWiki(None, logging.getLogger())
>>> l.fetch('R.E.M.', 'Pop Song 89 Lyrics')
u'<script>(function() {var opts = {artist: "R.E.M.",song: "Pop Song 89",adunit_id: 39382076,div_id: "cf_async_" + Math.floor((Math.random() * 999999999))};if($(\'.ArticlePreview\').length == 0){document.write(\'\');var c=function(){cf.showAsyncAd(opts)};if(window.cf)c();else{cf_async=!0;var r=document.createElement("script"),s=document.getElementsByTagName("script")[0];r.async=!0;r.src="//srv.tonefuse.com/showads/showad.js";r.readyState?r.onreadystatechange=function(){if("loaded"==r.readyState||"complete"==r.readyState)r.onreadystatechange=null,c()}:r.onload=c;s.parentNode.insertBefore(r,s)};}})();</script><i>&#85;&#110;&#102;&#111;&#114;&#116;&#117;&#110;&#97;&#116;&#101;&#108;&#121;&#44;&#32;&#119;&#101;&#32;&#97;&#114;&#101;&#32;&#110;&#111;&#116;&#32;&#108;&#105;&#99;&#101;&#110;&#115;&#101;&#100;&#32;&#116;&#111;&#32;&#100;&#105;&#115;&#112;&#108;&#97;&#121;&#32;&#116;&#104;&#101;&#32;&#102;&#117;&#108;&#108;&#32;&#108;&#121;&#114;&#105;&#99;&#115;&#32;&#102;&#111;&#114;&#32;&#116;&#104;&#105;&#115;&#32;&#115;&#111;&#110;&#103;&#32;&#97;&#116;&#32;&#116;&#104;&#101;&#32;&#109;&#111;&#109;&#101;&#110;&#116;&#46;&#32;&#72;&#111;&#112;&#101;&#102;&#117;&#108;&#108;&#121;&#32;&#119;&#101;&#32;&#119;&#105;&#108;&#108;&#32;&#98;&#101;&#32;&#97;&#98;&#108;&#101;&#32;&#116;&#111;&#32;&#105;&#110;&#32;&#116;&#104;&#101;&#32;&#102;&#117;&#116;&#117;&#114;&#101;&#46;&#32;&#85;&#110;&#116;&#105;&#108;&#32;&#116;&#104;&#101;&#110;&#46;&#46;&#46;&#32;&#104;&#111;&#119;&#32;&#97;&#98;&#111;&#117;&#116;&#32;&#97;&#32;<a href="/wiki/Special:Random" title="Special:Random">&#114;&#97;&#110;&#100;&#111;&#109;&#32;&#112;&#97;&#103;&#101;</a>&#63;</i><p>&#91;<span style="font-size:80%; line-height:100%; color:black;"><a href="/wiki/LyricWiki:Job_Exchange" title="LyricWiki:Job Exchange">&#73;&#32;&#119;&#97;&#110;&#116;&#32;&#116;&#111;&#32;&#101;&#100;&#105;&#116;&#32;&#109;&#101;&#116;&#97;&#100;&#97;&#116;&#97;</a></span>&#93;&#10;</p>&#10;<!-- \n<p>NewPP limit 
report\nPreprocessor node count: 406/300000\nPost\u2010expand include size: 3711/2097152 bytes\nTemplate argument size: 631/2097152 bytes\nExpensive parser function count: 1/100\nExtLoops count: 2/100\n</p>\n-->&#10;<script>(function() {var opts = {artist: "R.E.M.",song: "Pop Song 89",adunit_id: 39382077,div_id: "cf_async_" + Math.floor((Math.random() * 999999999))};if($(\'.ArticlePreview\').length == 0){document.write(\'\');var c=function(){cf.showAsyncAd(opts)};if(window.cf)c();else{cf_async=!0;var r=document.createElement("script"),s=document.getElementsByTagName("script")[0];r.async=!0;r.src="//srv.tonefuse.com/showads/showad.js";r.readyState?r.onreadystatechange=function(){if("loaded"==r.readyState||"complete"==r.readyState)r.onreadystatechange=null,c()}:r.onload=c;s.parentNode.insertBefore(r,s)};}})();</script>\n'

I'm not sure if the LyricsWiki plugin is actually working, as this looks like gibberish to me!

Oh man, that's a terrible result. Such is the danger of screen-scraping, I guess…

I don't know why you get different results there from what I get here. Dropping "Lyrics" (i.e., l.fetch('R.E.M.', 'Pop Song 89')) gets me the HTML gibberish, though. Looks like we have even more filtering to do.

Yep, it seems that way. Trying it with some other songs yields the same result, so maybe they have changed the website design?

Edit: So after doing some research, it seems that they are actually encoding their lyrics as HTML entities like so:

&#87;&#97;&#105;&#116;&#105;&#110;&#103;&#32;&#105;&#110;&#32;&#116;&#104;&#101;&#32;&#99;&#97;&#114;

Which, when decoded, is:

Waiting in the car
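
For anyone who wants to check the decoding themselves, here is a small sketch using Python 3's standard `html.unescape`. This is an assumption for illustration only: the thread's code dates from Python 2 and uses the plugin's own `unescape` helper.

```python
import html

# The entity-encoded sample quoted above.
encoded = (
    "&#87;&#97;&#105;&#116;&#105;&#110;&#103;&#32;&#105;&#110;&#32;"
    "&#116;&#104;&#101;&#32;&#99;&#97;&#114;"
)

# html.unescape converts numeric character references back to characters.
decoded = html.unescape(encoded)
print(decoded)
```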

I've modified the LyricsWiki class to look like this (added the unescape call and scrape_lyrics_from_html call), which seems to have fixed it:

class LyricsWiki(SymbolsReplaced):
    """Fetch lyrics from LyricsWiki."""
    URL_PATTERN = 'http://lyrics.wikia.com/%s:%s'

    def fetch(self, artist, title):
        url = self.build_url(artist, title)
        html = self.fetch_url(url)
        if not html:
            return
        # Decode HTML entities first, then pull out the lyric box.
        lyrics = extract_text_in(unescape(html), u"<div class='lyricbox'>")
        # Strip the remaining HTML markup from the extracted text.
        lyrics = scrape_lyrics_from_html(lyrics)
        # Ignore LyricWiki's "not licensed" disclaimer.
        if lyrics and 'Unfortunately, we are not licensed' not in lyrics:
            return lyrics
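
For readers without a beets checkout, the same clean-up can be sketched with only the standard library. This is a rough stand-in under stated assumptions, not the plugin's implementation: `clean_fragment` and `DISCLAIMER` are hypothetical names, the regular expressions are a simplification of real HTML parsing, and the sketch decodes entities after removing tags, unlike the class above.

```python
import html
import re

# Hypothetical constant: opening words of LyricWiki's licensing notice.
DISCLAIMER = "Unfortunately, we are not licensed"


def clean_fragment(fragment):
    """Turn a scraped HTML fragment into plain text, or None if unusable."""
    # Drop <script> blocks and HTML comments together with their contents.
    text = re.sub(r"<script\b.*?</script>", "", fragment,
                  flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    # Treat <br> as a line break, then remove any remaining tags.
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &#87; only after the tags are gone.
    text = html.unescape(text).strip()
    if not text or DISCLAIMER in text:
        return None
    return text
```

Decoding entities last means an encoded `&lt;` in the lyrics cannot be mistaken for the start of a tag.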

Should I create a PR with these changes?

Thanks for investigating! Yes, please do open a PR; that always makes it easier to review the delta.

Just as an FYI: during the testing mentioned above (issue #1903), I did remove lyrics for the song (using puddletag, if I remember correctly), and the LyricsWiki disclaimer would appear after the import if I had the lyrics plugin enabled for that particular test. I also tried storing the word "Testing" for the lyrics before doing an import, and found that it was (correctly) left intact after the import. The disclaimer only appeared when starting with an empty lyrics tag.

If there's anything I can do to help test the change above, I'd be glad to.

That's very peculiar. I'm not entirely sure how the LyricsWiki plugin is even working for you, as my fix hasn't been merged in yet. Could you possibly run the following and post the output here?

beet -vv lyrics -f Pop Song 89

(assuming Pop Song 89 is the song with the issue).

Would it be possible for you to provide your config.yaml too?

I think that's actually the expected behavior: we fetch lyrics only when the song doesn't already have any lyrics, unless you use -f.
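
Roughly, the rule described here can be pictured like this. `should_fetch` is a hypothetical name for illustration and is not taken from the plugin's source:

```python
def should_fetch(existing_lyrics, force=False):
    """Fetch when forced, or when the item has no lyrics stored yet."""
    return force or not existing_lyrics
```

Under that rule, a track tagged with "Testing" is left alone, while an empty lyrics tag triggers a fetch, which matches what was observed above.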

Is there any chance the recent changes for unescaping HTML entities somehow fixed this?

Could anyone confirm that this was fixed by #1912?

I just ran "pip install -U beets" to install the latest version (not sure, though, if that would pull in the above #1912 fix), and running an import still resulted in the disclaimer text ("Unfortunately, we are not licensed...") being pulled in for the lyrics.

@dannn-o Since the fix is not released yet, you'll need to install from source (see the FAQ).

I can no longer replicate this so I'm going to go ahead and close it :+1:
