Beets: Extra fields from MusicBrainz (missing fields)

Created on 22 Jul 2015  路  33Comments  路  Source: beetbox/beets

Yanno, for all that beets prides itself in being the tagger for OCD music lovers, i'm a little surprised at the info it doesn't pull from musicbrainz.

It doesn't pull performer, composer, lyricist, conductor, engineer or producer tags. In short, why not just pull ALL the tags that it can from musicbrainz (see here https://picard.musicbrainz.org/docs/mappings/ ) and apply it to the file and database for searching. Why wouldn't i want to find all the various tracks that Jon Coltrane plays on even when the album title or group doesn't bare his name?

I'm just learning Python now, so should i take a shot at this? Should it be a plugin, or should i not bother because it's on the roadmap?

feature

Most helpful comment

Is anyone willing to do this? Else I can give it a try

All 33 comments

I'm for it! Specifically, this won't be too onerous if we take this tactic:

  • Refactor the AlbumInfo and TrackInfo classes to just be "bags of tags" rather than having specific fields that need to be filled (and explicitly added to the code every time we think of a new field).
  • Refactor the autotag machinery for applying metadata to just copy data from the Info object to the database object blindly (again to avoid needing to add per-field plumbing). This would be agnostic to whether the field is fixed or flexible.
  • Now the mb.py backend is free to iterate as much as possible and add all the metadata it likes without disturbing the rest of the codebase.

See also #506.

+1

+1

@sampsyo,
How exactly would you like me to proceed ?
I believe reading in the task description that we need to change the entire way of storing the metadata received from MusicBrainz.

Hey again, @anshuman73! You're right; this one is a bit more involved than last time.

Let's start this way. Try prototyping replacements for the AlbumInfo and TrackInfo classes in beets/autotag/hooks.py. At first, these classes should work the same way as before, but they should get rid of the prescribed list of fields. In fact, it should work to set any field on it, much like a Python dictionary. This should work:

>>> info = TrackInfo(some_field=12)
>>> info.some_other_field = 50

Then, once that's working, we'll want to revisit the apply_metadata function in autotag/__init__.py to take advantage of this new structure. We should be able to simplify it down to just copy over all the fields from AlbumInfo to Album, and the same for TrackInfo to Item, with no complicated logic.

Sound good? Let me know if you have any questions.

@sampsyo
So we want the classes to have all the class variables they already have and be able to add more class variable dynamically as and when required ?
Also I should remove the _compulsory_ fields ( for eg, album, album_id, artist, etc) and let the an instance of the Class be declared with any 1 of the fields, right ? (sorry if I'm repeating what you said, just wanted to confirm I've understood it right)
and then move onto autotag/__init__.py and change the structure

Yep, you totally got it!

As you said, the first step is to get rid of all the built-in fields and just let any field work on those classes. The easiest way might be to make those classes inherit from dict and then define the magic methods __getattr__ and __setattr__, rather like this SO answer: http://stackoverflow.com/a/32107024

Ahh this SO link is _truly_ magic !
you want the classes to be able to add new fields in the typical dictionary way too ? For Eg,
(Trackinfo_instance['new _field'] = 'a')?

Yes, exactly. And for backwards compatibility, info.field should work the same as info['field'].

Is anyone willing to do this? Else I can give it a try

Ok first question: I have a fork of beets but it already has a pending pull request, so if I make a new branch it will have the commits of this pull request (https://github.com/beetbox/beets/pull/2563). How can I get a new fork or a clean branch that only contains the 'real' beets?

If I just take my fork and make a new branch, the branch will start as a copy of my master branch, which is not exactly beets (20 more commits).

@dosoe If I'm understanding correctly, you want to update your fork to have the latest from the master repository. This document should help: https://help.github.com/articles/syncing-a-fork/

You should probably create a second branch in order to do this work, so once you have the upstream remote configured, you could probably do something like:

git checkout upstream/master
git checkout -b mb-extra-fields

Then you can push that mb-extra-fields branch up to origin and create a separate pull request w/ that.

Ok I finally got it going, thanks.

Now how is this supposed to work? From the upper description I get that we want to start it by changing the __init__ from the TrackInfo and AlbumInfo classes of beets/autotag/hooks.py:
Right now it is

def __init__(self, title, (many tags)):
    self.title = title
    self.track_id = track_id
    .
    .
    .

What do we want to do? Do we keep it like this and add a add_tag function on each class?
Do we transform it into something like:

def add_tag(self, tag_name)
    (ads an empty tag)
def __init__(self, st=some_tag):
    self.add_tag(self, st)
    self.st = some_tag
    .
    .
    .

?
I am not a programmer, so expect to get stupid questions...

What holds us back from just making a copy-pasting this code: http://stackoverflow.com/a/32107024 to replace TrackInfo and AlbumInfo? There is this decode function (that ensures everything is utf-8) to rewrite but else it does what we want it to do, right? Then we have to play around with apply_item_metadata in beets/autotag/__init__.py to just loop through all the keys of a TrackInfo object and same for apply_metadata with AlbumInfo? And in beet/autotag/mb.py the only thing to change is when we call beets.autotag.hooks.TrackInfo at the start of track_info and the corresponding place for album_info. Did I get it right or am I making it more confusing?

and in beet/autotag/__init__.py we have to replace all the if track_info.medium_index is not None: by if track_info.get(medium_index): .
Is there anything else?

Here is a first try (that doesn't work) as a work base: https://github.com/beetbox/beets/pull/2650

I would suggest to continue the discussion there.

Hi! On https://github.com/beetbox/beets/pull/2650 I started working on it, but there is a recurrent problem: I am trying to transform the Object inherited class TrackInfo in a dict related class. However, the problem there is that dict objects are not hashable, which makes it impossible to make a deepcopy of a list of dict and to make somthing like list(set(tracks) - set(mapping.values())) were tracks is a list of TrackInfo objects. Would anyone have an idea how to get around these problems?

Good question. I think the right solution is probably to make the TrackInfo objects hashable to restore the old functionality. The easiest way to do that is probably just to define a __hash__ function that uses the default implementation:

def __hash__(self):
    return object.__hash__(self)

That would preserve identity-based hashing instead of contents-based hashing as described in this relevant-seeming SO answer.

Perhaps the cleaner thing to do, however, would be to avoid making TrackInfo inherit directly from dict. Instead, just keep a dict of values as a field on the TrackInfo object. That way, we'd sidestep the issue and avoid any unpleasantness of sharing the implementation from dict.

I did it because you looked for something like https://stackoverflow.com/questions/2352181/how-to-use-a-dot-to-access-members-of-dictionary/32107024#32107024 , so I just copy-pasted it. That's why we end up with a dict inheritance.

I tried to make TrackInfo inherit from object and failed miserably. Trying the first solution you propose however leads to problems with deepcopy

Hello,
I implemented what you suggested above in #2654, TrackInfo and AlbumInfo inherit from ItemInfo (a new class) which inherits from dict. The hash function uses the hash of the track id from musicbrainz, since it must be unique, it seems to be a good candidate.

I still have to make apply_metadata generic.

Is the mb_trackid a good choice? Does every track have one or do you add tracks by hand or take metadata from let's say discogs without putting a mb_trackid on it?

I have some tracks I can't identify with MB so I'm thinking about setting the metadata by hand, but these tracks won't have a mb_trackid.

Good point. I assumed every track is identified with MB. Shouldn't you add the release on MusicBrainz, and then identify from MB, in that case? I think the software should entrust everything regarding data to MB, and just dump data from it, with as less reprocessing as possible. Else, a fallback id can be used, depending on the source.

It's what I'm doing, but for some of my music I can't trace back which release it is, and I don't expect everyone to do it. I think unsing mb_id with a fallback is a good idea.

  • Refactor the AlbumInfo and TrackInfo classes to just be "bags of tags" rather than having specific fields that need to be filled (and explicitly added to the code every time we think of a new field).

  • Refactor the autotag machinery for applying metadata to just copy data from the Info object to the database object blindly (again to avoid needing to add per-field plumbing). This would be agnostic to whether the field is fixed or flexible.

The first part has been dealt with, do you have an idea on how you want the next point to be like? I'm afraid it's not gonna be straightforward, we will need to add plumbing (for example the 'recording engineer' is called 'recording' in the MB API), but at least for instruments and engineers it should be rather straightforward to implement.

That second bullet, fortunately, is just about refactoring the apply_metadata and apply_item_metadata functions, which you have already gotten started in #3587. Thanks!!

I'm trying right now to implement automatic fetching of all artists associated with a recording, but I'm running into a problem. Let's say we retrieve all artists and create appropriate tags (like Soprano vocals) flexibly for each recording. What happens if there is a change? Let's say it turns out the soprano singer was in fact a mezzo-soprano singer and the corresponding change is made on MusicBrainz (like for example this edit: https://musicbrainz.org/edit/69614459 ). We would then end up with two tags, Soprano vocals and Mezzo-soprano vocals with both the same content. How do we remove the Soprano vocals tag now that it's not needed anymore? Else, over time, tracks would become more and more cluttered with obsolete tags. How do we arrange that 'obsolete' tags disappear? Before flexible tags, it was easy, as each tag was modified to None if beets didn't get the corresponding data (even if sometimes there is clutter when album-related tags stay on a track that is reimported as a singleton, but that's more a problem of the file than a problem of the library).
So my question is: how do we clean up obsolete tags, now that they are flexible?
Could there be an option when storing the info in the library to give a list of tags that are up-to-date so that when storing all other tags are removed? I'm afraid that would be complicated to implement, especially as it would need to keep track of the multiple plugins that add additional tags (discogs, keyfinder, parentwork to name just a few). What's your opinion on this?

Yeah, that seems fairly tricky! Fortunately, it is possible to remove flexible attributes鈥攋ust doing del item['field'] or del item.field should do it.

Of course, figuring out the right _policy_ for when to delete fields is the hard part. In the hypothetical plugin that gets a bunch of extra artists for a track, it could be a policy that, every time the plugin runs, it deletes all of those fields that it can populate and re-fills them.

Well, if you want to do it in a plugin, it's easy (actually, did we need all this work for plugins? Don't flexible attributes already work this way?). If you want to have flexible attributes in beets core, then we should find out how, where and when to clean up tags in the core. One idea could be to impose tags added by plugins to be somehow recognizable, to single them out. That would force us to overhaul all plugins to adapt their tag names, of course, and would mean trouble in the way of backwards-compatibility.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nopoz picture nopoz  路  4Comments

ctrueden picture ctrueden  路  3Comments

chayward1 picture chayward1  路  4Comments

robot3498712 picture robot3498712  路  3Comments

clounie picture clounie  路  3Comments