Beets: Summarize command

Created on 5 Mar 2019 · 10 Comments · Source: beetbox/beets

I'm not absolutely sure this _doesn't_ exist yet, as I've only been using beets for a week or two, but I can't find it anywhere. What I would like to see is some way to summarize library information within categories as specified by the user. For example, something like

beet summarize genre

would spit out a list of genres in my library with associated summary statistics (perhaps the same as those displayed by beet stats), one line/block for each genre. One could imagine using different fields to split by, and even multi-level splits, e.g. beet summarize 'genre year'.

In addition, one could imagine being able to _sort_ the results based on any of the statistics of the group (e.g. number of tracks, average bpm, etc.) and limit the resulting output. This way, one could easily find, for example, the ten most prolific genres in one's library.

I feel like this could probably just be options to the stats plugin, though I'm not sure if they are within scope.

Please enlighten me if this functionality already exists! I'm happy to dig into the code myself if need be :-)

needinfo

All 10 comments

Interesting! Personally, I like using Unix shell pipelines for such things, like this:

$ beet ls -f '$genre' | sort | uniq -c | sort -nr

That gives you a list of all the genres in your library, sorted by popularity (and with a count next to each one). If you only want the top 5 genres:

$ beet ls -f '$genre' | sort | uniq -c | sort -nr | head -n5

I recommend trying to play around with stuff like that! After you give that a try, you might have some ideas about what is not well covered by this kind of shell-based fun.

That's really cool, I hadn't thought of that!

Here's a list of things I've thought about so far that might make my original proposal worthwhile:

  1. I _hadn't_ thought of using Unix shell pipelines. I can see how useful they are, and I don't want to reinvent the wheel, but if I hadn't thought of using them, other people probably haven't either, whereas a documented "summarize" command would be easier to pick up.
  2. I guess those kinds of shell pipelines don't work on Windows et al.?
  3. As far as I can tell, the shell pipelines above can really only count tracks per group and sort by that count. My proposal would give the ability to aggregate/sort by average/total length of tracks, average bpm, average year, etc.
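
As a rough illustration of point 3 (and a portable alternative for point 2), a few lines of Python can do the grouping and averaging that a plain `sort | uniq -c` pipeline does not. This is only a sketch under assumptions: it expects tab-separated `genre<TAB>seconds` lines as input, with the length already given as a plain number, and `summarize_lines` is a hypothetical helper, not part of beets or of the plugin described below.

```python
from collections import defaultdict


def summarize_lines(lines):
    """Group 'genre<TAB>seconds' lines and return (genre, count, avg_seconds)
    tuples, sorted by average length in descending order."""
    lengths = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so a genre containing a tab still parses.
        genre, seconds = line.rsplit("\t", 1)
        lengths[genre].append(float(seconds))
    rows = [(g, len(v), sum(v) / len(v)) for g, v in lengths.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, not taken from a real library.
    sample = ["Folk\t200", "Rock\t240", "Folk\t260", "Speech\t2700"]
    for genre, count, avg in summarize_lines(sample):
        print(f"{genre}: total_tracks: {count}  average_length: {avg / 60:.2f} min.")
```

Changing the sort key to `row[1]` would order the groups by track count instead, which matches what the shell pipeline produces.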

I've started playing around with the plugin framework and have a very basic working plugin already. It definitely needs polishing before I share it, but as a couple of examples:

$ beet summarize -c genre -s ":length" | head -n5 
Folk:  total_tracks: 248  average_length: 3.86 min.  
Rock:  total_tracks: 240  average_length: 3.95 min.  
Klassik:  total_tracks: 15  average_length: 4.60 min.  
Speech:  total_tracks: 225  average_length: 46.01 min.  
Punk:  total_tracks: 12  average_length: 3.82 min. 

(The ":" in front of length requests the average length rather than the total.) Or the following:

$ beet summarize -c year -s "length :bitrate" | head -n5
1998:  total_tracks: 109  total_length: 551.40 min.  average_bitrate: 795911.23  
2010:  total_tracks: 203  total_length: 1436.36 min.  average_bitrate: 747686.56  
1995:  total_tracks: 151  total_length: 589.04 min.  average_bitrate: 730776.97  
2008:  total_tracks: 168  total_length: 1994.95 min.  average_bitrate: 701909.63  
2006:  total_tracks: 317  total_length: 2381.38 min.  average_bitrate: 648899.57 

I haven't implemented sorting yet, which is obviously important. Any numerical field will work for the "-s" option, and I plan to extend that to string fields (perhaps total length of string, unique words in string, etc.).

Also, of course the stats can be filtered by a query:

$ beet summarize -c year -s "length :bitrate" genre:Rock | head -n5
2010:  total_tracks: 26  total_length: 102.93 min.  average_bitrate: 866399.00  
2003:  total_tracks: 50  total_length: 220.74 min.  average_bitrate: 965580.24  
2011:  total_tracks: 28  total_length: 168.55 min.  average_bitrate: 944387.64  
2005:  total_tracks: 37  total_length: 172.21 min.  average_bitrate: 950862.86  
2007:  total_tracks: 15  total_length: 132.20 min.  average_bitrate: 913755.33 

Does this look interesting? Once I get sorting working, and perhaps multi-level categories, it would be good to get help on best practices for printing to screen and defining options properly.

Oh, and thanks for an awesome tool! It was amazing how easy it was to get the plugin up and running at a basic level.

Huh, interesting! It sounds like you're trending toward implementing the equivalent of SQL's GROUP BY and aggregates (e.g., MAX, MIN, SUM, AVG, COUNT). Our query syntax already essentially supports WHERE filtering and ORDER BY sorting, so a proposal like this might round us out to cover all of the traditional SQL SELECT statement. :smiley:

Put more seriously, taking inspiration from SQL might be a good way to guide your thinking and focus on orthogonal, reusable pieces.
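
To make the SQL analogy concrete, here is a small sketch using Python's built-in `sqlite3` module. The in-memory `items` table, its columns, and its rows are made up for illustration and are not claimed to match the beets schema; the point is only how WHERE, GROUP BY, aggregates, ORDER BY, and LIMIT fit together as separate, reusable pieces.

```python
import sqlite3

# Hypothetical in-memory table; the column names mirror fields mentioned in
# this thread, and the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (genre TEXT, year INTEGER, length REAL, bitrate INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        ("Rock", 2010, 240.0, 320000),
        ("Rock", 2010, 180.0, 256000),
        ("Folk", 1998, 200.0, 192000),
        ("Rock", 1998, 300.0, 128000),
    ],
)

# WHERE filters the rows, GROUP BY splits them into categories, the aggregate
# functions summarize each group, ORDER BY sorts on an aggregate, and LIMIT
# truncates the output.
rows = conn.execute(
    """
    SELECT year,
           COUNT(*)     AS total_tracks,
           SUM(length)  AS total_length,
           AVG(bitrate) AS average_bitrate
    FROM items
    WHERE genre = 'Rock'
    GROUP BY year
    ORDER BY total_tracks DESC
    LIMIT 5
    """
).fetchall()

for row in rows:
    print(row)
```

Each clause maps onto one option a summarize command might expose: the query, the grouping field, the statistics, the sort key, and the output limit.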

Got a link to where you're developing the plugin?

I don't have a heap of experience with SQL, so I could definitely use help on that front. However, I think the basic plugin now does a reasonable job. It's at https://github.com/steven-murray/beet-summarize.

Some example queries would be

$ beet summarize
genre                  | count
---------------------- | -----
Kids Music             | 340  
Romantic               | 268  
Folk                   | 248  
Pop                    | 248  

The default grouping field is "genre", and the default statistic is "count". You can reverse the order of the sort:

$ beet summarize -g year -R
year | count
---- | -----
1981 | 1    
1991 | 4    
1985 | 9    
1982 | 10   
1990 | 11   

One can use a QUERY to restrict results:

$ beet summarize -g year -R genre:rock
year | count
---- | -----
2004 | 8    
1984 | 10   
1982 | 10   
1987 | 10   
1999 | 11   
2002 | 11   

And of course, you can specify aggregate statistics to report. Each statistic is a valid field with optional prepended modifiers. Modifiers include the aggregation function (options are MIN, MAX, SUM, COUNT, AVG, RANGE), whether to only include UNIQUE entries, and converters for when the field is of str type (options are LEN and WORDS). So:

$ beet summarize -g year -s "count avg|bitrate avg:words|lyrics count:unique|artist"
year | count | avg|bitrate       | avg:words|lyrics   | count:unique|artist
---- | ----- | ----------------- | ------------------ | -------------------
2006 | 317   | 648899.5741324921 | 273.51419558359623 | 41                 
0    | 257   | 354243.0622568093 | 13.801556420233464 | 37                 
2009 | 244   | 709426.0778688524 | 660.7786885245902  | 17                 
2005 | 241   | 754819.5145228215 | 681.6099585062241  | 24                 
2010 | 203   | 747686.5615763547 | 537.1133004926108  | 51    

Sorting happens on the first statistic specified. Multiple statistics are separated by spaces, modifiers are separated from each other by colons, and a pipe separates the modifiers from the field name (as in avg:words|lyrics).
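
For readers who want to see how such a statistic string could be taken apart, here is a minimal parsing sketch. It is not the plugin's actual implementation: `parse_stat` and `parse_stats` are hypothetical names, and treating a bare token such as `count` as a modifier with no field is an assumption based on the example above.

```python
AGGREGATORS = {"min", "max", "sum", "count", "avg", "range"}
CONVERTERS = {"len", "words"}


def parse_stat(spec):
    """Split one statistic such as 'avg:words|lyrics' into its parts."""
    modifiers, sep, field = spec.rpartition("|")
    if not sep:
        # Assumption: with no pipe, the token holds modifiers only ('count').
        modifiers, field = spec, None
    parsed = {"field": field, "aggregator": None, "unique": False, "converter": None}
    for mod in modifiers.lower().split(":"):
        if mod in AGGREGATORS:
            parsed["aggregator"] = mod
        elif mod == "unique":
            parsed["unique"] = True
        elif mod in CONVERTERS:
            parsed["converter"] = mod
        else:
            raise ValueError(f"unknown modifier: {mod!r}")
    return parsed


def parse_stats(option):
    """Parse a whole statistics option value; statistics are space-separated."""
    return [parse_stat(token) for token in option.split()]


if __name__ == "__main__":
    for stat in parse_stats("count avg|bitrate avg:words|lyrics count:unique|artist"):
        print(stat)
```

Under these assumptions, the first parsed statistic would be the natural sort key, mirroring the "sorting happens on the first statistic" rule.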

Commenting to link related issue: #824

@sampsyo I've added a better README to the repository, and would welcome any feedback over at that repo (https://github.com/steven-murray/beet-summarize). If/when you feel that the repo is strong enough to be considered part of the beets ecosystem, I'd be happy for it to be included on the list of extensions or whatever you feel is the appropriate place.

In the meantime, I guess this issue can be closed?

@steven-murray: If @sampsyo thinks the Summarize plugin is solid/appropriate enough for inclusion alongside the other plugins in the Beets repo, would you be open to that?

@justinmayer yes that would be perfectly fine.

I've added a link to the docs! This seems like a very useful thing. If you're ever interested in pursuing the bundling option, please open a PR and let's talk! :smiley:
