Altair: Top X items of y axis

Created on 23 Jan 2019 · 10 Comments · Source: altair-viz/altair

First of all, thank you for an amazing product!

I have a question regarding getting the Top 10 of the y-axis. I have not found a good example of how to deal with this.

My question is:
Is there a preferred way to sort an aggregated y-axis and display the Top X items by y?

Below are two examples, followed by my current workaround:

1. Original data, where taking the top 10 items works.
2. Transformed data in pandas, with the same types but different values, where the items are not sorted.
3. My current approach: letting pandas handle the sorting of the top 10 items for me.

import altair as alt
import pandas as pd

codes = ['38-10-01', '32-49-01', '38-30-01', '25-20-01', '25-25-02']
percent = [0.006995515695067265,
 0.002466367713004484,
 0.0016591928251121076,
 0.0016143497757847534,
 0.001569506726457399]

df = pd.DataFrame({'codes':codes, 'percentage': percent}, columns=['codes', 'percentage'])

alt.Chart(
    df,
).mark_bar().encode(
    x=alt.X('codes:N', sort=alt.EncodingSortField(field="codes", op="count", order='ascending')),
    y=alt.Y('percentage'),
    tooltip='percentage'

).transform_window(
    rank='rank(percentage)',
    sort=[alt.SortField('percentage', order='descending')]
).transform_filter(
    (alt.datum.rank < 10)
)

The example above works, but it stops working after I apply some transformations to the data.

Below is an example of the transformed output. With it, I can't get the Top 10 items to work.

new_codes = ['05-41-04', '12-13-03', '12-15-01', '20-00-00', '21-27-02']
new_percentages = [0.5, 0.25, 0.0, 0.0, 0.0]

df2 = pd.DataFrame({'codes': new_codes, 'percentage': new_percentages}, columns=['codes', 'percentage'])

alt.Chart(
    df2,
).mark_bar().encode(
    x=alt.X('codes:N', sort=alt.EncodingSortField(field="codes", op="count", order='ascending')),
    y=alt.Y('percentage'),
    tooltip='percentage'

).transform_window(
    rank='rank(percentage)',
    sort=[alt.SortField('percentage', order='descending')]
).transform_filter(
    (alt.datum.rank < 10)
)

My current solution:

nr = 15
top = data.sort_values(by='percentage', ascending=False).head(nr)


alt.Chart(
    top,
    title='Top {} Probability of getting Mel Code of a Flight'.format(nr)
).mark_bar().encode(
    x=alt.X('mel_code_3_lvl:N', sort=alt.EncodingSortField(field="percentage", op="sum", order='descending')),
    y=alt.Y('percentage'),
)

My question again:
Is there a preferred way to sort an aggregated y-axis and display the Top X items by y?

Thank you again!

Source

eleijonmarck

Most helpful comment

Hi there,

If you don't need access to the non-aggregated data and you only want the top ten records, then I like your third approach: letting Pandas handle the aggregation and selection.

On a related note, sometimes when you want to cross filter charts with Altair's interactions, you need access to the full set of data from within the spec. This is because the aggregations and top records may be different depending on selections made by clicking/dragging etc. Aside from that, I like to use Pandas when possible and pass only what I need to Altair.

Take care,
Al

Alcampopiano on 23 Jan 2019

👍2

All 10 comments

@Alcampopiano Thank you!

Ah, that makes sense.

1. If you want selection or other operations on the underlying non-aggregated data, you should instead aggregate based on the raw data.
2. If only the top aggregated values are of interest, let Pandas handle the sorting for you.

@Alcampopiano are you part of the team? I have tried to search for a solution to case 1, where you aggregate the y-axis but then sort and show only the Top 10 of the aggregated values.

eleijonmarck on 24 Jan 2019

👍1

@eleijonmarck No, I'm just an enthusiastic user.

The best I could dig up was a Vega-lite spec that uses window transforms and filters to sort and display N categories. Please see here for docs and example.

I wasn't able to quickly translate that into Altair, but hopefully someone here can help further.

Take care,
Al

Alcampopiano on 24 Jan 2019

The window transform documentation & examples in Altair are unfortunately quite sparse right now. If anyone would like to add more that would be a welcome contribution.

jakevdp on 24 Jan 2019

Added an example. I do, however, have a question I would like answered, as I am unclear what the difference is between

sorting the x-axis with mean/average
sorting the x-axis with count or any other measure

Also, is this necessary at all, since we are filtering on the rank of the y-values?

eleijonmarck on 24 Jan 2019

In this case, the aggregation doesn't really do anything because there's only one item per group. Unfortunately the Vega-Lite schema requires an aggregation for all sort operations, so you can't leave it out here. There's been discussion about changing this in Vega-Lite itself, but it's not clear what the default aggregation should be.

jakevdp on 24 Jan 2019
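The point about one item per group can be checked with a small plain-Python model. This is only a sketch of the arithmetic, not of how Vega-Lite sorts internally; it reuses the values from the second example in the question, and the helper `order_by` is a hypothetical name introduced here.

```python
from statistics import mean

# Values copied from the second example in the question: one row per code.
codes = ['05-41-04', '12-13-03', '12-15-01', '20-00-00', '21-27-02']
percentages = [0.5, 0.25, 0.0, 0.0, 0.0]

# Group the rows by code; every group ends up with exactly one value.
groups = {}
for code, pct in zip(codes, percentages):
    groups.setdefault(code, []).append(pct)

# Aggregations that return the value itself for a one-element group.
value_ops = {'sum': sum, 'mean': mean, 'min': min, 'max': max}

def order_by(agg):
    """Return the codes sorted by the aggregated value, largest first."""
    return sorted(groups, key=lambda code: agg(groups[code]), reverse=True)

orders = {name: order_by(agg) for name, agg in value_ops.items()}

# 'count' is the exception: it is 1 for every one-row group.
counts = {code: len(values) for code, values in groups.items()}

print(orders)
print(counts)
```

With a single row per code, sum, mean, min, and max all produce the same ordering, so the choice among them makes no difference. A count is 1 for every code, however, so in this model it carries no ordering information at all.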

Thanks for adding the example... would you be willing to add that to the documentation website, similar to https://github.com/altair-viz/altair/blob/master/altair/examples/top_k_letters.py ?

jakevdp on 25 Jan 2019

@jakevdp I added the top k items example in #1312

eleijonmarck on 27 Jan 2019

Merged in https://github.com/altair-viz/altair/pull/1312

eleijonmarck on 19 Feb 2019

@eleijonmarck Hi there. From my understanding, is there no way to achieve this?

"If you want selection or other operations on the underlying non-aggregated data."

Filtering the Top K based on a selection would be much more useful.

noklam on 14 Mar 2019
