First off, thank you for an amazing product!
I have a question regarding getting the top 10 values on the y-axis. I have not found a good example of how to deal with this.
My question is:
Is there a preferred way to sort an aggregated y-axis and display the top X items by y?
Below are two examples:
import altair as alt
import pandas as pd
codes = ['38-10-01', '32-49-01', '38-30-01', '25-20-01', '25-25-02']
percent = [0.006995515695067265,
0.002466367713004484,
0.0016591928251121076,
0.0016143497757847534,
0.001569506726457399]
df = pd.DataFrame({'codes':codes, 'percentage': percent}, columns=['codes', 'percentage'])
alt.Chart(
df,
).mark_bar().encode(
x=alt.X('codes:N', sort=alt.EncodingSortField(field="codes", op="count", order='ascending')),
y=alt.Y('percentage'),
tooltip='percentage'
).transform_window(
rank='rank(percentage)',
sort=[alt.SortField('percentage', order='descending')]
).transform_filter(
(alt.datum.rank < 10)
)
The example above works. However, after I apply some transformations to the data, I get output like the example below,
and I can't make it work to show the top 10 items.
new_codes = ['05-41-04', '12-13-03', '12-15-01', '20-00-00', '21-27-02']
new_percentages = [0.5, 0.25, 0.0, 0.0, 0.0]
df2 = pd.DataFrame({'codes': new_codes, 'percentage': new_percentages}, columns=['codes', 'percentage'])
alt.Chart(
df2,
).mark_bar().encode(
x=alt.X('codes:N', sort=alt.EncodingSortField(field="codes", op="count", order='ascending')),
y=alt.Y('percentage'),
tooltip='percentage'
).transform_window(
rank='rank(percentage)',
sort=[alt.SortField('percentage', order='descending')]
).transform_filter(
(alt.datum.rank < 10)
)
My current solution:
nr = 15
top = data.sort_values(by='percentage', ascending=False).head(nr)
alt.Chart(
top,
title='Top {} Probability of getting Mel Code of a Flight'.format(nr)
).mark_bar().encode(
x=alt.X('mel_code_3_lvl:N', sort=alt.EncodingSortField(field="percentage", op="sum", order='descending')),
y=alt.Y('percentage'),
)
My question again:
Is there a preferred way to sort an aggregated y-axis and display the top X items by y?
Thank you again!
Hi there,
If you don't need access to the non-aggregated data and you only want the top ten records, then I like your third approach: letting Pandas handle the aggregation and selection.
On a related note, sometimes when you want to cross filter charts with Altair's interactions, you need access to the full set of data from within the spec. This is because the aggregations and top records may be different depending on selections made by clicking/dragging etc. Aside from that, I like to use Pandas when possible and pass only what I need to Altair.
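As a rough sketch of the Pandas route: aggregate first, keep the N largest totals, and hand only those rows to Altair. The data values and the helper name `top_n_by_sum` below are made up for illustration; the column names follow the examples in the question.

```python
import pandas as pd

# Hypothetical raw data with several rows per code (values invented for illustration).
raw = pd.DataFrame({
    "codes": ["A", "B", "A", "C", "B", "D"],
    "percentage": [0.10, 0.05, 0.20, 0.01, 0.02, 0.25],
})


def top_n_by_sum(df, n, key="codes", value="percentage"):
    """Sum `value` per `key` and return the n rows with the largest totals."""
    totals = df.groupby(key, as_index=False)[value].sum()
    # nlargest returns the rows already sorted from largest to smallest.
    return totals.nlargest(n, value).reset_index(drop=True)


top = top_n_by_sum(raw, n=2)
# `top` is a small DataFrame that can be passed straight to alt.Chart(top).
```

The chart then only ever sees N rows, which keeps the spec small but means selections inside the chart cannot change which items count as the top N.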
Take care,
Al
@Alcampopiano Thank you!
Ah, that makes sense.
1. If you want selection or other operations on the underlying non-aggregated data, then you should instead try to aggregate based on the raw data.
2. If only the top aggregated values are of interest, then let Pandas handle the sorting for you.
@Alcampopiano are you part of the team? I have tried to search for a solution that involves case 1,
where you aggregate the y-axis but then sort and show only the top 10 of the aggregated values.
@eleijonmarck No, I'm just an enthusiastic user.
The best I could dig up was a Vega-Lite spec that uses window transforms and filters to sort and display N categories. Please see here for docs and example.
I wasn't able to quickly translate that into Altair, but hopefully someone here can help further.
Take care,
Al
The window transform documentation & examples in Altair are unfortunately quite sparse right now. If anyone would like to add more that would be a welcome contribution.
Added an example. I do, however, have a question I would like answered, as I am unclear what the difference is between
Also is this necessary? Since we are filtering on the rank of the y-values.
In this case, the aggregation doesn't really do anything because there's only one item per group. Unfortunately the Vega-Lite schema requires an aggregation for all sort operations, so you can't leave it out here. There's been discussion about changing this in Vega-Lite itself, but it's not clear what the default aggregation should be.
Thanks for adding the example... would you be willing to add that to the documentation website, similar to https://github.com/altair-viz/altair/blob/master/altair/examples/top_k_letters.py ?
@jakevdp made the top k items in #1312
@eleijonmarck Hi there. From my understanding, is there no way to achieve this?
"If you want selection or other operations on the underlying non-aggregated data."
Filtering the top K based on a selection would be much more useful.