Altair: Cumulative histogram

Created on 28 May 2018 · 19 Comments · Source: altair-viz/altair

Hello

Is it possible to have a "cumsum" operator that I can use together with binning to produce a cumulative histogram?

Thank you



The window transform (https://altair-viz.github.io/user_guide/transform.html#window-transform) supports this.

There are a few more examples in the Vega-Lite docs; you can adapt the cumulative average example to make a cumulative sum.

It would be good to add an Altair example of a cumulative histogram.

Thank you both. But I have the same question: is there any cumsum operator that I can use?

I'd be happy to work on adding one. Is there a simple example out there in Vega or elsewhere you'd recommend working off?

You could take an example from this page of the seaborn documentation:
https://seaborn.pydata.org/generated/seaborn.distplot.html

or from this question on Stack Overflow:
https://stackoverflow.com/questions/39297523/plot-cdf-cumulative-histogram-using-seaborn-python

It seems to me that there is also a need for a very simple cumulative total example among our line chart examples.

At my day job at the Los Angeles Times, I edited a story a few weeks ago about the flood of "Super PAC" money rushing into California's looming primary for governor.

Here's a chart that ran with that story.

If there's not a good data set in vega_datasets currently for a "cumsum" line chart, this could make a good candidate. It's small, it's simple, it clearly fits the example.

The other issue is that the window transform API is currently very low-level and pretty painful to use (for example, it requires an "as" attribute, which has to be passed as a kwarg dict because "as" is a reserved keyword in Python), so it might be better to wait on user-facing examples until we have a better story for that.

#850 (https://github.com/altair-viz/altair/issues/850) is where we're tracking window transform syntax.

Okay. Let's wait for the syntax to settle.

On Mon, May 28, 2018, 8:04 PM Jake Vanderplas wrote:

#850 (https://github.com/altair-viz/altair/issues/850) is where we're tracking window transform syntax.


I think this article could also be very interesting for the discussion:
https://www.circonus.com/2018/05/effective-management-of-high-volume-numeric-data-with-histograms/

FYI, I'm adding a cumulative frequency plot example to Vega-Lite: https://github.com/vega/vega-lite/pull/3833

Where can I find the Vega-Lite example? Can I use it in Altair?

I guess I am almost there:

hist = alt.Chart(clientes).mark_bar().encode(
    alt.X("valor:Q", bin=alt.Bin(maxbins=50)),
    y='sum(valor)'
)

soma = alt.Chart(clientes).mark_line(color='red').transform_window(
    window=[alt.WindowFieldDef(op='sum', field='valor', **{'as': 'TotalValor'})],
    frame=[None, 0],
    sort=[{"field": "valor", "order": "ascending"}]
).encode(
    alt.X("valor:Q", bin=alt.Bin(maxbins=50)),
    y='TotalValor:Q'
)

alt.layer(
    hist,
    soma
).resolve_scale(
    y='independent'
)

Here is the result:

[visualization]

@hugo-pires, the example is in the posted PR: https://github.com/vega/vega-lite/pull/3833

I'm sorry, but I still need some help to "smooth" the red cumulative line. I also have some questions about "translating" @kanitw's example to Altair. Could you help me?

Thank you

@hugo-pires, better late than never. If you drop the bin from the x encoding of your soma chart, it should work. Here's one I did:

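import altair as alt

# df_totals (DataFrame with a numeric "return" column) and hist (the histogram to overlay) are defined elsewhere
cumu = alt.Chart(df_totals).mark_line(color='black', interpolate='step-after').transform_joinaggregate(
    total='count(*)'             # attach the total number of rows to every row
).transform_calculate(
    pct='1 / datum.total'        # each row carries 1/N of the probability mass
).transform_window(
    frame=[None, 0],             # window from the first row up to the current row
    sort=[{"field": "return"}],  # accumulate in ascending order of "return"
    cumu='sum(pct)'              # running sum of pct = empirical cumulative distribution
).encode(
    alt.X("return:Q"),           # no bin, so the line steps at every observation
    alt.Y('cumu:Q', axis=alt.Axis(title='Cumulative Likelihood'))
).properties(
    title='Distribution of Inflation Adjusted S&P 500 Returns (CAGR)',
    width=700,
    height=450
)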

(hist + cumu).resolve_scale(y='independent')

[visualization]

Thank you @kdunn926
