VisiData: Use a Python expression on a whole column's data instead of a single cell

Created on 17 Oct 2018 · 8 Comments · Source: saulpw/visidata

I have a CSV file (see sample file) with a pre-computed histogram. It has two columns, bin and val: each value ("val") has an associated count ("bin" in this case).

I would like to plot this histogram using VisiData. This can be done by marking the "val" column as a key column and plotting the "bin" column with the . command. However, I don't like the result, since it is a scatterplot-like graph and a bar plot would suit my case better.

So my current solution is:

  • find the max of the "bin" column using z+, max.
  • create a derived column with the expr '#' * int(bin/the_max * 20) where the_max is a number, given by the previous step.
    This creates a column with a vertical bar plot.
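
The two manual steps above can be sketched in plain Python to show the arithmetic involved. This is only an illustration outside VisiData: the function name bar_column and the sample rows are hypothetical, and the attached example.zip is not reproduced here.

```python
def bar_column(counts, width=20, char="#"):
    """Return one text bar per count, scaled so the largest count spans `width` characters."""
    the_max = max(counts)  # step 1: the value that z+ max reports
    if the_max <= 0:
        # Nothing to scale against; return empty bars instead of dividing by zero.
        return ["" for _ in counts]
    # step 2: the same formula as the derived-column expression
    return [char * int(c / the_max * width) for c in counts]


# Hypothetical (val, bin) rows standing in for the CSV file.
rows = [(1, 5), (2, 10), (3, 20)]
bars = bar_column([count for _, count in rows])

for (val, count), bar in zip(rows, bars):
    print(f"{val:>3} {count:>4} {bar}")
```

The only difference from the manual workflow is that the_max is computed by the code rather than typed into the expression by hand.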

The downside of this solution is that the max has to be manually put in the expr.
Which leads to my question: is there a way in visidata to compute the max (or any expr) of a column and use this value in another expr?

From what I understand about visidata, it is more oriented toward raw data, so my use case (pre-computed data) may not fit with what visidata was designed for.

Example attached:
example.zip

All 8 comments

Hi @zakora, this is an interesting request. I think you're probably doing the best that can be done with the interface alone. However, you could add the following to your .visidatarc:

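# Bind z. to add a histogram column right after the current column.
Sheet.addCommand('z.', 'addcol-freq', 'addColumn(HistogramColumn(cursorCol), cursorColIndex+1)')

def HistogramColumn(sourceCol):
    # The maximum is computed once, when the column is created, and is read back in the getter as c.largest.
    return Column(sourceCol.name+"_histogram",
                    # bar length = disp_histolen scaled by value/largest; int() keeps the repeat count an integer for float columns
                    getter=lambda c,r: options.disp_histogram*int(options.disp_histolen*c.source.getTypedValue(r)//c.largest),
                    width=options.disp_histolen+2,
                    source=sourceCol,
                    largest=max(sourceCol.getValues(sourceCol.sheet.rows)))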

and then z. would add a new column histogramming the current column.

In other words, if you can do it in Python, you can do it in VisiData :)

Thanks for your solution, it is very helpful!

Will there be something like ='#' * int(bin/max(col('bin')) * 20)?

@agguser I don't think I understand...what are you looking to do?

I mean: will visidata support an expression for the list of values in a column (e.g. col('bin') would return the list of values in column "bin")?

@agguser It's tricky to make that work with decent performance on large datasets. For aggregating column data I've made new subsheets in the past. If you tell me what kind of expressions you're trying to compute, we can see if there's a way to do what you want.
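
To make the performance concern concrete, here is a small plain-Python sketch of what a naive col() helper would do if it were evaluated once per row. Both eval_per_row and col are hypothetical stand-ins, not VisiData functions; the sketch only counts how often the whole column gets scanned.

```python
def eval_per_row(expr, rows):
    """Evaluate `expr` for every row; return the results and the number of full-column scans."""
    scans = 0

    def col(name):
        # Naive helper: rebuilds the full list of column values on every call.
        nonlocal scans
        scans += 1
        return [r[name] for r in rows]

    # Each row's fields are exposed as local names, so `bin` refers to the row's own value.
    results = [eval(expr, {"col": col}, dict(r)) for r in rows]
    return results, scans


# Hypothetical sample data.
rows = [{"bin": 5}, {"bin": 10}, {"bin": 20}]
results, scans = eval_per_row("'#' * int(bin/max(col('bin')) * 20)", rows)

print(results)
print("full-column scans:", scans)
```

With n rows the column is scanned n times, so the total work grows quadratically. Computing the aggregate once up front, as HistogramColumn does with largest, avoids that.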

I usually need "percentage columns" (=x*100/sum(col('x'))).
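
A percentage column can follow the same pattern as the histogram column earlier in the thread: compute the aggregate once, then reuse it for every row. The sketch below is plain Python under that assumption; percent_column and the sample values are hypothetical, and it is not a VisiData command.

```python
def percent_column(values):
    """Return each value as a percentage of the column total."""
    total = sum(values)  # computed once, like `largest` in HistogramColumn
    if total == 0:
        # Avoid dividing by zero when the column sums to zero.
        return [0.0 for _ in values]
    return [v * 100 / total for v in values]


# Hypothetical values for a column named x.
xs = [5, 10, 20, 15]
percentages = percent_column(xs)
print(percentages)
```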
