I have a CSV file (see sample file) with a pre-computed histogram. It has 2 columns, bin and val: for each value ("val") there is an associated count ("bin" in this case).
I would like to plot this histogram using visidata. This can be done by marking the "val" column as key and plotting the "bin" column with the "." command. However, I don't like the result: it is a scatterplot-like visualization, and a bar plot would suit my case better.
So my current solution is:
1. z+ with the max aggregator on the bin column, to get its maximum.
2. ='#' * int(bin/the_max * 20), where the_max is the number given by the previous step.
The downside of this solution is that the max has to be typed into the expression by hand.
Which leads to my question: is there a way in visidata to compute the max (or any expr) of a column and use this value in another expr?
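To make the goal concrete, here is a minimal sketch of the same two steps in plain Python, outside VisiData. It is only an illustration of the computation, not VisiData API; the rows are made-up sample data (not taken from example.zip), and bar_column is a hypothetical helper name.

```python
# Plain-Python sketch of the two-step workaround (not VisiData API).
# The rows are made-up sample data, not the contents of example.zip.
rows = [
    {"val": 1, "bin": 5},
    {"val": 2, "bin": 20},
    {"val": 3, "bin": 10},
]

BAR_WIDTH = 20  # same scale factor as in the expression above


def bar_column(rows, width=BAR_WIDTH):
    """Return one '#' bar per row, scaled so the largest count fills `width`."""
    # Step 1: the column-wide maximum (what z+ with max reports).
    the_max = max(row["bin"] for row in rows)
    if the_max <= 0:
        # Nothing to scale against; return empty bars.
        return ["" for _ in rows]
    # Step 2: the per-row expression, with the maximum filled in automatically.
    return ["#" * int(row["bin"] / the_max * width) for row in rows]


bars = bar_column(rows)
for row, bar in zip(rows, bars):
    print(f'{row["val"]:>3} {bar}')
```

The point is that the maximum is computed once and then reused for every row, which is exactly the part that currently has to be done by hand.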
From what I understand about visidata, it is more oriented toward raw data, so my use case (pre-computed data) may not fit with what visidata was designed for.
Example attached:
example.zip
Hi @zakora, this is an interesting request. I think you're probably doing the best that can be done with the interface alone. However, you could add the following to your .visidatarc:
Sheet.addCommand('z.', 'addcol-freq', 'addColumn(HistogramColumn(cursorCol), cursorColIndex+1)')

def HistogramColumn(sourceCol):
    return Column(sourceCol.name+"_histogram",
                  getter=lambda c,r: options.disp_histogram*(options.disp_histolen*c.source.getTypedValue(r)//c.largest),
                  width=options.disp_histolen+2,
                  source=sourceCol,
                  largest=max(sourceCol.getValues(sourceCol.sheet.rows)))
and then z. would add a new column histogramming the current column.
In other words, if you can do it in Python, you can do it in VisiData :)
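For anyone who wants to check the arithmetic in that getter without loading a sheet, here is a rough standalone sketch of the scaling. HISTOGRAM_CHAR and HISTOGRAM_LEN are stand-ins for options.disp_histogram and options.disp_histolen, and their values here are assumptions for illustration. histogram_cell is a hypothetical helper, and the int() call is an addition that the snippet above does not have.

```python
# Stand-ins for options.disp_histogram and options.disp_histolen.
# These values are assumptions for illustration; check your own options.
HISTOGRAM_CHAR = "*"
HISTOGRAM_LEN = 50


def histogram_cell(value, largest, char=HISTOGRAM_CHAR, length=HISTOGRAM_LEN):
    """Repeat `char` in proportion to value / largest, as the getter does."""
    # int() is an addition here: floor division on floats yields a float,
    # and multiplying a str by a float raises TypeError.
    return char * int(length * value // largest)


values = [5, 20, 10]
largest = max(values)  # computed once, like the `largest` attribute on the column
cells = [histogram_cell(v, largest) for v in values]

for v, cell in zip(values, cells):
    print(f"{v:>3} {cell}")
```

The largest value is taken once when the column is created, so each cell only needs one multiplication and one division.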
Thanks for your solution, it is very helpful!
Will there be something like ='#' * int(bin/max(col('bin')) * 20)?
@agguser I don't think I understand...what are you looking to do?
I mean: will visidata support an expression for the list of values in a column (e.g. col('bin') would return the list of values in column "bin")?
@agguser It's tricky to make that work with decent performance on large datasets. For aggregating column data I've made new subsheets in the past. If you tell me what kind of expressions you're trying to compute, we can see if there's a way to do what you want.
I usually need "percentage columns" (=x*100/sum(col('x'))).
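As a plain-Python sketch of what such a percentage column would compute (col('x') is the hypothetical function proposed in this thread, and percentage_column is a made-up helper name), the sum can be taken once up front and then reused for each row, which avoids re-aggregating the column per row:

```python
def percentage_column(values):
    """Return each value as a percentage of the column total."""
    total = sum(values)  # aggregate once, not once per row
    if total == 0:
        # Avoid dividing by zero on an all-zero column.
        return [0.0 for _ in values]
    return [x * 100 / total for x in values]


x = [5, 20, 10, 15]  # made-up sample column
pct = percentage_column(x)

for value, share in zip(x, pct):
    print(f"{value:>3} {share:5.1f}%")
```

The same precompute-then-reuse shape is what the HistogramColumn snippet does with the maximum, so a similar column in .visidatarc could presumably hold the sum instead.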