Is there any way of displaying a histogram for numeric values with visidata? I typically plot the histograms of numeric attributes when I get a new data set to see what the distribution is and if there are any outliers.
Ideally the histogram would have a parameter defining the width of each bin.
Yes! I've been wanting this too. But I got stuck on the numeric binning code and never got back to it. Maybe it's time to clean that up.
> Ideally the histogram would have a parameter defining the width of each bin.
The width of each bin, or the number of bins? Does each bin always have the same width, or would you want to specify ranges (like age 18-34, 35-44, 45-54)?
> The width of each bin, or the number of bins?

Hm, ideally I think it would be either one. (1) Setting the number of bins partitions the range into that many equally sized bins. (2) Setting the width of the bin creates enough bins to cover the whole range. My use case is typically (1), setting the number of bins.
> Does each bin always have the same width, or would you want to specify ranges (like age 18-34, 35-44, 45-54)?

That is very interesting, especially for data that is not normally distributed. I work a lot with genetic data, where you typically have a large number of low-abundance genes plus a few highly expressed genes that stretch the distribution. It would be cool to have rather small bin sizes for the low-abundance genes and large bin sizes for the highly expressed genes.
But for now equally sized bins would already help me a lot.
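For anyone following along, the two options above (number of bins vs. bin width), plus the idea of wider bins for larger values, can be sketched in plain Python. This only illustrates the arithmetic; it is not VisiData's code, and all function names here (`edges_from_count`, `edges_from_width`, `log_edges`, `bin_counts`) are made up for the example.

```python
import math

def edges_from_count(lo, hi, nbins):
    """Option (1): split [lo, hi] into nbins equally wide bins (nbins + 1 edges)."""
    width = (hi - lo) / nbins
    return [lo + i * width for i in range(nbins)] + [hi]

def edges_from_width(lo, hi, width):
    """Option (2): create enough bins of a fixed width to cover [lo, hi]."""
    nbins = max(1, math.ceil((hi - lo) / width))
    return [lo + i * width for i in range(nbins + 1)]

def log_edges(lo, hi, nbins):
    """Geometrically growing bins: narrow for small values, wide for large ones.
    Assumes 0 < lo < hi."""
    ratio = (hi / lo) ** (1 / nbins)
    return [lo * ratio ** i for i in range(nbins)] + [hi]

def bin_counts(values, edges):
    """Count values per bin; each bin is [left, right), the last bin also includes
    its right edge. Assumes every value lies within [edges[0], edges[-1]]."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            if v < edges[i + 1] or i == last:
                counts[i] += 1
                break
    return counts

print(edges_from_count(1, 34, 4))  # [1.0, 9.25, 17.5, 25.75, 34]
```

The half-open bin convention is a choice made for this sketch; a real implementation may treat the edges differently.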
Hi @paulklemm!
@saulpw made a first pass at implementing numeric binning (including date ranges!) in this commit https://github.com/saulpw/visidata/commit/e114f609c4ec366d68387b6b0d85d21bb55ff684. Please note that columns have to be typed as numeric (with #, %, $ or @) in order for them to be numerically binned.
By default, a heuristic is used to calculate the width of the equally sized bins. Alternatively, you can set the histogram_bins option either in the OptionsSheet (press O) or in your ~/.visidatarc.
I have found a few hiccups when playing vd scripts with it as it is, which we will need to fix before the feature can be shipped, but otherwise it should be ready for you to start playing with it from the develop branch. =) If you could, please give it a go and let us know how it feels.
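As a rough sketch of the ~/.visidatarc route: the file is plain Python that VisiData evaluates at startup, so the option can be assigned on the `options` object. The value 4 below is an arbitrary example, and this assumes the option is named histogram_bins in your version, as it is in this thread.

```python
# ~/.visidatarc
# Assumption: `options` is provided by VisiData when it loads this file;
# this fragment is not meant to run as a standalone script.
options.histogram_bins = 4   # example value: use 4 equally sized bins
```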
Will check it out asap 👍
I tried it and it works like a charm! Thank you so much, this will help me a lot!
How do I create a histogram for numeric columns? I can't find it in https://www.visidata.org/v2.x/
Thank you
Hi @aborruso, it is on the website (5th item for new features for v2.-1):
If you set the option numeric_binning to True, Frequency and Pivot tables with numeric columns will bin the values into ranges.
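A minimal sketch of turning this on permanently, assuming the option name numeric_binning quoted above and the usual ~/.visidatarc mechanism:

```python
# ~/.visidatarc
# Assumption: `options` is provided by VisiData when it loads this file;
# this fragment is not meant to run as a standalone script.
options.numeric_binning = True   # bin numeric columns into ranges in Frequency/Pivot tables
```

With that set, type the column as numeric first (for example `#` for int), then open the Frequency table on it (Shift+F); the values should then be grouped into ranges rather than listed one by one.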
Ok, @frosencrantz I didn't understand the goal.
I thought that, starting from a numeric column, there was a command to generate a new histogram column from any table, without having to use Frequency and Pivot tables. But that is not so.
Thank you
> If you set the option numeric_binning to True, Frequency and Pivot tables with numeric columns will bin the values into ranges.
@frosencrantz is there an example use case? I don't seem to be able to use it or understand how it works.
I'm trying to apply it to the CSV input file below, but I'm not able to create something similar to the output example.
Thank you
Input example
n,count
1,34
2,28
3,15
4,10
5,5
6,3
7,3
8,2
9,2
10,2
11,2
12,2
13,2
14,2
15,2
16,1
17,1
18,1
19,1
Output example
| n | count |
+----+------------------------------------------+
| 1 | *************************************** |
| 2 | ******************************** |
| 3 | **************** |
| 4 | ********** |
| 5 | **** |
| 6 | ** |
| 7 | ** |
| 8 | * |
| 9 | * |
| 10 | * |
| 11 | * |
| 12 | * |
| 13 | * |
| 14 | * |
| 15 | * |
| 16 | |
| 17 | |
| 18 | |
| 19 | |
+----+------------------------------------------+
n is typed as int.
OK @anjakefala, it works in another (great) way: it counts the items in each bin. I misinterpreted the subject of this issue.
My goal was to create bars proportional to a specific field, as the histogram column does in the frequency table: it's a field of bars whose lengths are proportional to the percent column.
Instead, it counts the items in each bin:
| count | histogram |
+---------------+----------------------------------------------------+
| (1.0, 9.25) | ************************************************** |
| (9.25, 17.5) | ****** |
| (17.5, 25.75) | - |
| (25.75, 34.0) | ****** |
+---------------+----------------------------------------------------+
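For comparison, the kind of output I was originally after (one bar per row, with length proportional to a field) can be sketched outside VisiData in a few lines of Python. This is just an illustration, not a VisiData feature; `bar_rows`, its parameters, and the shortened sample data are made up for the example.

```python
import csv
import io

def bar_rows(rows, value_key, width=40, char="*"):
    """Return (row, bar) pairs; bar length is proportional to row[value_key],
    scaled so that the largest value gets `width` characters."""
    values = [float(r[value_key]) for r in rows]
    top = max(values) if values else 0
    out = []
    for r, v in zip(rows, values):
        n = round(width * v / top) if top > 0 else 0
        out.append((r, char * n))
    return out

# First rows of the input example above, inlined to keep the sketch self-contained.
SAMPLE = """n,count
1,34
2,28
3,15
4,10
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for r, bar in bar_rows(rows, "count"):
    print(f"{r['n']:>2} | {bar}")
```

Scaling to the largest value is one choice among several; scaling to the column total instead would mimic the percent-based histogram column of the frequency table.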