Plots.jl: Boxplot using only y values

Created on 28 Apr 2016 · 27 Comments · Source: JuliaPlots/Plots.jl

At the moment, you can pass only Y values to boxplot, but the default width looks strange.
It would also be great to support a list of column names when a wide dataframe is used. I found it difficult to plot several data series (Y), since they all use the same X value (1):

image

image

Thanks!

All 27 comments

There are a couple issues here:

  • Should we set xlims explicitly? (I lean towards no...)
  • How to choose the x values for a boxplot?
  • Better handling of arrays of symbols (I agree this is broken right now)

In thinking about this just now, I had the thought that the current method of hoping that a Vector{Any} is good enough to allow dispatch on "processed" data is flawed and ripe for subtle bugs... I should replace the internal logic with a wrapper type:

immutable InputData{T}
  data::T
end

so that it's explicit that an input has been processed and wrapped, and dispatch will never get confused. I'll create a separate issue for this, and the arrays of symbols issue should be resolved as part of that change.
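A minimal sketch of how such a wrapper could keep dispatch unambiguous. This is an illustration only, not Plots.jl's internal code: the helper names wrap and is_processed are hypothetical, and it uses struct, which replaced the immutable keyword in Julia 0.6.

```julia
# Hypothetical sketch; not the actual Plots.jl internals.
# `struct` is the modern spelling of the `immutable` keyword used above.
struct InputData{T}
    data::T
end

# Raw input gets wrapped exactly once.
wrap(x) = InputData(x)
# Input that is already wrapped passes through unchanged.
wrap(x::InputData) = x

# Dispatch can now tell processed and unprocessed inputs apart,
# even when the raw data happens to be a Vector{Any}.
is_processed(::InputData) = true
is_processed(::Any) = false
```

With this, a Vector{Any} supplied by the user is never mistaken for processed data, because only the wrapper type marks an input as processed.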

Right now I implement the boxplot recipe by explicitly applying the grouping and forcing the xticks to 1:length(shapes)... this will need to be made more flexible to allow overlaying multiple boxplots.

As a stop-gap solution, you could build the arrays as expected by the current recipe:

tmp

Boxplot looks broken right now:
image

The weird boxplot drawing issue is fixed.

I think the solution for the x-axis will be to have some sort of DiscreteAxis type that can map strings, etc. to an x/y coordinate. I want to be able to overlay a scatter or violin plot over a boxplot but still allow new series to extend the axis. This can share implementation with the 'setStringVector...' stuff.
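To make the idea concrete, here is a rough sketch of what such a mapping could look like. Everything in it is an assumption for illustration: the field names and the coordinate! function are hypothetical and do not come from Plots.jl.

```julia
# Hypothetical sketch of the "DiscreteAxis" idea; all names are illustrative.
struct DiscreteAxis
    positions::Dict{Any,Float64}  # label => coordinate on the axis
    labels::Vector{Any}           # labels in the order they were first seen
end

DiscreteAxis() = DiscreteAxis(Dict{Any,Float64}(), Any[])

# Return the coordinate for `label`, extending the axis if the label is new.
function coordinate!(axis::DiscreteAxis, label)
    get!(axis.positions, label) do
        push!(axis.labels, label)
        Float64(length(axis.labels))
    end
end
```

Because a known label always returns the same coordinate, a scatter or violin series added later lands on top of the matching box, while an unseen label simply extends the axis by one position.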

In the current master, boxplots are working fine with a categorical variable in x, but it can be used with group.

It can't be used with group.

Yeah I see the bug.. investigating

I think I got it. I'll push the fix soon.

tmp

Awesome :D I found a little bug with the whisker length. I will fix it soon.

@tbreloff The group bug was solved for a call like that, where x and group are the same, but it still gives a strange output in the following example:

using RDatasets, Plots
ToothGrowth = dataset("datasets","ToothGrowth")
boxplot(ToothGrowth, :Dose, :Len, group=:Supp, notch=true)

image

I disagree that this is strange. At least... it's what I expect/want. The group arg creates 2 series. Each of those series is a boxplot, and each of those series is then re-grouped over the same x-domain. (Unless I'm missing something?)

If you want them in different subplots because you don't like the overlap, you can add 'layout=2' (you'd probably want to 'link=:all' as well), or maybe make them easier to see by setting 'alpha=0.5'?


I didn't know that layout=2, link=:all does the trick (maybe layout=:Supp could be more intuitive and/or similar to ggplot's facet grid). At first I was expecting something like the ggplot2 output:
image

maybe layout=:Supp could be more intuitive

I can't for the life of me figure out what :Supp is supposed to be. So I wouldn't vote for that being more intuitive! ;)

But these don't sound like very general ideas. What if the x data isn't nicely spaced? What if there are lots of groups? Just seems like its usefulness would be limited, but what do I know? I don't even know what "Supp" is!

Sorry... I was saying that

boxplot(ToothGrowth, :Dose, :Len, layout=:Supp)

would be more intuitive than

boxplot(ToothGrowth, :Dose, :Len, group=:Supp, layout=2, link=:all)

I also imagine layout taking a DataFrames Formula, like ggplot2's facet_grid.

Ha.. oh it's a field not a setting. I can't decide if that makes me look better or worse :open_mouth:

I'm not sure I fully understand what that would mean (in the general sense). This might only work well with dataframe column labels? Even then there's lots of weirdness?

Ok, I understand... In my opinion, no one wants superimposed boxplots, since you compare them side by side. So having group produce superimposed boxplots, instead of a result similar to ggplot2, is not intuitive. But maybe that is because I used to make a lot of ggplot2 plots. I believe the current behavior of group is good for other series types, but maybe not so good for boxplot.
In general, I have found facet_grid taking an R formula to indicate the variables (data.frame columns of categorical data) very useful. So to me, giving a categorical variable to layout means something like: I want a grid with as many plots as there are factor levels, with each data subset plotted according to those levels. But maybe I'm the only one who expects something like that XD

I think you're not the only one, but... would you agree that this discussion only really makes sense if your inputs are DataFrames and the Symbols for the columns?

Would it make more sense to have a "facet" recipe (similar to how I did marginal hists) which can handle all this stuff? Then it prepares everything for a "generic" boxplot (or whatever else) series recipe... offsetting x-values as needed, creating the layout, etc.

So you would call facet(iris, :Species, <blah blah>, layout = xxx ~ yyy) or something like that, and the facet recipe would replace layout with a real layout based on the formula.

Your facet idea is a lot better than my degeneration of the layout keyword argument ;) But I don't see why it should be restricted to DataFrames...

x = rand(10)
y = rand(10)
z = [0,0,0,0,1,1,1,1,1,1]
w = [1,0,1,0,1,0,1,0,1,0]
facet(x, y, <bla bla>, layout = z ~ w) # Can something like this work?

That may be a lot trickier to implement, as you'd get the Symbols z/w inside the recipe, with no way to access the variables z/w. I'm sure there's a way, it's just not as straightforward as the DataFrame case.

I imagine we could use a Facet type that stores the variables z and w and makes the needed checks in its constructor. That way it can use dispatch:
plot(x, y, Facet(z,w))
Another idea is to use a Julia Pair instead of a DataFrames Formula. Supporting the formula syntax only for DataFrames seems fine to me.
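As a rough illustration of that Facet type, the sketch below stores the two grouping vectors, checks their lengths on construction, and splits the observations into grid cells. It is only a sketch under assumptions: the field names and the cells helper are hypothetical, and nothing here is an existing Plots.jl API.

```julia
# Hypothetical sketch of a Facet type; names are illustrative only.
struct Facet{R<:AbstractVector,C<:AbstractVector}
    rows::R  # grouping variable for grid rows (z in the example above)
    cols::C  # grouping variable for grid columns (w in the example above)

    function Facet(rows::AbstractVector, cols::AbstractVector)
        length(rows) == length(cols) ||
            throw(ArgumentError("grouping vectors must have the same length"))
        new{typeof(rows),typeof(cols)}(rows, cols)
    end
end

# Map each (row level, column level) pair to the indices of its observations.
function cells(f::Facet)
    out = Dict{Tuple{Any,Any},Vector{Int}}()
    for i in eachindex(f.rows, f.cols)
        push!(get!(out, (f.rows[i], f.cols[i]), Int[]), i)
    end
    return out
end
```

A plotting method dispatching on Facet could then draw x[idx] and y[idx] in the subplot belonging to each cell; that part is left out because it depends on how the layout would be built.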

The way I envision it:

@userplot Facet

@recipe function f(facet::Facet; facet_groups = nothing)
    # inputs are the tuple: facet.args
    # TODO: process args with facet_groups to build a layout and assign series to subplots
end

#usage:
facet(args...; facet_groups = ???)

The Facet user plot looks fine. One thing that R solves using dots in its formula (i.e. . ~ var) is indicating whether the categorical variable generates vertical or horizontal subplots.
What do you think about diverging from the formula syntax and using something like:

facet(args...; x_group=varx, y_group=vary)

@tbreloff Is there a better/elegant way to do this?

image

using RDatasets
iris = dataset("datasets","iris")
using Plots
pyplot(size=(300,300))
iris[:dummy] = 1 # To plot the boxplot 
boxplot(iris, :dummy, [:SepalLength :SepalWidth :PetalLength :PetalWidth], layout=grid(1,4), link=:y)

I was expecting to do something like: boxplot(iris, [:SepalLength :SepalWidth :PetalLength :PetalWidth])

Ugh... I need to recode DataFrames support. I hate how I'm doing it now.


@tbreloff Another thing about my last example... The boxplot linecolor is equal to the fillcolor, so the median line isn't visible.

You finally motivated me to fix the horribly inflexible DataFrames code, now you can do cool stuff:

tmp

These changes aren't pushed up yet.
