Seaborn: [New feature request] ci = 'se' for categorical plots

Created on 11 May 2018 · 5 comments · Source: mwaskom/seaborn

This is a new feature request, which should be very straightforward to implement.

Now that the ci parameter supports showing sd, it would also be handy to include an option to show se, i.e., the standard error. This would just be the standard deviation divided by sqrt(n).
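For reference, here is a minimal sketch of the proposed computation. It assumes the quantity of interest is the mean and the data are a 1-D NumPy array. `standard_error` is a hypothetical helper for illustration, not a seaborn function. NumPy's `std` defaults to `ddof=0` (the population formula); the sketch defaults to `ddof=1`, the usual sample estimate.

```python
import numpy as np


def standard_error(a, ddof=1):
    """Standard error of the mean: std (with `ddof` degrees of freedom removed) / sqrt(n).

    Hypothetical helper for illustration; not part of seaborn.
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    return a.std(ddof=ddof) / np.sqrt(n)


values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
se = standard_error(values)  # sqrt(32 / 7) / sqrt(8) = sqrt(4 / 7), about 0.756
```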

I would be happy to implement this and submit a PR if people are happy with this addition.

All 5 comments

You can show standard error with a 68% confidence interval.
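The 68% figure comes from the normal distribution. If the sampling distribution of the mean is (approximately) normal, the interval mean ± 1 SE covers about 68.27% of it. Here is a quick standard-library check. `normal_coverage` is a hypothetical name, and the normality assumption is exactly the caveat raised further down in this thread.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # For a standard normal Z: P(|Z| < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))


one_se = normal_coverage(1.0)   # about 0.6827, hence "68% interval" for mean +/- 1 SE
two_se = normal_coverage(1.96)  # about 0.95, the familiar 95% interval
```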

I am afraid this is not accurate.

The standard error of some quantity is the population standard deviation divided by sqrt(n), without making any assumptions about the underlying distribution.

Another way of seeing why this is not right: the standard error (of the mean, the median, or anything else), much like the standard deviation, should be fixed and should not depend on bootstrapping, which we would have to rely on to compute 68% intervals.

The standard error is just a metric of how far the sample estimate of a quantity (e.g. the sample mean) typically lies from the population value of the same quantity (e.g. the population mean).
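To make this definition concrete, note that the standard error of the mean is the standard deviation of the sample mean over repeated samples from the population. The following small simulation sketch assumes a normal population with a known sigma. All numbers are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 2.0        # population standard deviation (assumed known)
n = 50             # size of each sample
n_repeats = 20_000 # number of independent samples drawn

# Draw many independent samples of size n and record each sample mean.
samples = rng.normal(loc=10.0, scale=sigma, size=(n_repeats, n))
sample_means = samples.mean(axis=1)

# Spread of the sample means around the population mean vs. the formula.
empirical_se = sample_means.std()
theoretical_se = sigma / np.sqrt(n)
```

`empirical_se` should land close to `theoretical_se` (about 0.283 here). The simulation spells out the repeated sampling that sigma / sqrt(n) summarizes.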

Thanks for the link to Wikipedia. I'd suggest reading down to the "assumptions and usage" section.

If you have normally distributed data, then the 68% confidence interval will correspond to the standard error of the mean. In the case of bootstrapping, this will only be approximate and dependent on the number of bootstrap samples.

If you do not have normally distributed data, then the standard error of the mean will be useless, and the bootstrap confidence interval will give you much more informative error bars.

Here, convince yourself:

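import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

f, ax = plt.subplots(figsize=(12, 3))
n = 100
for i in range(50):
    x = i, i

    # Red line: 68% bootstrap confidence interval of the mean
    a = np.random.randn(n)
    ci = sns.utils.ci(sns.algorithms.bootstrap(a), 68)
    ax.plot(x, ci, color="r")

    # Black dots: mean +/- one standard error
    m = a.mean()
    se = a.std() / np.sqrt(n)
    ax.plot(x, [m - se, m + se], "o", color="k")

plt.show()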

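The snippet above uses normally distributed data. The point about non-normal data can be sketched with skewed data and a plain-NumPy percentile bootstrap. This helper is assumed here to roughly mirror what `sns.utils.ci(sns.algorithms.bootstrap(a), 68)` does. `bootstrap_interval`, the lognormal data, and the sample size are all hypothetical choices for illustration, not seaborn's API.

```python
import numpy as np


def bootstrap_interval(a, func=np.mean, level=68, n_boot=10_000, rng=None):
    """Percentile bootstrap interval for func(a); hypothetical helper, not seaborn's API."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    # Resample with replacement: each row of idx indexes one bootstrap sample.
    idx = rng.integers(0, a.size, size=(n_boot, a.size))
    stats = func(a[idx], axis=1)
    tail = (100 - level) / 2
    return np.percentile(stats, [tail, 100 - tail])


rng = np.random.default_rng(1)

# Strongly right-skewed data (lognormal) with a small sample size.
a = rng.lognormal(mean=0.0, sigma=1.5, size=20)

m = a.mean()
se = a.std(ddof=1) / np.sqrt(a.size)

# Mean +/- SE is symmetric around m by construction.
sym_low, sym_high = m - se, m + se

# The percentile bootstrap follows the shape of the resampled means,
# so it can be asymmetric around m.
boot_low, boot_high = bootstrap_interval(a, rng=rng)
```

Because the percentile interval is read off the bootstrap distribution, it can reflect skew in the data. The mean ± SE bars are symmetric regardless of the data's shape.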

I am afraid you are right. I was thinking of computing the standard error of the median, but as you said, it doesn't make sense for non-normal distributions.

I am closing this now. Thanks for clarifying.