Holoviews: hv.Points(df) with duplicate columns in dataframe

Created on 22 Mar 2018 · 7 Comments · Source: holoviz/holoviews

The user guide says dataframes are supported, but I'm having trouble in cases with duplicate columns. The error message is also not very helpful in pinpointing what is going wrong.

Also, the API documentation for element.chart.Chart does not mention dataframes; I'm not sure whether it should.

import numpy as np
import pandas as pd
import holoviews as hv
data = pd.DataFrame(np.random.randint(-100,100, size=(100, 2)), columns=list("AB"))

# Dataframe with non-duplicate columns works
hv.Points(data[["A", "B"]])

# Dataframe with duplicate columns does not work.
hv.Points(data[["A", "A"]])

# Converting to a numpy array works
# (.values is used here because as_matrix() was removed in pandas 1.0)
hv.Points(data[["A", "A"]].values)

The last part of the traceback:
[image: screenshot of the traceback]

print(hv.__version__) # 1.9.4-x-g73c2735e7
print(pd.__version__) # 0.20.3

All 7 comments

Actually, duplicated columns are indeed not supported, but the usage you outline isn't recommended in general. Rather than selecting the columns you need from your data, I would suggest declaring the dimensions instead, e.g.:

# non-duplicate columns
hv.Points(data, ['A', 'B'])

# Duplicate.
hv.Points(data, ['A', 'A'])

This avoids making copies of your dataframe, which is more efficient and means that plots can share a datasource in certain scenarios. We could probably support actually duplicated columns, but it doesn't seem like a scenario I'd ever recommend.

Me either! We could consider putting in a warning if we detect that happening, though?

> We could consider putting in a warning if we detect that happening, though?

Good idea, but I think it should be an exception tbh; there is no way we'll be able to resolve which column is meant if it's duplicated.
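
A minimal sketch of what such a check could look like, using only pandas. The helper name `check_unique_columns` and the wording of the error message are hypothetical, chosen for illustration; this is not the HoloViews implementation:

```python
import numpy as np
import pandas as pd


def check_unique_columns(df):
    """Hypothetical helper: raise if any column name appears more than once."""
    # Index.duplicated() marks the second and later occurrences of each label.
    duplicated = df.columns[df.columns.duplicated()].unique().tolist()
    if duplicated:
        raise ValueError(
            "DataFrame has duplicate column names: %r" % duplicated
        )
    return df


data = pd.DataFrame(np.random.randint(-100, 100, size=(100, 2)), columns=list("AB"))

# Unique column names pass through unchanged.
check_unique_columns(data[["A", "B"]])

# Duplicate column names raise a descriptive error instead of failing later.
try:
    check_unique_columns(data[["A", "A"]])
except ValueError as exc:
    message = str(exc)
    print(message)
```

Raising early like this names the offending columns, which addresses the complaint that the original error did not pinpoint what went wrong.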

I'd even go further and say that duplicate column names are a recipe for disaster, and if I were a pandas developer I'd have argued that they should not be allowed at all.

Sure. It's certainly difficult for me to think of a valid use for a dataframe like that, unless there is no name at all for the columns involved.

I agree that an exception makes sense.

Frankly, I had no idea that duplicate columns were even allowed!
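
For anyone equally surprised, here is a small pandas-only sketch with made-up data. It shows that selecting the same label twice is accepted, and that a later lookup by that label returns a two-column DataFrame rather than a Series, which is where the ambiguity comes from:

```python
import pandas as pd

# Made-up example data, not taken from the issue.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Selecting the same label twice is allowed and yields duplicate column names.
dup = df[["A", "A"]]
print(dup.columns.tolist())
print(dup.columns.is_unique)

# Looking up the duplicated label returns both columns as a DataFrame,
# so code expecting a single Series cannot tell which column was meant.
selected = dup["A"]
print(type(selected).__name__, selected.shape)
```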

Having read up on the history a bit, I can sort of understand why pandas allows this. When reading in csv files with duplicate columns you don't want it to fail or mutate your columns automatically. Anyway adding the exception in holoviews should be easy.
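
One caveat that is worth checking against your own pandas version: as far as I know, `read_csv` does rename repeated header names by default when parsing, while the duplicate names survive in frames built by selection. The inline CSV text below is made up for illustration:

```python
import io

import pandas as pd

# Made-up CSV content with the header "A" repeated.
csv_text = "A,A,B\n1,2,3\n4,5,6\n"

# With default settings the repeated header is renamed on read.
parsed = pd.read_csv(io.StringIO(csv_text))
parsed_columns = parsed.columns.tolist()
print(parsed_columns)

# Selecting a label twice, by contrast, keeps the duplicate name as-is.
selected_columns = parsed[["B", "B"]].columns.tolist()
print(selected_columns)
```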
