It doesn't seem right to me that this returns true: isequal(DataFrame(p = @pdata(1:3)), DataFrame(d = @data(1:3))).
Should either of these be true?:
isequal(DataFrame(d = @pdata(1:3)), DataFrame(d = @data(1:3)))
isequal(DataFrame(p = @data(1:3)), DataFrame(d = @data(1:3)))
I opened a related issue: JuliaStats/DataArrays.jl#46
Do you mean that isequal() should return false, but == should be true?
I was a little surprised both that isequal returns true when colnames differ, and that it returns true when one table has a DataArray and the other has a PooledDataArray.
But the latter (isequal([Data], [PooledData])) seems like it may be changed in DataArrays (JuliaStats/DataArrays.jl#46), and I guess only the former (colnames) belongs here.
I think isequal(df1, df2) needs to take colnames into account; I'm not sure about column order.
Yeah, that makes sense. But I think we'd better keep @pdata(1:3) == @data(1:3) true.
Rendered irrelevant by the switch to CategoricalArrays.jl.
But there's also the question of how to handle the column names and order, right?
I don't know; my natural inclination is that column names/order _would_ matter. I could _maybe_ get behind column _order_ not mattering, but I think it's just simpler overall to require names/order to match.
Definitely on the same page that column names matter.
Re: order, it'd be kind of nice if we didn't have integer indexing of columns in the public API, but since df[1] is still a thing, I guess our hands are tied. For two indexable collections that are isequal, I'd expect indexing into both at the same position to return isequal elements.
I think the consensus is that column names and their order should matter, as is currently the case.
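To make that rule concrete, here is a minimal sketch using only Base Julia. ToyTable and same_table are hypothetical names made up for illustration; this is not the DataFrames.jl implementation, only a model of the semantics discussed above (names must match position by position, and so must the column contents).

```julia
# Hypothetical stand-in for a data frame: ordered column names plus columns.
struct ToyTable
    names::Vector{Symbol}
    columns::Vector{Vector{Int}}
end

# Hypothetical comparison: two tables match only if their column names
# agree in the same order and each pair of columns is isequal.
function same_table(a::ToyTable, b::ToyTable)
    a.names == b.names || return false
    length(a.columns) == length(b.columns) || return false
    return all(isequal(x, y) for (x, y) in zip(a.columns, b.columns))
end

d_table = ToyTable([:d], [[1, 2, 3]])
p_table = ToyTable([:p], [[1, 2, 3]])
ab_table = ToyTable([:a, :b], [[1], [2]])
ba_table = ToyTable([:b, :a], [[2], [1]])

same_table(d_table, p_table)    # false: same data, different column name
same_table(ab_table, ba_table)  # false: same columns, different order
same_table(d_table, ToyTable([:d], [[1, 2, 3]]))  # true
```

Requiring the order to match keeps the positional-indexing expectation intact: if two tables compare as equal, the column at position 1 of one is isequal to the column at position 1 of the other.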