Dplyr: all.equal() problems

Created on 6 May 2017  路  10Comments  路  Source: tidyverse/dplyr

  • No tolerance in comparison (#2264)
  • Attributes are not compared (https://github.com/tidyverse/tibble/issues/198#issuecomment-292880516)
  • List columns cannot be compared (#2355)
  • Behavior different depending on the load status of dplyr

Would it be too much of a change to get rid of this override altogether? I propose to emit a warning that all_equal() should be used on tibbles in v0.6, and then remove the specialization in v0.7.

CC @hadley @jennybc.

breaking change feature wip

Most helpful comment

Is it clear when this fix will be included in dplyr?

All 10 comments

I think that's ok, although I feel like there's probably quite a few places where I rely on test_equal() working. It's probably better to be explicit though.

An alternative fix would be to write our own all.equal package and systematically work through all of the issues. I'm not sure that's worth the investment now, although it's probably worth it in the long run.

@hadley: Are we including this in dplyr 0.6.0?

No, it'll have to wait until the next version.

Is it clear when this fix will be included in dplyr?

Are there any thoughts on when/if attribute checking might be implemented? I'm creating a tbl_df subclass that just adds an attribute, and was surprised when some tests that I expected to fail, didn't.

For now I'm using a (rather janky) workaround with a custom all.equal() method. But this still doesn't allow for simple testing that a lossless roundtrip is possible from tbl_df to the subclass and back.

In case this is useful, dplyr's all.equal.tbl_df() is also problematic for data.frame columns, which the bare all.equal() handles well.

library(tibble)

d1 <- tibble(id = 1:3, x = tibble(a = 1:3, y = 1:3))
d2 <- tibble(id = 1:3, x = tibble(a = 1:3, y = NA))

# works fine
all.equal(d1, d1)
#> [1] TRUE
all.equal(d1, d2)
#> [1] "Component \"x\": Component \"y\": Modes: numeric, logical"              
#> [2] "Component \"x\": Component \"y\": target is numeric, current is logical"

library(dplyr, warn.conflicts = FALSE)

# broken
all.equal(d1, d1)
#> Error: Can't join on 'x' x 'x' because of incompatible types (tbl_df/tbl/data.frame / tbl_df/tbl/data.frame)
all.equal(d1, d2)
#> Error: Can't join on 'x' x 'x' because of incompatible types (tbl_df/tbl/data.frame / tbl_df/tbl/data.frame)

Created on 2019-02-04 by the reprex package (v0.2.1.9000)

I have been beaten by this (and went crazy :sweat:) in the last few days.
Finally I resolved to use the conversion to data.frame to make testthat::expect_equal() work :satisfied::

expect_equal(as.data.frame(object), as.data.frame(expected))

HTH

Quick question: the tolerance issue mentioned as the first bullet point of the first message hasn't been resolved or am I missing something?

It would really be nice to be able to account for small differences like all.equal(df, df + 1e-9) via the usual tolerance argument.

I think we should plan to rip out the existing all.equal.tbl_df method as it's clearly not correct. This will require updating a lot of tests, and will probably break some downstream packages, but I think it's worth it.

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

Was this page helpful?
0 / 5 - 0 ratings