Dplyr: Query optimiser commit breaks some downstream sql_select methods

Created on 5 Mar 2017  路  6Comments  路  Source: tidyverse/dplyr

Per commit discussion, changes made to sql_render.tbl_sql() to optimise query production, breaks downstream sql_select() in some circumstances.

This occurs because the NextMethod(con = con) call in sql_render.tbl_sql() passes the connection to sql_render.tbl_lazy() as part of the dots argument. Where sql_select() methods have more arguments than the default implementation, this connection is passed in to the first non-default argument and causes sql_select() methods to fail.

This occurs in RSQLServer. A reprex can be found here: https://github.com/imanuelcostigan/RSQLServer/issues/142#issuecomment-284188736

bug

Most helpful comment

Yes, obviously it will get fixed.

All 6 comments

Let me know if you would like me to submit a PR with different NextMethod call

In addition to those at the commit and RSQLServer, here's a very simple reproducible example. You'll need a SQL Server instance for this.

library(dplyr)
db <- RSQLServer::src_sqlserver(<login details>)
tab <- tbl(db, <existing table>)
tab

#Error: length(order_by) not greater than 0L

Printing a tbl also prints the first few rows, which involves calling collect with the number of rows to get. collect(*, n=n) is one of the methods affected by this issue, hence the error.

And here's a simple demonstration of the NextMethod problem. I define a generic f, and methods for classes foo and bar. The foo method mimics the code in sql_render.tbl_sql.

f <- function(a, b, ...)
{
    UseMethod("f")
}

f.foo <- function(a, b=NULL, ...)
{
    cat("in f.foo\n")
    print(match.call())
    if(is.null(b)) b <- obj$b
    NextMethod(b=b)
}

f.bar <- function(a, b, c, ...)
{
    cat("in f.bar\n")
    print(match.call())
    NULL
}

obj <- structure(list(a=42, b=34), class=c("foo", "bar"))

f(obj, obj$b)

#in f.foo
#f.foo(a = obj, b = obj$b)
#in f.bar
#f.bar(a = obj, b = 34, c = obj$b)
#NULL

What's happening here is that the original call to f supplies _both_ a and b, as unnamed arguments. When this happens, NextMethod(b=b) adds another argument, this time named.

If instead, I were to do this:

f(obj)

#in f.foo
#f.foo(a = obj)
#in f.bar
#f.bar(a = obj, b = 34)

then it works as expected. Alternatively, I can redefine f.foo:

f.foo <- function(a, b=NULL, ...)
{
    cat("in f.foo\n")
    print(match.call())
    if(is.null(b)) b <- obj$b
    NextMethod()  # no more b=b
}

f(obj, obj$b)

#in f.foo
#f.foo(a = obj, b = obj$b)
#in f.bar
#f.bar(a = obj, b = obj$b)
#NULL

and this also works as expected.

This is the same as what happens with dplyr tbls. Doing something like collect(tbl, n=5) eventually results in a call to sql_render that specifies both qry and con as arguments. These are passed to sql_render.tbl_sql, which in turn calls NextMethod(con=con). This causes con to be duplicated like b above.

> collect(tab, n=5)
Error: length(order_by) not greater than 0L
> traceback()
12: stop(assertError(attr(res, "msg")))
11: assertthat::assert_that(length(order_by) > 0L, dbGetInfo(con)$db.version >= 
        11, is.integer(offset), offset >= 0)
10: sql_select.SQLServerConnection(con, query$select, from, where = query$where, 
        group_by = query$group_by, having = query$having, order_by = query$order_by, 
        limit = query$limit, distinct = query$distinct, ...)
9: sql_select(con, query$select, from, where = query$where, group_by = query$group_by, 
       having = query$having, order_by = query$order_by, limit = query$limit, 
       distinct = query$distinct, ...)
8: sql_render.select_query(qry, con = con, ...)
7: sql_render(qry, con = con, ...)
***> 6: sql_render.tbl_lazy(x, con, con = <S4 object of class "SQLServerConnection">)
***> 5: NextMethod(con = con)
***> 4: sql_render.tbl_sql(x, con)
3: sql_render(x, con)
2: collect.tbl_sql(tab, n = 5)
1: collect(tab, n = 5)

@hadley can i get some clarity as to whether or not this issue will be tackled prior to the next release? I'd be a bit disappointed if we were unable to resubmit RSQLServer to CRAN after being unable to work around dplyr internal changes in the last major release. I am happy to submit a PR (and be guided on what i should look into) if necessary.

Yes, obviously it will get fixed.

Ugh, it's crazy that NextMethod() doesn't use match.call()

Was this page helpful?
0 / 5 - 0 ratings