Dplyr: first/last crashes R when order_by specified within mutate

Created on 19 Aug 2016 · 7 Comments · Source: tidyverse/dplyr

dplyr 0.5.0

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Platform: x86_64-w64-mingw32/x64 (64-bit)

R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Platform: x86_64-redhat-linux-gnu (64-bit)

Reproduced on both a Windows machine and a Linux machine.

library(dplyr)
df <- data_frame(val = seq(1L, 10L, 1L), order = runif(10))

Ok:

df %>% mutate(last_val = last(val))

Crashes:

df %>% mutate(last_val = last(val, order_by = order))

All 7 comments

I can reproduce with R 3.3.1 and dplyr 0.5.0 on Windows 10. A smaller crashing example is:

library(dplyr)
data_frame(id = 1:9, value = 42) %>% summarise(first(value, order_by = id))

Both work fine with dplyr 0.5.0 and R 3.3.1 on OS X.

@banbh's example crashed on my Red Hat 6.7, Ubuntu 14.04, Windows 7, and Windows 8 machines (R 3.3.1; dplyr 0.5.0). But the crash doesn't occur when the functions are namespace-qualified:

library(magrittr)
requireNamespace("dplyr")

dplyr::data_frame(id = 1:9, value = 42) %>%
  dplyr::summarise(
    dplyr::first(value, order_by = id)
  )

http://stackoverflow.com/questions/39112813/how-to-use-order-by-with-first

Update: the crash still occurs with yesterday's latest build (dplyr 0.5.0.9000) on the Windows 8 and Ubuntu 14.04 machines.

Reproduced on an Ubuntu VM. Some more information:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffe9698615 in dplyr::Compare_Single_OrderVisitor<dplyr::OrderVectorVisitorImpl<13, true, dplyr::VectorSliceVisitor<13> > >::operator() (this=0x7ffffffefa70, i=1, j=4) at ../inst/include/dplyr/Order.h:69
69        if (obj.equal(i,j)) return i<j;

(gdb) bt
#0  0x00007fffe9698615 in dplyr::Compare_Single_OrderVisitor<dplyr::OrderVectorVisitorImpl<13, true, dplyr::VectorSliceVisitor<13> > >::operator() (this=0x7ffffffefa70, i=1, j=4) at ../inst/include/dplyr/Order.h:69
#1  0x00007fffe969c38b in __move_median_to_first<int*, dplyr::Compare_Single_OrderVisitor<dplyr::OrderVectorVisitorImpl<13, true, dplyr::VectorSliceVisitor<13> > > > (__comp=..., __c=0xea19e98, __b=0xea19e88, __a=0xea19e7c, __result=0xea19e78)
    at /usr/include/c++/4.8/bits/stl_algo.h:114
#2  __unguarded_partition_pivot<int*, dplyr::Compare_Single_OrderVisitor<dplyr::OrderVectorVisitorImpl<13, true, dplyr::VectorSliceVisitor<13> > > > (__comp=..., __last=0xea19e9c, __first=0xea19e78) at /usr/include/c++/4.8/bits/stl_algo.h:2294
#3  std::__introselect<int*, long, dplyr::Compare_Single_OrderVisitor<dplyr::OrderVectorVisitorImpl<13, true, dplyr::VectorSliceVisitor<13> > > > (__first=0xea19e78, __nth=0xea19e78, __last=0xea19e9c, __depth_limit=5, __comp=..., __comp@entry=...)
    at /usr/include/c++/4.8/bits/stl_algo.h:2394
#4  0x00007fffe969c9d8 in nth_element<int*, dplyr::Compare_Single_OrderVisitor<dplyr::OrderVectorVisitorImpl<13, true, dplyr::VectorSliceVisitor<13> > > > (__comp=..., __last=<optimized out>, __nth=<optimized out>, __first=<optimized out>)
    at /usr/include/c++/4.8/bits/stl_algo.h:5426
#5  dplyr::NthWith<14, 13>::process_chunk (this=0xea1d430, indices=...) at hybrid_nth.cpp:70
#6  0x00007fffe969cb9d in dplyr::Processor<14, dplyr::NthWith<14, 13> >::process (this=0xea1d430, index=...)
    at ../inst/include/dplyr/Result/Processor.h:40
#7  0x00007fffe969447d in dplyr::Processor<14, dplyr::NthWith<14, 13> >::process (this=this@entry=0xea1d430, df=...)
    at ../inst/include/dplyr/Result/Processor.h:35
#8  0x00007fffe96f0bf6 in summarise_not_grouped (df=..., dots=...) at summarise.cpp:113
#9  0x00007fffe96f1545 in summarise_impl (df=..., dots=...) at summarise.cpp:139
#10 0x00007fffe9633b0d in dplyr_summarise_impl (dfSEXP=0xe9d5e78, dotsSEXP=<optimized out>) at RcppExports.cpp:489

Thanks, confirmed. Would you like to contribute a testthat test?
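A regression test along the lines requested above might look like the following. This is only a sketch: the test description, the small data frame, and the expected values are illustrative choices rather than anything taken from the dplyr test suite, and on an affected build it would crash the R session instead of reporting a failure.

```r
library(dplyr)
library(testthat)

test_that("first() and last() respect order_by in summarise() and mutate()", {
  # Hypothetical data: `id` is deliberately unsorted so that order_by matters.
  df <- data.frame(id = c(3L, 1L, 2L), value = c(30, 10, 20))

  # Ungrouped summarise(), the code path shown in the backtrace
  res_s <- df %>%
    summarise(
      f = first(value, order_by = id),
      l = last(value, order_by = id)
    )
  expect_equal(res_s$f, 10)
  expect_equal(res_s$l, 30)

  # mutate(), as in the original report; the result is recycled to every row
  res_m <- df %>% mutate(l = last(value, order_by = id))
  expect_equal(res_m$l, rep(30, 3))
})
```

Because `id` is not sorted, a version that silently ignored `order_by` would return 30 and 20 instead, so the test also guards against wrong results, not just the crash.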

I'm having this issue on my Windows 7 machine as well.
dplyr_0.5.0
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

# here's what the sample data looks like
contacts
##          email id createddate
## 1 [email protected]  1  2016-01-01
## 2 [email protected]  2  2016-04-01
## 3 [email protected]  3  2016-10-01
## 4 [email protected]  4  2016-07-01

# This crashes
contacts %>%
  group_by(email) %>%
  summarize(
    earliestid = last(id, order_by = createddate)
  )

# This works
contacts %>%
  group_by(email) %>%
  summarize(
    earliestid = dplyr::last(id, order_by = createddate)
  )

Looks like this will be resolved by #2143, but that might take some time until it's ready.
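Until a fix lands, one way to avoid the order_by argument altogether is to index the value column by the position of the smallest or largest ordering value. The sketch below uses made-up data and assumes the crash is confined to the order_by code path of first()/last(), as the backtrace suggests; it has not been verified on every affected platform. Note that with ties in the ordering column, which.min() and which.max() pick the first matching row, which may differ from what last(order_by = ) would return.

```r
library(dplyr)

# Hypothetical data, with fixed values instead of runif() so the result is predictable
df <- data.frame(val = 1:4, order = c(0.5, 0.1, 0.9, 0.3))

res <- df %>%
  mutate(
    first_val = val[which.min(order)],  # value at the smallest `order`
    last_val  = val[which.max(order)]   # value at the largest `order`
  )
```

Here `last_val` is 3 on every row (the `val` at `order` 0.9) and `first_val` is 2 (the `val` at `order` 0.1). The same expressions can be used inside summarise() after group_by().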
