Drake: Allow targeting of rescue_cache

Created on 13 Nov 2017 · 15 Comments · Source: ropensci/drake

I have a situation where target X won't build because it can't load one of its dependencies. I see this error: Error in readRDS(self$name_hash(hash)) : error reading from connection.

I'm currently running rescue_cache with four jobs, and it is using a total of 100GB of memory for my 25GB-on-disk project.

I'm not sure how feasible this is, but it would be nice to be able to target rescue_cache somehow.

Also, even when I meet the memory limit of my machine, R's garbage collection fails to get rid of old stuff. I wonder if you need to return NULL here:

> drake:::rescue_del
function (key, cache, namespace) 
{
    tryCatch(cache$get(key = key, namespace = namespace), error = function(e) {
        cache$del(key = key, namespace = namespace)
    })
}

I also wonder if there's a way to get drake to perform this operation whenever it meets a failed cache$get. You might stop building whatever target failed and give a warning with something to the effect of "try again".

All 15 comments

Thanks for bringing this up. Please try a03c32a3f34bb8600402c32dda7fbf6adea15922. I think I patched the memory issue, and rescue_cache() can accept a targets argument now.

Also, just so we're clear, what exactly do you mean by the ability to "target rescue_cache"? I thought you meant the ability to rescue specific targets, which is why I added the targets argument in a03c32a3f34bb8600402c32dda7fbf6adea15922.

Correct - the ability to rescue specific targets. A great solution would be that it would also rescue a target's dependencies. Since I find a target fails to build with the above error, I'd like to be able to "rescue" that target, which would also go up the tree and rescue upstream dependencies which might be failing.

Hmm... not sure I'll build it in as a special case, but you can do something like the following.

load_basic_example()
make(my_plan)
target <- "coef_regression1_large"
command <- my_plan$command[my_plan$target == target]
depends <- deps(command)
rescue_cache(targets = c(target, depends))
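
Note that the snippet above only covers a target's immediate dependencies. To go further up the tree, the lookup could be repeated until no new targets turn up. What follows is a minimal sketch under stated assumptions, not part of drake: upstream_targets() and toy_deps() are hypothetical names, and toy_deps() is a base-R stand-in for deps() so the example runs without drake.

```r
# Hypothetical helper (not part of drake): collect a target plus every
# plan target upstream of it by repeatedly looking up command dependencies.
upstream_targets <- function(target, plan, deps_fun) {
  seen <- character(0)
  queue <- target
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% seen) next
    seen <- c(seen, current)
    command <- plan$command[plan$target == current]
    if (length(command) == 0) next  # not a plan target (e.g. an import)
    queue <- c(queue, setdiff(deps_fun(command), seen))
  }
  intersect(seen, plan$target)  # keep only names that are plan targets
}

# Toy plan, assumed purely for illustration.
plan <- data.frame(
  target = c("data", "model", "report"),
  command = c("read_input()", "fit(data)", "summarise(model, data)"),
  stringsAsFactors = FALSE
)

# Stand-in for deps(): the variable names that appear in a command string.
toy_deps <- function(command) all.vars(parse(text = command))

upstream <- upstream_targets("report", plan, toy_deps)
```

With drake loaded, the same helper could presumably be called as rescue_cache(targets = upstream_targets(target, my_plan, deps)), assuming deps() returns a character vector of names as in the snippet above.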

Perhaps if a target fails with exactly the above error (Error in readRDS(self$name_hash(hash)) : error reading from connection), drake could automatically try to rescue the dependencies?

It may be convenient in some cases, yes. But I have a strong preference for keeping drake's code base general, especially when the user can make up the difference with easy 3-liners. Also, the nature of the specific error message depends on storr internals, which could change more readily than the API.

By the way: for anyone looking on, if you don't remember why a target failed, you can always diagnose(your_target) to get the full error log: traceback, message, etc.

But anyway, we have a targets argument, and the memory issue should be gone now. @kendonB, please correct me if I am wrong.

@wlandau-lilly I just tried running rescue_cache untargeted and the memory issue remains.

Baffling. I wonder if the cache's environment is blowing up. What do you get for the following?

cache <- get_cache() # Assuming you're using the default '.drake/' cache
out <- rescue_cache(cache = cache) # Return the cache as 'out'
print(object.size(out), units = "MB")
print(object.size(cache), units = "MB")

If I am right about the storr environment blowing up, then 4e62fe4da2acb403183b70390dbb5f2f9e02e319 has a good chance of fixing the memory issue. Please try again when you get a chance.

Will try soon. Might also just be a mclapply quirk:

From: http://r.789695.n4.nabble.com/mclapply-memory-leak-td4711759.html

"Thanks for the detailed analysis Simon. I figured out a workaround that seems to be working in my real application. By limiting the length of the first argument to mclapply (to the number of cores), I get speedups while limiting the memory overhead."
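
The workaround quoted above (keeping the first argument to mclapply no longer than the number of cores) might look roughly like the sketch below. chunked_mclapply() is a hypothetical name, not something from drake or the parallel package, and this is not how drake itself handles it.

```r
library(parallel)

# Hypothetical helper: run mclapply() on batches of at most `cores` elements,
# so each call receives a short first argument.
chunked_mclapply <- function(x, fun, cores = 2L, ...) {
  # Forking is unavailable on Windows, where mclapply() requires mc.cores = 1.
  cores_used <- if (.Platform$OS.type == "windows") 1L else cores
  # Group the indices of x into consecutive batches of size `cores`.
  batches <- split(seq_along(x), ceiling(seq_along(x) / cores))
  out <- vector("list", length(x))
  for (idx in batches) {
    out[idx] <- mclapply(x[idx], fun, ..., mc.cores = cores_used)
  }
  names(out) <- names(x)
  out
}

squares <- chunked_mclapply(1:5, function(i) i^2, cores = 2L)
```

Each batch finishes before the next one starts, which gives up some scheduling flexibility in exchange for a bound on how many forked results are held at once.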

Seems to be working!

Fantastic! All I did was dodge some of storr's internals, which makes me think storr could be more memory-efficient.

But now that I have followed up on your mclapply() reference, I cannot entirely explain why my solution worked. So in 1e7b80b601f775de0998b5df58a708a6837eda9a, I added an extra precaution to prevent memory leaks:

touch_storr_object <- function(key, cache, namespace){
  envir <- environment() # new line in 1e7b80b601f775de0998b5df58a708a6837eda9a
  hash <- cache$get_hash(key = key, namespace = namespace)
  value <- cache$driver$get_object(hash = hash)
  remove(value, envir = envir) # new line in 1e7b80b601f775de0998b5df58a708a6837eda9a
  invisible(NULL)
}

Please let me know if this problem re-emerges.
