Julia: Precompile should check staleness using a hash instead of a timestamp

Created on 5 Aug 2016  路  10Comments  路  Source: JuliaLang/julia

As discussed in https://github.com/JuliaStats/StatsBase.jl/issues/202, alternating Pkg.add(X) and using X statements can lead to errors. According to @tkelman, this could be fixed by making require check staleness by looking at the timestamp instead of a hash.

Example:

$ rm -rf ~/.julia/v0.4
$ rm -rf ~/.julia/lib/v0.4
$ julia -e "Pkg.update()"
INFO: Initializing package repository /Users/tamasnagy/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
INFO: Updating METADATA...
INFO: Computing changes...
INFO: No packages to install, update or remove
$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-apple-darwin15.4.0

julia> Pkg.add("StatsBase")
INFO: Installing BinDeps v0.4.1
INFO: Installing Compat v0.8.6
INFO: Installing Rmath v0.1.2
INFO: Installing SHA v0.2.0
INFO: Installing StatsBase v0.9.0
INFO: Installing StatsFuns v0.3.0
INFO: Installing URIParser v0.1.5
INFO: Building Rmath
INFO: Package database updated

julia> using StatsBase
INFO: Precompiling module StatsBase...

julia> Pkg.add("Distributions")
INFO: Installing Calculus v0.1.15
INFO: Installing Distributions v0.10.0
INFO: Installing PDMats v0.4.2
INFO: Building Rmath
INFO: Package database updated

julia> using Distributions
INFO: Precompiling module Distributions...
INFO: Recompiling stale cache file /Users/tamasnagy/.julia/lib/v0.4/Rmath.ji for module Rmath.
INFO: Recompiling stale cache file /Users/tamasnagy/.julia/lib/v0.4/StatsFuns.ji for module StatsFuns.
INFO: Recompiling stale cache file /Users/tamasnagy/.julia/lib/v0.4/StatsBase.ji for module StatsBase.
INFO: Recompiling stale cache file /Users/tamasnagy/.julia/lib/v0.4/Distributions.ji for module Distributions.
WARNING: Module Rmath uuid did not match cache file
  This is likely because module Rmath does not support  precompilation but is imported by a module that does.
ERROR: __precompile__(true) but require failed to create a precompiled cache file
 in require at /usr/local/Cellar/julia/0.4.5/lib/julia/sys.dylib
precompile

Most helpful comment

Making staleness depend on the hash would be great for users who work on distributed system (supercomputers and such). @andreasnoack and I have been running into problems sometimes.

All 10 comments

Pkg check staleness

staleness check happens in using (actually require, which is also used by import), not Pkg

Thanks! I updated the original post.

I see, the problem here is that the timestamp has changed because Pkg.build("Rmath") regenerates its deps.jl file, but the code hasn't actually changed so the rebuild (which triggers #12508) is not actually needed.

As a workaround, @BinDeps.install could overwrite the deps.jl file only if it has changed.

that was a workaround I suggested in the StatsBase issue, but I think this could happen for other reasons and might be good for making .ji files easier to redistribute, if the checksum is cheap enough. x-ref https://github.com/JuliaLang/julia/pull/12458#issuecomment-129859137

Making staleness depend on the hash would be great for users who work on distributed system (supercomputers and such). @andreasnoack and I have been running into problems sometimes.

Here is a nice-looking hardware-accelerated BSD-licensed CRC32c implementation in C by Mark Adler: http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software/1

And here is a benchmark of different CRC routines, concluding that Adler's routine is pretty much the fastest.

I've always been fairly against doing this with checksums. I agree there needs to be more development here, but I haven't been able to develop a complete proposal yet to address these and other similar issues.

@vtjnash, doing if !timestamps_match && !checksums_match then regenerate_cache seems like a strict improvement over if !timestamps_match (if the time for the checksum is negligible, which seems likely), by reducing the chance of false positives. What is the nature of your objection?

Should this be closed in light of the discussion at #18127?

Was this page helpful?
0 / 5 - 0 ratings