I understand how the feature works, but as I'm not using it, I'm not quite sure what the exact intended workflow is.
@bestander @skevy @kittens
You mean the offline-mirror setting?
@bestander yeah
So we use it for offline installs: CI and internal projects.
1. Set the `yarn-offline-mirror` key with path to a folder with .tar.gz files.
2. On `yarn add`, they are downloaded from npm repo but yarn.lock stores local path to the .tar.gz file instead of "https://registry.npmjs.org/...".
3. On `yarn install`, node_modules are installed without going to network.
Personally, this is the single reason why I joined the effort on this project :)
BTW a related topic to discuss https://github.com/yarnpkg/yarn/issues/394
This is particularly useful in a monorepo setting (we have one at Exponent and then of course FB had theirs). All the packages and applications in the repo can share this same module cache. It makes CI/CD (especially monorepo CI/CD) great again. :-)
> Personally, this is the single reason why I joined the effort on this project :)
Seems legit. 😄
Just so I'm clear, the primary goal is to make it possible to take a `package.json` that references packages from the registry. You would like to check the tarballs of those packages into version control, together with a `yarn.lock`, and you would like `yarn install` to use those checked in tarballs instead of the registry. You would also like this to work for indirect dependencies.
Is that correct?
That is correct
As my colleague, @kentaromiura, pointed out:
We don't have to check in the .tar.gz files. Originally we considered using shared storage for a project, but a source control system was as good.
So I know the NPM guys intentionally made life difficult because they used the resolved field (when present) as the complete and only identity for a package. Their reasoning had to do with private registries. Somebody not targeting a public registry at all may have complete module name overlap. Even more ominously, a package caching server (so far the only sane way to deal with trying to use npm in a high dependability environment) could have been configured to selectively shadow some packages (or provide them past the point in time when the original source had deleted them). To complete the chaos, the npm registry owners proved that they are willing, under appropriate pressure, to themselves reassign a module name to a different project, as happened with kik prompting the famed left-pad disaster.
This makes the logic around upgrading a previously locked package awfully tricky.
Npm basically doesn't allow it. Either you keep everything locked exactly as it is, or you lose all your locked down versions at the same time. Or you manually chop out a big chunk of shrinkwrap and splice in an updated version. o_O
How are we handling this? Do we see ourselves as having a certain contract with the user like, say, never accidentally at upgrade time replacing an installed package with something that is from a completely different codebase?
The reason I asked this question is that I want to propose a slightly different workflow that I think satisfies the requirements that people had when designing it in the first place, but with fewer rough edges.
The basic idea is that the `yarn.lock` would continue to store the original name, version and sha from the original npm registry, and that the offline mirror configuration would instruct the registry resolver to use the mirror instead.
(In Cargo, the mirroring configuration happens at a level above the individual sources, as I describe below)
This is how we designed the mirror feature in Cargo, and it has a few nice properties:
[...] `.npmrc`, that you would like a mirror to replace a particular source. This makes it pretty easy for build bots and other environments along those lines to strictly enforce whatever requirements they need (using whichever resolver strategy they want) without imposing the restriction unnecessarily on developers.
@conartist6 I'm trying to understand the problem you're describing.
> How are we handling this? Do we see ourselves as having a certain contract with the user like, say, never accidentally at upgrade time replacing an installed package with something that is from a completely different codebase?
The way bundler and Cargo handle (what I believe you mean by) this problem is by using a "precise" version for every dependency in the lockfile that includes enough information to precisely identify it (and its source) but not including mirror information (which is supplied by configuration).
In Cargo, mirrors are required to share precisely the same sha as the original upstream source, and any replacements that change the source code are specified in `Cargo.toml` (Cargo's equivalent of `package.json`) using `[replace]` sections:
[replace]
"foo:0.1.0" = { git = 'https://github.com/example/foo' }
"bar:1.0.2" = { path = 'my/local/bar' }
This means: "if you see `foo v0.1.0` in the dependency graph, replace it with the code located at this git repo, and if you see `bar v1.0.2` in the dependency graph, replace it with this local code I checked in." It works with any kind of resolver that normally works in Cargo, so it's pretty flexible. The rationale is that mirrors can be largely transparent to development if they share a SHA (and are largely operational concerns), while changing the code is a development concern and should be specified in the manifest.
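To make the `[replace]` semantics above concrete, here is a minimal Python sketch. It is not Cargo's implementation: the `Package` class, the `apply_replacements` helper, and the `path+` source prefix are hypothetical names chosen for illustration. It only shows the rule that a replacement is keyed by "name:version" and swaps the source while keeping name and version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    source: str  # where the code comes from (registry, git repo, or local path)


def apply_replacements(graph, replacements):
    """Return a new package list with every "name:version" match re-sourced.

    Name and version are preserved, so a replacement can never change the
    identity of the package it stands in for (hypothetical helper).
    """
    result = []
    for pkg in graph:
        key = f"{pkg.name}:{pkg.version}"
        if key in replacements:
            result.append(Package(pkg.name, pkg.version, replacements[key]))
        else:
            result.append(pkg)
    return result


REGISTRY = "registry+https://github.com/rust-lang/crates.io-index"
graph = [
    Package("foo", "0.1.0", REGISTRY),
    Package("bar", "1.0.2", REGISTRY),
    Package("baz", "2.0.0", REGISTRY),  # hypothetical package with no replacement
]
replacements = {
    "foo:0.1.0": "git+https://github.com/example/foo",
    "bar:1.0.2": "path+my/local/bar",  # "path+" prefix is an assumption
}
patched = apply_replacements(graph, replacements)
```

Packages without a matching key pass through untouched, which is why a replacement list can stay short even in a large graph.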
Aside: Bundler works a little bit differently, but along the same lines: because bundler only allows a single name/version dependency in the entire dependency graph, a specified dependency in the top-level `Gemfile` always supersedes the registry. In other words, if you specify a dependency in a `Gemfile`, it's as if you had said `[replace]` in Cargo.
In Cargo's (and bundler's) case, we also require that replacements share a name and version number with the original package they're replacing, and the feature is largely used for emergency patches or things like "the bug is fixed on master but the author hasn't gotten around to publishing it yet".
I'm not entirely sure whether any of that directly targets the issue you're talking about. Can you clarify it a bit?
Yes, yes it definitely does target the issue I'm describing. Npm lacks hashes, and with that restriction they were forced to treat source URLs as the best guarantee of authenticity.
The setup that you are describing sounds quite attractive because it understands (on multiple levels) the difference between a cached copy and an override. Npm, infuriatingly, can't, which is why upgrading a cached package is such a nightmare.
@wycats do you propose moving `resolved` lines from yarn.lock into a registry file that would map `left-pad:1.1.0` to a remote or local location of a tarball?
For large single-repo projects the experience looks like this.
A developer wants to add `left-pad` to a project.
There is an .npmrc at the root that has `kpm-offline-mirror=./npm-offline-packages`.
The developer writes
yarn add left-pad@~1.1.1
The new dependency is added to package.json and yarn.lock file and the tarball is downloaded to `npm-offline-packages`.
The nice thing about this approach is its simplicity: it is easy to review and easy to connect the dots.
If there is another project in the repository and another developer does
yarn add left-pad@~1.1.1
Then the existing tarball will be reused and yarn.lock will refer to it.
What would be different with the proposed approach?
If I understand what @wycats is saying, nothing in the workflow you describe is different for the user. The major difference is what data is stored in yarn.lock.
I understand your earlier response to suggest that yarn.lock would contain something concrete like:
{
location: "${npm-offline-packages}/leftpad-1.2.1.tgz",
dependencies: structure recurses...
}
This is the npm approach. The suggestion here is to, in yarn.lock, store:
{
descriptor: "leftpad-1.2.1",
hash: "d672jef2",
sourceRepository: "https://registry.npmjs.org",
dependencies: structure recurses...
}
This way the cache directory is searched instead of being directly referenced, which means it is trivial to change the cache directory configuration, either as a one-off or between dev/prod/test.
It also means that the program can easily know that if the user says `yarn upgrade left-pad`, this means that it should go out to the npm central registry, fetch the newest left-pad available there, store it in the local cache, and update the descriptor and hash in yarn.lock.
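As a rough illustration of "the cache directory is searched instead of being directly referenced", here is a small Python sketch. It assumes the lockfile hash is a SHA-1 of the tarball bytes and that cached tarballs end in `.tgz`; the `find_in_cache` helper and the demo contents are hypothetical, not Yarn code.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of(path):
    """Hex SHA-1 digest of a file's bytes."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def find_in_cache(cache_dir, expected_hash):
    """Search cache_dir for a tarball whose hash matches the lockfile entry.

    The lockfile never stores the path, so the cache directory can move
    between dev/test/prod without touching the lockfile.
    """
    for candidate in sorted(Path(cache_dir).glob("*.tgz")):
        if sha1_of(candidate) == expected_hash:
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    tarball = Path(tmp) / "leftpad-1.2.1.tgz"
    tarball.write_bytes(b"pretend tarball contents")  # stand-in for a real package
    entry = {
        "descriptor": "leftpad-1.2.1",
        "hash": sha1_of(tarball),
        "sourceRepository": "https://registry.npmjs.org",
    }
    found = find_in_cache(tmp, entry["hash"])
    found_name = found.name if found else None
    missing = find_in_cache(tmp, "0" * 40)  # a hash nothing in the cache has
```

A miss (`None`) would be the point where a tool falls back to `sourceRepository`.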
> @wycats do you propose moving resolved lines from yarn.lock into a registry file that would map left-pad:1.1.0 to a remote or local location of a tarball?
Not quite. The `resolved` lines would go away, and a package would be identified uniquely by its "precise version" (which can be a tarball sha, but could also be things like git sha for example).
With no configuration, we'd use the "default remote" for a particular package. If you configure a mirror in `.npmrc`, we'd map the original package to that mirror instead during fetching and confirm that the fetched package matches the integrity information (sha).
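A minimal Python sketch of the two steps described here, mirror substitution followed by an integrity check. The function names, the dictionary-based mirror config, and the use of SHA-1 are assumptions for illustration; this is not how Yarn or Cargo is implemented.

```python
import hashlib

DEFAULT_REMOTE = "https://registry.npmjs.org"


def fetch_source(original_source, mirrors):
    """Use the configured mirror for a source if there is one, else the source itself."""
    return mirrors.get(original_source, original_source)


def verify(data, expected_sha1):
    """Return data unchanged if its SHA-1 matches the lockfile; raise otherwise."""
    actual = hashlib.sha1(data).hexdigest()
    if actual != expected_sha1:
        raise ValueError(f"integrity mismatch: expected {expected_sha1}, got {actual}")
    return data


# Hypothetical mirror config, e.g. derived from an .npmrc entry.
mirrors = {DEFAULT_REMOTE: "./npm-offline-packages"}

no_config = fetch_source(DEFAULT_REMOTE, {})
with_mirror = fetch_source(DEFAULT_REMOTE, mirrors)

payload = b"pretend tarball bytes"
locked_sha = hashlib.sha1(payload).hexdigest()
checked = verify(payload, locked_sha)
```

Because the lockfile only records the original source and the sha, the same lockfile works with or without the mirror; a mirror that serves different bytes fails `verify` instead of silently replacing the package.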
> A developer wants to add left-pad to a project.
> There is an .npmrc at the root that has kpm-offline-mirror=./npm-offline-packages.
> The developer writes
> yarn add left-pad@~1.1.1
> The new dependency is added to package.json and yarn.lock file and the tarball is downloaded to npm-offline-packages.
The main difference so far is that the way to specify `npm-offline-packages` would be a little more general, allowing you to specify a replacement mirror for any source, not just the npm registry.
> The nice thing about this approach is its simplicity: it is easy to review and easy to connect the dots.
I agree, it's nice 😄
> If there is another project in the repository and another developer does
> yarn add left-pad@~1.1.1
> Then the existing tarball will be reused and yarn.lock will refer to it.
> What would be different with the proposed approach?
The main distinction is that the `yarn.lock` would have the _original source_ rather than the resolved tarball, together with enough information to uniquely identify it (see @conartist6 above) and the `.npmrc` is responsible for mapping the "npm registry" to the local mirror in the monorepo.
You can still look at the integrity information in the `yarn.lock`, and you can still look at the in-repo cache of packages to connect the dots. You could also very easily move the location of the in-repo cache (or add additional ones at appropriate places in the hierarchy).
For people who are not using mono-repos, it makes it possible to use the same feature for production deploys without disturbing the normal development workflow, as well as paves the way for other kinds of mirrors that can work together with the in-repo mirror strategy. In other words, it's just a more general way of describing the same thing.
Finally, it also helps to rationalize what's going on with `npm link` and the local yarn cache (there's no good reason that the local yarn cache behaves so differently from the offline mirror). Linking a package on your machine wouldn't disturb the lockfile, but would rather register a local mirror for the original package. At least for me, I really want uses of `npm link` on a local machine to be invisible to other developers.
Generally, decoupling the "original source + unique identification" from "where we actually get the packages in practice" makes interactions between mirrors, links, and other similar features more reliable, but doesn't really change any fundamental capabilities.
That does make sense, it would also solve https://github.com/yarnpkg/yarn/issues/394.
What would be used as a key in the replacement map?
In yarn.lock we use strings like `yeoman-welcome@^1.0.0` which are specific to a particular package.json of a direct or transitive dependency.
Should it be the full http path?
@bestander the way Cargo works is that there is a notion of "package id", which is a fully qualified package name that is guaranteed to be unique (each resolver gets to decide what is required for uniqueness).
Here's an example `Cargo.toml` I just whipped up:
[package]
name = "ohai"
version = "0.1.0"
authors = ["Yehuda Katz <[email protected]>"]
[dependencies]
libc = "*"
And here's the lockfile Cargo generates:
[root]
name = "ohai"
version = "0.1.0"
dependencies = [
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "libc"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
Cargo uses the word "source" to mean roughly the same thing as Yarn uses the word "resolver" for.
In this case, since the registry doesn't allow people to mutate existing crates, the fully resolved name of the registry, plus the package's name and version are sufficient.
For illustration, let me add another package to the `Cargo.toml`, this one a git dependency:
[package]
name = "ohai"
version = "0.1.0"
authors = ["Yehuda Katz <[email protected]>"]
[dependencies]
libc = "*"
docopt = { git = "https://github.com/docopt/docopt.rs" }
Here's the output from `cargo build` (more or less the equivalent of `yarn install`):
$ cargo build
Updating git repository `https://github.com/docopt/docopt.rs`
Updating registry `https://github.com/rust-lang/crates.io-index`
Compiling lazy_static v0.2.1
Compiling regex-syntax v0.3.5
Compiling utf8-ranges v0.1.3
Compiling memchr v0.1.11
Compiling winapi-build v0.1.1
Compiling aho-corasick v0.5.3
Compiling kernel32-sys v0.2.2
Compiling strsim v0.5.1
Compiling rustc-serialize v0.3.19
Compiling winapi v0.2.8
Compiling thread-id v2.0.0
Compiling thread_local v0.2.7
Compiling regex v0.1.77
Compiling docopt v0.6.83 (https://github.com/docopt/docopt.rs#be283ce2)
Compiling ohai v0.1.0 (file:///C:/Code/ohai)
Finished debug [unoptimized + debuginfo] target(s) in 89.36 secs
And the updated `Cargo.lock`:
[root]
name = "ohai"
version = "0.1.0"
dependencies = [
"docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)",
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "aho-corasick"
version = "0.5.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "docopt"
version = "0.6.83"
source = "git+https://github.com/docopt/docopt.rs#be283ce2a00305998e89d98122cdad06e59dede4"
dependencies = [
"lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)",
"rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)",
"strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "kernel32-sys"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "lazy_static"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "libc"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "memchr"
version = "0.1.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "regex"
version = "0.1.77"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)",
"utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "regex-syntax"
version = "0.3.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "rustc-serialize"
version = "0.3.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "strsim"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "thread-id"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "thread_local"
version = "0.2.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "utf8-ranges"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "winapi"
version = "0.2.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "winapi-build"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[metadata]
"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"
"checksum docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)" = "<none>"
"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"
"checksum lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "49247ec2a285bb3dcb23cbd9c35193c025e7251bfce77c1d5da97e6362dffe7f"
"checksum libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)" = "408014cace30ee0f767b1c4517980646a573ec61a57957aeeabcac8ac0a02e8d"
"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"
"checksum regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)" = "64b03446c466d35b42f2a8b203c8e03ed8b91c0f17b56e1f84f7210a257aa665"
"checksum regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279401017ae31cf4e15344aa3f085d0e2e5c1e70067289ef906906fdbe92c8fd"
"checksum rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)" = "6159e4e6e559c81bd706afe9c8fd68f547d3e851ce12e76b1de7914bab61691b"
"checksum strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "50c069df92e4b01425a8bf3576d5d417943a6a7272fbabaf5bd80b1aaa76442e"
"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"
"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"
"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"
"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"
"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"
The GitHub package we added contributed the following entry (plus all of its dependencies, of course):
[[package]]
name = "docopt"
version = "0.6.83"
source = "git+https://github.com/docopt/docopt.rs#be283ce2a00305998e89d98122cdad06e59dede4"
dependencies = [
"lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)",
"rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)",
"strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
We include the name and version of course, but also a fully qualified source name, which in the case of git repositories, includes the precise revision at the point where the lockfile was generated. Also note that all of the package versions in the lockfile are precise versions, rather than a version range, which makes the dependency graph easier to work with.
This also allows users to tighten versions (from `"*"` to `"1.3.0"`) without causing Cargo to believe that the lockfile has changed and trigger updates.
The bottom of the lockfile is a series of checksums in a single, non-source-specific form (SHA256):
[metadata]
"checksum aho-corasick 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ca972c2ea5f742bfce5687b9aef75506a764f61d37f8f649047846a9686ddb66"
"checksum docopt 0.6.83 (git+https://github.com/docopt/docopt.rs)" = "<none>"
"checksum kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "7507624b29483431c0ba2d82aece8ca6cdba9382bff4ddd0f7490560c056098d"
"checksum lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "49247ec2a285bb3dcb23cbd9c35193c025e7251bfce77c1d5da97e6362dffe7f"
"checksum libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)" = "408014cace30ee0f767b1c4517980646a573ec61a57957aeeabcac8ac0a02e8d"
"checksum memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)" = "d8b629fb514376c675b98c1421e80b151d3817ac42d7c667717d282761418d20"
"checksum regex 0.1.77 (registry+https://github.com/rust-lang/crates.io-index)" = "64b03446c466d35b42f2a8b203c8e03ed8b91c0f17b56e1f84f7210a257aa665"
"checksum regex-syntax 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279401017ae31cf4e15344aa3f085d0e2e5c1e70067289ef906906fdbe92c8fd"
"checksum rustc-serialize 0.3.19 (registry+https://github.com/rust-lang/crates.io-index)" = "6159e4e6e559c81bd706afe9c8fd68f547d3e851ce12e76b1de7914bab61691b"
"checksum strsim 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "50c069df92e4b01425a8bf3576d5d417943a6a7272fbabaf5bd80b1aaa76442e"
"checksum thread-id 2.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a9539db560102d1cef46b8b78ce737ff0bb64e7e18d35b2a5688f7d097d0ff03"
"checksum thread_local 0.2.7 (registry+https://github.com/rust-lang/crates.io-index)" = "8576dbbfcaef9641452d5cf0df9b0e7eeab7694956dd33bb61515fb8f18cfdd5"
"checksum utf8-ranges 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a1ca13c08c41c9c3e04224ed9ff80461d97e121589ff27c753a16cb10830ae0f"
"checksum winapi 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "167dc9d6949a9b857f3451275e911c3f44255842c1f7a76f33c55103a909087a"
"checksum winapi-build 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "2d315eee3b34aca4797b2da6b13ed88266e6d612562a0c46390af8299fc699bc"
We added this after the initial release of Cargo, and it ensures that we have a secure hash for any source, even though there are theoretical risks associated with the hashing strategy used by git, for example.
Cargo also has a command that you can use to get the fully qualified name of a package in the `Cargo.lock`:
$ cargo pkgid docopt
https://github.com/docopt/docopt.rs#docopt:0.6.83
This package id contains just enough information to uniquely identify a package in the dependency graph (it's the identifier used in the dependency graph structure, in fact). When describing a replacement, it's always fine to use a more general name (like `docopt`) as long as it uniquely identifies the package in the lockfile. If the specified name is ambiguous (which is rare -- it can only happen when `--flat` wouldn't work in Yarn and you're referencing a duplicated package), Cargo helps you identify unambiguous names to use.
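The "general name is fine as long as it is unambiguous" rule can be sketched in Python as follows. This is only an illustration: the `resolve_spec` helper, the tuple layout, and the duplicated `libc 0.1.12` entry are hypothetical, and Cargo's real package id syntax is richer than the "name" / "name:version" forms handled here.

```python
REGISTRY = "registry+https://github.com/rust-lang/crates.io-index"

# (name, version, source) triples, as they might be read from a lockfile.
package_ids = [
    ("docopt", "0.6.83", "git+https://github.com/docopt/docopt.rs"),
    ("libc", "0.2.16", REGISTRY),
    ("libc", "0.1.12", REGISTRY),  # hypothetical duplicate, to force ambiguity
]


def matching_ids(spec, ids):
    """Ids matching a spec given as "name" or "name:version"."""
    name, _, version = spec.partition(":")
    return [pid for pid in ids if pid[0] == name and (not version or pid[1] == version)]


def resolve_spec(spec, ids):
    """Return the single id a spec refers to, or explain why it cannot."""
    matches = matching_ids(spec, ids)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"{spec!r} matches no package in the lockfile")
    candidates = ", ".join(f"{n}:{v}" for n, v, _ in matches)
    raise LookupError(f"{spec!r} is ambiguous; use one of: {candidates}")


unique = resolve_spec("docopt", package_ids)
precise = resolve_spec("libc:0.2.16", package_ids)
try:
    resolve_spec("libc", package_ids)
    ambiguity_message = ""
except LookupError as error:
    ambiguity_message = str(error)
```

The error path lists the more specific names, which is the "helps you identify unambiguous names" behaviour in miniature.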
@wycats, thanks for giving some background info.
Let's think how we can improve the current situation with yarn.
In the lock file we have name (implied), version and where it gets resolved to.
I suppose we can have a separate file with resolution replacements, `yarn-resolutions.lock` (located at the monorepo root):
https://registry.npmjs.org/yeoman-welcome/-/yeoman-welcome-1.0.1.tgz#f6cf198fd4fba8a771672c26cdfb8a64795c84ec ./local-mirror/yeoman-welcome-1.0.1.tgz
Would that be on par with Cargo features?
ping @wycats
I'm unclear from my limited use of yarn how much of a role the `resolved` field plays when installing from lockfile on a second machine, but the npm behaviour of always phoning home for this was very frustrating. It led me to always remove this field from the generated npm-shrinkwrap.
The specific flow we wanted was that I would have my development machine point at the public registry, but CI would go via a proxy.
I think the way @wycats describes storing a reference to the package source separately from the package location would help enable this workflow.
As a strawman, something along the lines of this might work:
The checked-in lock file states the expected source and a hash
abab@^1.0.0:
  version "1.0.3"
  source "registry"
  hash "b81de5f7274ec4e756d797cd834f303642724e5d"
Those sources would have default locations, and then separate environment-specific not-checked in config could override source locations - possibly giving an ordered list?
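A small Python sketch of this strawman, assuming a named source has a default location list and that an environment-specific, not-checked-in config can replace it with an ordered list. The `locations_for` helper and the proxy URL are hypothetical.

```python
# Default locations per named source; these would ship with the tool.
DEFAULT_LOCATIONS = {"registry": ["https://registry.npmjs.org"]}


def locations_for(source, overrides=None):
    """Ordered list of places to try for a named source.

    An environment-specific override replaces the default list entirely;
    entries are tried first to last.
    """
    overrides = overrides or {}
    if source in overrides:
        return list(overrides[source])
    return list(DEFAULT_LOCATIONS.get(source, []))


# Checked-in lock entry: names a source and a hash, never a location.
lock_entry = {
    "name": "abab",
    "version": "1.0.3",
    "source": "registry",
    "hash": "b81de5f7274ec4e756d797cd834f303642724e5d",
}

# Hypothetical CI-only config: local mirror first, then an internal proxy.
ci_overrides = {"registry": ["./local-mirror", "https://proxy.example.internal"]}

dev_locations = locations_for(lock_entry["source"])
ci_locations = locations_for(lock_entry["source"], ci_overrides)
```

The development machine and CI read the same lock entry but end up with different fetch locations, which is the workflow described above.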
I've been testing a custom yarn-cache folder for a few weeks, but I'm encountering a lot of `Tarball is not in network and can not be located in cache` errors (on gitlab-ci, etc.) that would only be solved with a `yarn --no-lockfile` or `rm yarn.lock; yarn`. By any chance, would you have encountered such errors?
"error \"bunyan-1.8.4.tgz\": Tarball is not in network and can not be located in cache (\"/srv/player/.yarn-cache/bunyan-1.8.4.tgz\")", "stdout": "yarn install v0.17.0\n[1/4] Resolving packages...\n[2/4] Fetching packages...\ninfo Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command."
Somewhat related, though it may stray a little from the topic: are there any thoughts about dealing with node module installation scripts? Plenty of node modules download additional code during installation, and thus the result of `yarn install` is still non-repeatable even with the offline mirror.
The problem is that Node.js install scripts can execute any bash script, there is no way to reliably achieve offline mode without authors' cooperation.
The only thing that we can do is not use such modules or help the authors to think of offline mode.
How about caching the post-install results instead of pre-install results? I understand that will not work well with any code with native platform dependencies, but that is not a problem any of our current solutions address either.
The worst that can happen is that we cannot use those badly behaving modules, which will be as bad as the current situation.
Then it is as good as saving node_modules somewhere, for example, checking them into source control.
I agree that will be equivalent, with added benefits that offline mirror currently provides:
Well, it might work but it may be complex.
Yarn is already tracking a diff between caches and whatever happens after install scripts, see phantomFiles https://github.com/yarnpkg/yarn/blob/master/src/package-linker.js#L124 and beforeFiles in https://github.com/yarnpkg/yarn/blob/master/src/package-install-scripts.js#L280.
If you feel that you could make sense of it and have some sort of offline storage for build artifacts go ahead, send an RFC.
My concern is that packages may reach out beyond their own limits and even use their own cache folders, basically everything that a bash script can do, although I don't know if any significant number do so.
@UnrememberMe, better discuss this in a separate issue/RFC
Agreed. Will open a separate issue/RFC.
@UnrememberMe did you happen to open one? Could you link to it?
I have started but have not finished the RFC yet. I should submit the pull request for RFC no later than Thursday 2/16/17. @gregsheremeta
Is there a concern that the offline mirror will start to bloat if it is stored in source control? Every minor version change will leave the old .tgz files behind.
Is there a plan for a cleanup command that empties the offline mirror of module .tgz files that are not used in the yarn.lock?
Yeah, there is an RFC for that already.
An opt-in cleanup feature is coming
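For illustration only, here is a Python sketch of what such a cleanup could compute. It is not the design from the RFC: it assumes the lockfile has `resolved` lines that end in a tarball file name (either a bare name or a URL) followed by `#hash`, and the helper names, sample lockfile, and placeholder hashes are hypothetical.

```python
import re


def referenced_tarballs(lockfile_text):
    """Collect tarball file names mentioned on `resolved` lines."""
    names = set()
    for line in lockfile_text.splitlines():
        match = re.match(r'\s*resolved\s+"?([^"#\s]+)', line)
        if match:
            # Keep only the file name, whether the value is a URL or a bare name.
            names.add(match.group(1).rsplit("/", 1)[-1])
    return names


def unused_tarballs(mirror_files, lockfile_text):
    """Mirror files that no `resolved` line refers to, sorted for stable output."""
    return sorted(set(mirror_files) - referenced_tarballs(lockfile_text))


# Sample lockfile with placeholder hashes (not real checksums).
sample_lock = """\
left-pad@~1.1.1:
  version "1.1.3"
  resolved left-pad-1.1.3.tgz#0000
abab@^1.0.0:
  version "1.0.3"
  resolved "https://registry.npmjs.org/abab/-/abab-1.0.3.tgz#0000"
"""

mirror_files = ["left-pad-1.1.1.tgz", "left-pad-1.1.3.tgz", "abab-1.0.3.tgz"]
stale = unused_tarballs(mirror_files, sample_lock)
```

In a monorepo with several lockfiles, the referenced sets would need to be unioned before anything is deleted.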
On 16 February 2017 at 21:13, jackhamburger notifications@github.com wrote:
> Is there a concern about the offline mirror will start to bloat if it is stored in source control? Every minor version change will leave the old .tgz files.
> Is there a plan for a clean up command that empties the offline mirror of module .tgzs that are not used in the yarn.lock?
@gregsheremeta The RFC was posted on Feb 16, 2017.
@UnrememberMe do you have a link to it? I don't see it in https://github.com/yarnpkg/rfcs
@gregsheremeta I updated the title for https://github.com/yarnpkg/rfcs/pull/50. The initial RFC title was not correct.
That's something I'm interested in too. I'll leave my thoughts here:
AFAIK, current yarn workflow is the following:
- [...] `$HOME` to isolated location for each build
- `yarn install --pure-lock-file` to make sure people check in `yarn.lock`
- `yarn add/remove` commands to change package.json and yarn.lock
- [...] `yarn.lock` and do clean yarn install :( <--- not sure if it's correct
Each step could be improved:
- [...] `$HOME`?)
- [...] `yarn.lock` and conflicts are inevitable
- `yarn git-resolve` would be nice
How is it related to tarball cache feature? Not much. Intended use case of it is to completely avoid yarn install during CI and save all dependencies in the repository.
As we can see, local tarball cache would resolve all CI issues but somewhat complicate developer workflow issues.
Ideally, we'd want online repository but cacheable.
Yarn already has `.cache`, thus tarball cache doesn't look useful.
I.e. tarball cache wouldn't be needed if each `yarn install` updated $HOME/ cache without corruption. And committing dependencies to git wouldn't be needed if it was guaranteed that npm registry is always online in case cache is corrupted.
TL;DR It would be nice if workflow wouldn't change while achieving performance and stability improvement.
@Vanuan in case of merge conflicts there's a merged PR making `yarn install` solve merge conflicts (https://github.com/yarnpkg/yarn/pull/3544)
Yeap, I'm aware of that. That comment was back in April.
Still not clear whether you should use `yarn install --pure-lock-file` in this case or whether `yarn install` changes package versions or just resolves conflicts.
Closing this issue since it seems to be resolved, mostly by #2970 but possibly with other PRs.
Please create a new issue if you want to propose more features or fixes around this.
The last part of this blog post is referencing this ticket. Is it still valid or should the blog post be updated?