Yarn: Use a standard format (yaml) for yarn.lock

Created on 7 Apr 2018 · 10 Comments · Source: yarnpkg/yarn

I'm reopening #2250 which was closed a bit too hastily. I just hit the use case of wanting to read yarn.lock from Python (as an introspection step), and although I thought yarn.lock was yaml, it's actually not. It's almost yaml.
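
For the Python use case, a minimal sketch of reading the v1 format with only the standard library could look like the following. This is not Yarn's own parser: `parse_yarn_lock_v1` and `SAMPLE` are hypothetical names, the package entries are made up, and the code assumes space indentation, one `key value` pair per line, and no commas inside quoted keys.

```python
import shlex

# Made-up lockfile content, for illustration only.
SAMPLE = '''# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.
# yarn lockfile v1


"@scope/pkg@^1.0.0", "@scope/pkg@^1.1.0":
  version "1.1.2"
  resolved "https://registry.example/pkg-1.1.2.tgz#abc"
  dependencies:
    left-pad "^1.3.0"

left-pad@^1.3.0:
  version "1.3.0"
'''


def parse_yarn_lock_v1(text):
    """Parse a common subset of the yarn.lock v1 format into nested dicts."""
    root = {}
    # Stack of (indent, dict) pairs; the last pair is the block being filled.
    stack = [(-1, root)]
    for raw in text.splitlines():
        body = raw.strip()
        if not body or body.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(raw) - len(raw.lstrip(" "))
        # Leave any blocks that sit at the same or a deeper indentation.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if body.endswith(":"):
            # Block header; it may hold several comma-separated keys
            # that all share one entry.
            block = {}
            for key in body[:-1].split(","):
                parent[key.strip().strip('"')] = block
            stack.append((indent, block))
        else:
            # Property line: a key followed by an optionally quoted value.
            key, *rest = shlex.split(body)
            parent[key] = " ".join(rest)
    return root


lock = parse_yarn_lock_v1(SAMPLE)
print(lock["@scope/pkg@^1.1.0"]["version"])
```

Each pattern in a multi-key header ends up pointing at the same dict, so a lookup by any one of the patterns works.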

Pipenv, which is heavily inspired by yarn, uses json for its lockfile which makes it very easy to parse.

I suggest using either json or yaml for the lockfile v2. If yaml's large feature set is an issue, I know there are "strict yaml" subsets (at least for Python; I don't know about JS). JSON would make the most sense, but that does mean comments would no longer be available.

triaged

All 10 comments

I do agree, but that's something I would see for a major update, and it will probably take some time before we get there. In the meantime, we have published a package dedicated to parsing our lockfile format, if you happen to need it.

I know this is a while away, but to just share my preferences on this:

I would personally like the lockfile to be in JSON instead of any other format simply because it's easier to manage in a NodeJS ecosystem. Parsing formats like YAML will require more dependencies for any tooling that wants to read it. At which point, there's little difference from staying with the yarn-lockfile format.

If comments are the main blocker, I can only see these comments:

# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.
# yarn lockfile v1

The version can be a property. The warning can be printed when integrity checks fail and/or added as another top-level property.

Off the top of my head, for the lock integrity check (if something doesn't already exist), hashing a subset of yarn.lock's tree and setting the result as another property could work.
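
A rough sketch of that idea, assuming a JSON lockfile: the version and the warning become top-level properties, and a hash of the package subtree is stored next to them. The property names (`lockfileVersion`, `warning`, `integrity`, `packages`) and the helper functions are hypothetical, not anything Yarn defines.

```python
import hashlib
import json

WARNING = "THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY."


def lock_digest(packages):
    """Hash a canonical JSON encoding of the package subtree."""
    canonical = json.dumps(packages, sort_keys=True, separators=(",", ":"))
    return "sha256-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_lockfile(packages):
    """Wrap the package subtree with metadata properties (hypothetical layout)."""
    return {
        "lockfileVersion": 1,  # replaces the "# yarn lockfile v1" comment
        "warning": WARNING,    # replaces the "DO NOT EDIT" comment
        "integrity": lock_digest(packages),
        "packages": packages,
    }


def verify_lockfile(lockfile):
    """Return True when the stored hash still matches the package subtree."""
    return lockfile.get("integrity") == lock_digest(lockfile["packages"])


lockfile = build_lockfile({"left-pad@^1.3.0": {"version": "1.3.0"}})
print(verify_lockfile(lockfile))
```

Sorting the keys before hashing keeps the digest stable when the file is re-serialized with a different property order.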

Just curious, what was the thinking behind this design? As far as I can tell, this strange proprietary file format could easily be represented using an established standard such as JSON, YAML, XML, etc. Was the intention to discourage languages such as Java or C# from being able to read the file? :-P

The initial design goals, iirc:

  • Must be easy to read by a human
  • Must be easy to diff
  • Must be fast to parse

JSON didn't meet 1 (which was an arbitrary choice to some extent; I don't see the point of debating whether or not you agree with it), and YAML didn't meet 3. The flat style of the file was required by 2 and by some internal architectural choices.

Imo we should have used a stricter subset of yaml (one missing feature, however, is that our lockfile supports multi-key properties, which are annoying to do in yaml without backreferences, so we would have had to reparse the keys to split on "," anyway), but now the ecosystem is what it is, and we have to deal with it.
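
To illustrate the multi-key point, here is a small sketch of the round trip a stricter format would still need: patterns that resolve to the same entry are joined under one comma-separated key when writing, and the key is split again when reading. `join_keys` and `split_keys` are hypothetical helpers, not part of Yarn, and the sketch assumes patterns never contain a comma.

```python
import json


def join_keys(flat):
    """Group patterns whose entries are equal under one comma-joined key."""
    groups = {}
    for pattern in sorted(flat):
        entry = flat[pattern]
        # Entries are compared by their canonical JSON encoding.
        ident = json.dumps(entry, sort_keys=True)
        groups.setdefault(ident, ([], entry))[0].append(pattern)
    return {", ".join(patterns): entry for patterns, entry in groups.values()}


def split_keys(grouped):
    """Undo join_keys: give every pattern its own key again."""
    return {
        pattern.strip(): entry
        for key, entry in grouped.items()
        for pattern in key.split(",")
    }


flat = {
    "a@^1.0.0": {"version": "1.2.0"},
    "a@^1.1.0": {"version": "1.2.0"},
    "b@*": {"version": "2.0.0"},
}
grouped = join_keys(flat)
print(list(grouped))
```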

> Was the intention to discourage languages such as Java or C# from being able to read the file? :-P

Of course not... 😒

Thanks for this explanation.

Another question (if anyone knows): Why did people start referring to the shrinkwrap file as a "lock file"? This term has had an established meaning in the software industry for nearly 50 years. The ".lock" file extension is also pretty standardized on Unix systems. From this perspective, it seems unconventional to add a ".lock" file to source control, or to store 100k of important content in there.

The earlier name "shrinkwrap" seems more intuitive. It's not overloaded, and it sounds like what it does. If Yarn's file format ever changes, perhaps the file name could be clarified as well.

I can't tell for sure, but I think it was just named this way because a) it "locks" the dependencies and b) the shrinkwrap files produced by the npm CLI were riddled with bugs, and I guess Yarn wanted to make the distinction super clear.

As a non-native English speaker, I can also say that even after two years working on Yarn, I still had no idea what the literal meaning of the word shrinkwrap was until you brought up the subject, so I guess calling it a lockfile also helps international users 😝

@pgonzal I always assumed it was because Yehuda Katz had some initial input in the origin of the Yarn project. He was also one of the creators of Ruby Bundler, and Bundler uses the .lock convention https://bundler.io/v1.7/rationale.html#checking-your-code-into-version-control
Yarn's lock file serves basically the same purpose as Bundler's, so I think that's where the concept and name were taken from.

> Another question (if anyone knows): Why did people start referring to the shrinkwrap file as a "lock file"? This term has had an established meaning in the software industry for nearly 50 years. The ".lock" file extension is also pretty standardized on Unix systems. From this perspective, it seems unconventional to add a ".lock" file to source control, or to store 100k of important content in there.

> @pgonzal I always assumed it was because Yehuda Katz had some initial input in the origin of the Yarn project. He was also one of the creators of Ruby Bundler, and Bundler uses the .lock convention

Thanks! @wycats did Bundler have any problems with the "lock file" terminology, e.g. people's .gitignore files generally wanting to exclude the *.lock file extension?

Maybe this comment is a little off-topic, but have you considered TOML? I'm not talking about yarn.lock directly; .yarnrc.yml may be related too.

YAML is over-engineered. Yes, it is readable, but only if you use a minimal subset. Human writable? Almost impossible!

Sorry, I do not want to be a troll - just wondering.
