Deno: Proposal: Lock file should resolve URLs

Created on 17 Apr 2020 · 14 Comments · Source: denoland/deno

Currently, lock files in Deno are used solely for integrity checks. They do not prevent the Deno cache (~/.cache/deno) from holding a different version of a library.

Lock files in npm and in Cargo are used to resolve a version range to a specific version. Although Deno lacks concepts such as versions or packages, most existing Deno modules are hosted in Git, so a commit ID can play the role of a specific version.

Alternate Proposals

Choose one of the following:

Proposal 1

Lock files should store commit ids in some form in addition to hash.
When fetching with a lock file, Deno may attach a special HTTP request header (such as X-Git-Commit-ID). Servers that support this request header are expected to attach a corresponding HTTP response header to help Deno.
deno_website2 should support that HTTP request header.

Lock file structure:

_Schema:_

interface LockFile {
  [url: string]: {
    contentHash: string
    tag: string
  }
}

_Example:_

{
  "https://deno.land/x/foo/bar.js": {
    "contentHash": "fa251ce...",
    "tag": "44eb7a..."
  },
  ...
}
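
A minimal sketch of how a client could use such a lock file, assuming the header name X-Git-Commit-ID suggested above. Neither Deno nor deno_website2 is known to implement this header, and the function names lockedRequestHeaders and isAccepted are hypothetical.

```typescript
// Lock file shape from Proposal 1: each URL maps to a content hash and a
// tag (the commit id the content was resolved from).
interface LockEntry {
  contentHash: string;
  tag: string;
}
type LockFile = Record<string, LockEntry>;

// Header name taken from the proposal's "such as X-Git-Commit-ID" wording.
// This is an assumption, not an existing Deno feature.
const COMMIT_HEADER = "X-Git-Commit-ID";

// Build the extra request headers for a module URL. URLs that are not in
// the lock file get no extra header and are fetched as usual.
function lockedRequestHeaders(
  lock: LockFile,
  url: string,
): Record<string, string> {
  const entry = lock[url];
  return entry ? { [COMMIT_HEADER]: entry.tag } : {};
}

// After the download, the content hash still decides whether the file is
// accepted. URLs without a lock entry are accepted unchecked in this sketch.
function isAccepted(lock: LockFile, url: string, actualHash: string): boolean {
  const entry = lock[url];
  return entry === undefined || entry.contentHash === actualHash;
}
```

The commit id only tells the server which version to serve; integrity still rests on the content hash.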

Proposal 2

Lock files should store a unique URL in addition to hash.
When fetching with a lock file, Deno does not use URLs from the TypeScript files directly, but replaces them with resolved URLs from the lock file.
If a URL is not yet resolved (added after the last lock file update), Deno may request that URL directly. The server then responds with a special HTTP status/header.
- Is HTTP 307 Temporary Redirect good enough? Or do servers need to state explicitly that they support this protocol by attaching a special HTTP response header? I am leaning toward the latter.
deno_website2 should support this protocol.

Lock file structure:

_Schema:_

interface LockFile {
  [url: string]: {
    resolvedUrl: string
    contentHash: string
  }
}

_Example:_

{
  "https://deno.land/x/foo/bar.js": {
    "resolvedUrl": "https://deno.land/x/foo@44eb7a.../bar.js",
    "contentHash": "fa251ce..."
  },
  ...
}
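
A rough sketch of the lookup and of the "not yet resolved" case, under stated assumptions: the opt-in header name x-deno-resolved-url is invented for illustration (the proposal only asks for "a special HTTP response header"), and urlToFetch and resolutionFrom are hypothetical names.

```typescript
// Lock file shape from Proposal 2.
interface ResolvedEntry {
  resolvedUrl: string;
  contentHash: string;
}
type LockFile = Record<string, ResolvedEntry>;

// Return the URL that would actually be requested for an import specifier:
// the pinned URL when the lock file knows the specifier, otherwise the
// specifier itself (the "not yet resolved" case).
function urlToFetch(lock: LockFile, specifier: string): string {
  const entry = lock[specifier];
  return entry ? entry.resolvedUrl : specifier;
}

// Hypothetical opt-in header name; not part of any existing protocol.
const RESOLVED_HEADER = "x-deno-resolved-url";

// Decide whether a response to an unresolved URL carries a resolution that
// should be written back to the lock file. Requiring the opt-in header,
// rather than trusting any 307, follows the preference stated above.
function resolutionFrom(
  status: number,
  headers: { [name: string]: string | undefined },
): string | null {
  if (status !== 307) return null;
  return headers[RESOLVED_HEADER] ?? null;
}
```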

Proposal 3

Like Proposal 2, but with a source map. Deno should generate the source map.

cli suggestion

KSXGitHub

All 14 comments

I don't see how the commit ID adds anything, because the hash of the content should already define that a file is unique. If anything, we shouldn't introduce a new header, but rather just use ETag: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag

lucacasonato on 17 Apr 2020

👍1

@lucacasonato

I don't see how the commit ID adds anything

deno.land/x/ is a proxy server; it does not store the third-party modules that it serves. To match a content hash sent by the user, it would have to fetch different versions of a file until one matches, which wastes bandwidth and computation. Commit IDs, on the other hand, are easy to look up.

If anything, we shouldn't introduce a new header, but rather just use ETag: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag

To clarify, you meant a combination of ETag and If-Match? Is storing commit IDs in ETag instead of content hash a valid practice?
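
For reference, a small sketch of what combining ETag and If-Match would look like on the client side. It relies only on standard HTTP semantics (quoted validators, 412 Precondition Failed). Whether the validator would be a content hash or a commit ID is exactly the open question here, and the function names are hypothetical.

```typescript
// ETag validators are opaque quoted strings; If-Match sends one back as a
// precondition on the request.
function ifMatchHeader(validator: string): Record<string, string> {
  return { "If-Match": `"${validator}"` };
}

type Outcome = "match" | "mismatch" | "other";

// 412 Precondition Failed is the standard answer when the server has no
// representation matching the validator sent in If-Match.
function interpretStatus(status: number): Outcome {
  if (status === 412) return "mismatch";
  if (status >= 200 && status < 300) return "match";
  return "other";
}
```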

KSXGitHub on 17 Apr 2020

The discussion might be helped by providing an example of a current lockfile, and then a mock up of how you think it should look.

ry on 17 Apr 2020

👍2

@ry I do not think how a lock file should look is all that important right now. What is important is how Deno and module provider server should communicate. I want to discuss that first; only then will we know what information the lock file should contain.

KSXGitHub on 17 Apr 2020

👍1

I have added examples of would-be lock files to both alternative proposals.

KSXGitHub on 19 Apr 2020

What is important is how Deno and module provider server should communicate.

Deno doesn't have any concept of a "module provider" though - imports are just URLs with no additional semantics. Are you imagining a change to the module resolution algorithm? If so, I think that might be the place to start discussion, rather than the lock file!

When fetching with a lock file, Deno does not use URLs from the TypeScript files directly, but replaces them with resolved URLs from the lock file.

This sounds like what the import map file is for? That's exactly how I've been using it. All my source files import, for example, https://deno.land/std/..., and then my import map pins that to a specific version or commit hash, e.g. https://deno.land/std@<version>/.
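
A simplified sketch of the prefix remapping described here. The pinned version 0.50.0 is an illustrative value that does not come from this thread, resolveWithImportMap is a hypothetical helper, and the full import maps algorithm has further rules (scopes, URL normalization) that are skipped.

```typescript
// Shape of an import map's top-level "imports" table.
type Imports = Record<string, string>;

// Illustrative map; the pinned version is a made-up example.
const imports: Imports = {
  "https://deno.land/std/": "https://deno.land/std@0.50.0/",
};

// Simplified resolution: an exact key match wins; otherwise use the longest
// key that ends in "/" and is a prefix of the specifier.
function resolveWithImportMap(imports: Imports, specifier: string): string {
  if (Object.prototype.hasOwnProperty.call(imports, specifier)) {
    return imports[specifier];
  }
  const prefixes = Object.keys(imports)
    .filter((key) => key.endsWith("/") && specifier.startsWith(key))
    .sort((a, b) => b.length - a.length);
  if (prefixes.length === 0) return specifier; // unmapped: use as written
  const best = prefixes[0];
  return imports[best] + specifier.slice(best.length);
}
```

With this map, source files keep importing from https://deno.land/std/... while every request goes to the pinned prefix.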

crabmusket on 20 Apr 2020

❤1

This sounds like what the import map file is for? That's exactly how I've been using it. All my source files import, for example, https://deno.land/std/..., and then my import map pins that to a specific version or commit hash, e.g. https://deno.land/std@<version>/.

Great point! However, there is currently no way to use Deno to generate an import map automatically. It would be great if there were a mechanism in place (supported by the deno executable and deno_website2). Perhaps I should change the title of this issue or even open a new one regarding this?

KSXGitHub on 20 Apr 2020

While I can't speak for the deno team obviously, it seems like they've always maintained the principle that this sort of dependency management is a userland task. For example, udd does the job for deps.ts files.

I've opened an issue to see if Hayd would consider supporting import maps too.

crabmusket on 20 Apr 2020

@crabmusket I don't think a third-party tool is capable of doing this:

It requires intercepting Deno module resolution to know where a URL is resolved to.
It cannot do it alone, deno_website2 has to implement this.

KSXGitHub on 20 Apr 2020

It requires intercepting Deno module resolution to know where a URL is resolved to.

Because Deno doesn't have a module resolution algorithm, it should be enough to use e.g. the reference implementation of import maps. Or do I misunderstand what you're referring to?

It cannot do it alone, deno_website2 has to implement this.

Do you mean in order to e.g. discover what versions of a particular module are available? udd seems to do this by implementing a 'package searcher' for each registry it understands: https://github.com/hayd/deno-udd/blob/master/registry.ts If you import packages from somewhere that udd doesn't understand, then sadly it wouldn't work.

crabmusket on 20 Apr 2020

It requires intercepting Deno module resolution to know where a URL is resolved to.

Because Deno doesn't have a module resolution algorithm, it should be enough to use e.g. the reference implementation of import maps. Or do I misunderstand what you're referring to?

The reference implementation is only usable when an import map is available. I am referring to the case where the user adds a new import URL that the import map has not yet resolved. Deno would send this URL as-is to the server, and the server would then send back the resolved URL (by redirection, for example) along with the data.
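
A sketch of the client half of that exchange, assuming the server answers with an ordinary redirect. It uses the url and redirected fields that a Fetch API Response exposes; pinRedirect is a hypothetical helper, not something Deno provides.

```typescript
// The two Fetch API Response fields this sketch relies on.
interface FetchedResponse {
  url: string; // final URL after any redirects
  redirected: boolean;
}

// Pin a newly added import URL to wherever the server redirected it.
// Existing entries are left alone, so pinned versions never change silently.
function pinRedirect(
  imports: Record<string, string>,
  requested: string,
  response: FetchedResponse,
): Record<string, string> {
  const alreadyPinned = imports[requested] !== undefined;
  if (!response.redirected || alreadyPinned) return imports;
  return { ...imports, [requested]: response.url };
}
```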

It cannot do it alone, deno_website2 has to implement this.

Do you mean in order to e.g. discover what versions of a particular module are available? udd seems to do this by implementing a 'package searcher' for each registry it understands: https://github.com/hayd/deno-udd/blob/master/registry.ts If you import packages from somewhere that udd doesn't understand, then sadly it wouldn't work.

Yes. And it is the server that resolves the final URL (as I said in the paragraph above), not the client (as npm and yarn do). Regarding udd, it would not understand the custom in-house private registries that companies like to run.

KSXGitHub on 20 Apr 2020

In a different issue I've suggested a simpler approach for verifying the integrity of imports, which I feel is more in-line with Deno's philosophy of minimizing auxiliary metadata files and abandoning the npm-style concept of "module directory structures". It is pretty simple, in essence:

There is no concept of "bundled modules" or "directory structures", only individual source files.
When I import a file from some arbitrary source, e.g.:

import * as mod from "https://some.website/path/name@version/module.ts"

I want a strong cryptographic proof that I always get the same content and behavior (not just of module.ts but of its entire dependency tree).

So I walk the dependency tree of module.ts and create a file called module.ts.lock listing all the files it directly or indirectly references (remember - no bundles, only individual files) and their hashes.

https://some.website/path/name@version/module.ts Qmf8obm7bxrQS1JnjUniJdibcN2kUJy9zz732sr7o3dxtn
https://some.website/path/name@version/utils.ts Qmeg1Hqu2Dxf35TxDg18b7StQTMwjCqhWigm8ANgm8wA3p
https://some.website/path/name@version/methods.ts QmZfSNpHVzTNi9gezLcgq64Wbj1xhwi9wk4AxYyxMZgtCG

https://someother.website/path/name@version/othermodule.ts QmbKxNNCxBox7Cmv3jiUZbiG3zpzmtnYzVUuKHxfAjvpyH
https://someother.website/path/name@version/othermoduleutils.ts QmPwwoytFU3gZYk5tSppumxaGbHymMUgHsSvrBdQH69XRx

https://deno.land/std@<version>/async/delay.ts QmaLRet8qeYqNaq8xJeiqwjNnukSo3uEA8oWsDLoxxBv4Q
https://deno.land/std@<version>/async/deferred.ts QmWZtn3ahqqpGBBRZqPdthcWz2n1rxc1UuiDoWXrgrHKzZ

...
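
The walk that produces such a listing could be sketched roughly as follows. Everything here is an assumption for illustration: sources is an in-memory stand-in for fetching, the regex is a naive substitute for a real parser, and hash is injected so that any digest (such as the IPFS-style identifiers above) can be plugged in.

```typescript
// In-memory stand-in for "fetch this URL": URL -> source text.
type Sources = Record<string, string>;

// Naive extraction of static import/export specifiers. A real tool would
// use a parser; this regex ignores dynamic imports and comments.
function importSpecifiers(source: string): string[] {
  const pattern = /\b(?:from|import)\s+["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Walk the tree from an entry URL and return "url hash" lines, one per
// file, in first-visit order. Each file is listed once, so cycles end.
function buildLock(
  entry: string,
  sources: Sources,
  hash: (text: string) => string,
): string[] {
  const lines: string[] = [];
  const seen = new Set<string>();
  const visit = (url: string): void => {
    if (seen.has(url)) return;
    seen.add(url);
    const source = sources[url];
    if (source === undefined) throw new Error(`missing source for ${url}`);
    lines.push(`${url} ${hash(source)}`);
    for (const spec of importSpecifiers(source)) {
      // Relative specifiers resolve against the importing file's URL.
      const target = new URL(spec, url);
      target.hash = ""; // fragments are local-only and not part of the key
      visit(target.href);
    }
  };
  visit(entry);
  return lines;
}
```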

I hash this file and annotate the import statement with the result, like this (the # part is never sent to the server; it is only for local use):

import * as mod from "https://some.website/path/name@version/module.ts#lock_cid=QmaLRet8qeYqNaq8xJeiqwjNnukSo3uEA8oWsDLoxxBv4Q"

(In essence: the lock hash I annotated expresses the hash of the dependency tree of module.ts)
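
As a sketch, separating the annotation from the part of the specifier that is actually requested could look like this. The #lock_cid= syntax is the one proposed in this comment and is not supported by Deno; parseAnnotatedSpecifier is a hypothetical name.

```typescript
interface AnnotatedSpecifier {
  url: string; // what is actually requested from the server
  lockCid: string | null; // value of #lock_cid=..., if present
}

// Split an import specifier into the fetchable URL and the local-only
// lock hash annotation carried in the fragment.
function parseAnnotatedSpecifier(specifier: string): AnnotatedSpecifier {
  const hashIndex = specifier.indexOf("#");
  if (hashIndex === -1) return { url: specifier, lockCid: null };
  const url = specifier.slice(0, hashIndex);
  const fragment = specifier.slice(hashIndex + 1);
  const prefix = "lock_cid=";
  const lockCid = fragment.startsWith(prefix)
    ? fragment.slice(prefix.length)
    : null;
  return { url, lockCid };
}
```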

Now anyone who sees the annotated import statement has two options:

1. Redo what I did (recreate the lock file by themselves) and check if they get the same hash.
2. Use the hash I provided as a content-addressed reference and fetch the actual lock file from somewhere (server, local directory, IPFS), and use it for verification.

Option 2 has several advantages:

- If verification fails, meaning one or more of the dependencies have been modified, it is possible to identify exactly which ones. With option 1 it is only possible to know that something was modified, but not what.
- The lock file contains the entire dependency tree of module.ts (including the hash of module.ts itself), so potentially thousands of files can be prefetched immediately and in parallel, either by their URIs or by their hashes from a content-addressed network like IPFS. This could greatly improve performance (and reliability, since IPFS could provide a secondary backup).
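
The first advantage can be sketched as a per-file comparison. This assumes the "url hash" line format shown earlier; parseLock and mismatchedFiles are hypothetical helpers, and the freshly computed hashes are taken as given.

```typescript
// Parse "url hash" lines (blank lines allowed) into a lookup table.
function parseLock(text: string): Record<string, string> {
  const entries: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const parts = trimmed.split(/\s+/);
    if (parts.length !== 2) throw new Error(`malformed lock line: ${trimmed}`);
    entries[parts[0]] = parts[1];
  }
  return entries;
}

// Compare the hashes recorded in the lock file with freshly computed ones
// and return the URLs whose content differs or could not be hashed at all.
function mismatchedFiles(
  expected: Record<string, string>,
  actual: { [url: string]: string | undefined },
): string[] {
  return Object.keys(expected).filter((url) => actual[url] !== expected[url]);
}
```

With only the single lock hash (option 1), the same check collapses to one comparison that cannot name the offending file.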

(One interesting/peculiar feature of this is that if the actual lock file is lost, but its hash can still be verified, it can be regenerated and effectively be "brought back to life"!)

Now the fact that the hashes are "burned-into" the code also has several advantages, especially from a security perspective.

Say I import a third-party file and the import statement annotated a lock hash:

import * as mod from "https://some.website/path/name@version/module.ts#lock_cid=QmaLRet8qeYqNaq8xJeiqwjNnukSo3uEA8oWsDLoxxBv4Q"

And module.ts also imports a third-party file with an import statement annotated with a lock hash:

import * as othermodule from "https://someother.website/path/name@version/othermodule.ts#lock_cid=QmbKxNNCxBox7Cmv3jiUZbiG3zpzmtnYzVUuKHxfAjvpyH"

And so on.

If someone imports my file, they would need to ensure that all of these lock hashes are consistent with each other. This means that the more lock hash annotations are provided throughout the dependency tree, the less likely it is that one of the modules has been "hacked" or contains malicious code, because every importer takes the responsibility to verify that the module they are importing is safe for themselves.

This results in something akin to a web of trust that is an ad-hoc decentralized system of authenticity management, which I feel is more appropriate for Deno than a centralized one like the npm system.

rotemdan on 30 May 2020

Maybe I'm missing something, but I'm not sure there is a need to go through the dependency tree per file. If you are importing a deno bundle, it should already include everything:

"...the motivation around deno bundle, is to provide a single JavaScript file with all the dependencies resolved and included in the file."

So would a single hash at the bundle level be enough to verify the bundle content?

srdjan on 31 May 2020

The dependency tree of some arbitrary file may contain references to modules from various third-party publishers, which in turn may refer to modules from other third-party publishers, etc. The system of annotations I proposed allows those publishers to "engrave" their expectations of the dependencies they import right into the code. For example:

Say somewhere in your dependency tree, third-party publisher A imports from third-party publisher B, and the import statement looks like this:

import * as moduleB from "https://publisherB.website/package@version/module.ts"

It could be that you personally completely trust publisher A, and even B as well, but no matter how much you trust them, there is no way for you to know that the content of module.ts hasn't been tampered with by some unknown "hacker" entity at some point.

If you just create a lock file for yourself, then it is possible you've created a lock file for a compromised piece of code. In other words, the lock file you've made for yourself doesn't provide you with any confidence of the authenticity of module.ts.

However, if that import was annotated with the hash of the dependency tree of module.ts:

import * as moduleB from "https://publisherB.website/package@version/module.ts#lock_cid=QmaLRet8qeYqNaq8xJeiqwjNnukSo3uEA8oWsDLoxxBv4Q"

If module.ts or one of its dependencies has been tampered with, it is much more likely that there will be a hash mismatch detected. And the more the code would contain annotations of this kind, the more likely a mismatch is to be found since these lock hashes would be made by various different publishers, at different times.

To highlight the fact that the lock hash doesn't necessarily have to come from the publisher, I described users producing lock hashes for their own imports. In practice, it is more likely that the lock hash would be provided by the library authors themselves as part of the import URL they publish, and the lock file would most likely be hosted on the server under the name module.ts.lock, by convention.

_For the question asked_ (sorry for being lengthy, but I thought I needed to provide some clarifications and background in order to fully answer your question): the process of bundling an import would include verifying all the hashes annotated into the modules that are encountered throughout the dependency tree.

The bundled file may itself be annotated with a lock hash, of course, but if it is published as part of a library, it would lose some of the benefits of the trust / authenticity / tamper-detection model I described. If it is only for personal use then that doesn't really matter, I guess.

rotemdan on 31 May 2020
