Html: Define behavior for `file://` documents' origin.

Created on 6 Oct 2017 · 13 comments · Source: whatwg/html

The text at https://html.spec.whatwg.org/#sandboxOrigin defines a document's origin in the case that "the Document's URL's scheme is a network scheme" and for data: schemes, but declines to define behavior for non-network schemes like file:. Unsurprisingly, different browsers have made different choices here. When a document is loaded from file:///directory/file.html:

Edge
- returns file:// for location.origin (it doesn't yet implement window.origin)
- allows DOM access to file:///directory/other-file.html, file:///directory/subdirectory/file.html, and file:///parent-directory.html when framed.
- shares localStorage across all file: URLs
- times out when fetch() is called on file: URLs, for reasons I can't figure out.
Firefox
- returns null for window.origin
- allows DOM access to both file:///directory/other-file.html and file:///directory/subdirectory/file.html, and disallows DOM access to file:///parent-directory.html when framed.
- localStorage seems to be scoped to the same-or-sub directory as well.
- allows fetch() to access content from file:///directory/other-file.html and file:///directory/subdirectory/file.html, and returns a network error when fetching from file:///parent-directory.html
Safari
- returns null for window.origin
- opens Finder windows rather than framing file:///directory/other-file.html, file:///directory/subdirectory/file.html, and file:///parent-directory.html (that is, <iframe src="file:///whatever/directory/file.html"></iframe> stays in about:blank and pops a Finder window to /whatever/directory/)
- blocks localStorage for all file: URLs
- returns a network error when calling fetch() on file:///directory/other-file.html, file:///directory/subdirectory/file.html, and file:///parent-directory.html.
Chrome
- returns null for window.origin
- disallows DOM access to file:///directory/other-file.html, file:///directory/subdirectory/file.html, and file:///parent-directory.html when framed.
- shares localStorage across all file: URLs
- returns a network error when calling fetch() on file:///directory/other-file.html, file:///directory/subdirectory/file.html, and file:///parent-directory.html.

I wonder if we could get more alignment if we talked about it a bit. There seems to be general agreement that the page should have an opaque origin, but a little bit of disagreement about what that should mean. I'd kinda like to keep Chrome's behavior for DOM access and Fetch, for instance, as it protects against scanning the entire disk or a user's downloads directory. I'm less enthusiastic about Chrome's localStorage behavior. I'd prefer Safari's, I think, but could live with something less draconian if there's good reason to.

@annevk, @travisleithead, @johnwilander: Would y'all mind looping in relevant folks (or having opinions yourselves? :) )?

Labels: security/privacy, cookie, origin


mikewest


All 13 comments

Maybe worth discussing the behavior of document.cookie in file URLs too.

shhnjk on 6 Oct 2017

@shhnjk: Yeah. I didn't do exhaustive tests, but I assume that the localStorage behavior is indicative of what the browsers are doing with other storage mechanisms (cookies, IndexedDB, etc). Ideally, we'd align all of them.

mikewest on 6 Oct 2017

/cc @whatwg/security for thoughts.

mikewest on 21 Nov 2017

The actual Firefox behavior is more or less like so:

1) Each file:// URL gets its own origin. It's not a unique origin, in the sense that if you load the same file:// thing you get the same origin, but it's different from all other file:// URL origins.
2) A file:// load from a file:// origin that represents a file in the same directory or an ancestor directory inherits the loading origin (just like data: does in most cases, and used to in all cases in Firefox). This explains the localStorage behavior, the fetch behavior, etc.

There are some implications from this not captured by the discussion above. Specifically, if file:///A/test.html loads an iframe from file:///A/B/subframe.html which then loads a subframe from file:///A/subsubframe.html, then all three documents have the same origin, and that origin is the origin of file:///A/test.html. But if you started off by loading file:///A/B/subframe.html from the URL bar, then it and the subsubframe it loads would have different origins, because it would be loading something from an ancestor directory.

The fundamental reason for the Firefox behavior was to allow things like HTML help systems and whatnot to work. There are some drawbacks, of course. There's the problem of scanning the download directory. There's some weirdness around interacting with symlinks (see https://bugzilla.mozilla.org/show_bug.cgi?id=670514). That sort of thing.

In addition to the document origin question, there's the question of subresources. If I have a document at file:///A/test.html that loads an image from file:///B/test.png and draws it to a canvas, can it getImageData? Can it access the CSSOM of stylesheets from file:///A/test.css? Does it get sane error reporting for errors from a script loaded from file:///A/test.js?

bzbarsky on 21 Nov 2017

Thanks, @bzbarsky! That's helpful context!

> A file:// load from a file:// origin that represents a file in the same directory or an ancestor directory inherits the loading origin

Based on the file:///A/B/subframe.html example below, I think you meant "child directory" here? Is that right?

> The fundamental reason for the Firefox behavior was to allow things like HTML help systems and whatnot to work.

I can see this as a real concern. But, Chrome's been shipping tighter behaviors than Firefox for some time now, and the anecdotes I know about personally are positive. For example, my partner often gets HTMLized schoolbooks on CD from which they can print out worksheets, etc. for their classes. Thus far, Chrome hasn't frustrated that effort. shrug Things seem to just work without DOM or XHR access.

I grant that this might not be the case for more complex material, but (again anecdotally) I haven't seen any bugs filed on the issue. It might not be a large use case? Or perhaps everyone who needed it has migrated to Firefox? Tough to tell from metrics alone...

> There's the problem of scanning the download directory.

This does seem to be a real problem. More so for Edge than for Firefox, though, given that Edge doesn't seem to do directory-based scoping.

Is this a problem that Firefox would be interested in poking at?

> If I have a document at file:///A/test.html that loads an image from file:///B/test.png and draws it to a canvas, can it getImageData? Can it access the CSSOM of stylesheets from file:///A/test.css? Does it get sane error reporting for errors from a script loaded from file:///A/test.js?

I'd hope that each of these would be explained by the origin question above. If we treat file: as having a unique origin, then it seems reasonable that we'd taint the canvas in the first example, block CSSOM access to the stylesheet in the second, and mute errors for the third.
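To make the "unique origin" reading concrete, here is a minimal Python model, assuming every file: load gets a fresh opaque origin (the policy under discussion, not any browser's shipped code). It relies on the HTML spec's rule that an opaque origin is same origin only with itself. The class and function names are hypothetical, and folding canvas tainting, CSSOM access, and error muting into a single same-origin test is a simplification: the spec actually keys those on whether the response is CORS-same-origin.

```python
from typing import Dict


class OpaqueOrigin:
    """An opaque origin: it is same origin only with itself."""

    def __repr__(self) -> str:
        return f"<opaque origin {id(self):#x}>"


def same_origin(a: OpaqueOrigin, b: OpaqueOrigin) -> bool:
    # Opaque origins compare by identity, never by value.
    return a is b


def origin_for_file(url: str) -> OpaqueOrigin:
    # Hypothetical policy: each file: load gets a brand-new opaque origin,
    # regardless of the URL or the directory the file lives in.
    return OpaqueOrigin()


def subresource_checks(doc: OpaqueOrigin, resource: OpaqueOrigin) -> Dict[str, bool]:
    ok = same_origin(doc, resource)
    return {
        "canvas_readable": ok,        # False: canvas is tainted, getImageData() throws
        "cssom_accessible": ok,       # False: reading cssRules throws a SecurityError
        "script_errors_unmuted": ok,  # False: window.onerror only sees "Script error."
    }


doc = origin_for_file("file:///A/test.html")
for sub in ("file:///B/test.png", "file:///A/test.css", "file:///A/test.js"):
    print(sub, subresource_checks(doc, origin_for_file(sub)))
```

Note that under this model loading the same file twice yields two different origins, unlike the pre-68 Firefox behavior described earlier, where the same file:// URL always mapped to the same origin.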

mikewest on 21 Nov 2017

> I think you meant "child directory" here? Is that right?

No, I meant what I said, though the antecedents may not have been very clear. The "represents" bit was talking about "a file:// origin", not the URL being loaded. Maybe a clearer phrasing:

When a file:// origin representing file X loads a file:// URL representing file Y, the resulting thing gets the origin of X if the parent directory of X is an ancestor directory of Y.
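The rule as rephrased here can be sketched as a small Python function. This is a toy model, not Firefox's code: it assumes an origin can be identified by the file: URL of the file it "represents", uses `None` for a top-level load from the URL bar, and ignores symlinks, query strings, and percent-encoding. The names `origin_for_load` and `_path` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def _path(url: str) -> PurePosixPath:
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file: URL: {url}")
    return PurePosixPath(parsed.path)


def origin_for_load(loader_origin: Optional[str], target: str) -> str:
    """Origin assigned to a file: document at `target`.

    `loader_origin` is the file: URL that the loading document's origin
    represents (file X), or None for a top-level load.
    """
    if loader_origin is None:
        # Top-level load: the file gets its own (non-unique) origin.
        return target
    x, y = _path(loader_origin), _path(target)
    # Inherit X's origin if X's parent directory is an ancestor directory of Y.
    if x.parent in y.parents:
        return loader_origin
    return target


# The file:///A/... example from the earlier comment.
top = origin_for_load(None, "file:///A/test.html")
sub = origin_for_load(top, "file:///A/B/subframe.html")
subsub = origin_for_load(sub, "file:///A/subsubframe.html")
print(top == sub == subsub)  # True: all three share file:///A/test.html's origin

direct = origin_for_load(None, "file:///A/B/subframe.html")
print(origin_for_load(direct, "file:///A/subsubframe.html") == direct)  # False
```

The non-obvious part is that the check uses the file the origin represents, not the file doing the loading, which is why the subframe in the first chain can still reach back up to /A/.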

> Or perhaps everyone who needed it has migrated to Firefox?

Or to IE/Edge, yes? On Windows, Chrome is the only browser that doesn't support this use case.

> Is this a problem that Firefox would be interested in poking at?

Yes. We've been trying to figure out sane ways to restrict this case without breaking too many users for a while.

Maybe we could try gathering some telemetry about how much breakage users would actually encounter... It's hard to say with some of the corporate-firewall dark matter out there. :(

> I'd hope that each of these would be explained by the origin question above.

Right, but the question is what browsers do right now.

Note that Chrome, for example, doesn't enforce CSSOM origin checks the way the spec says it should (see https://bugs.chromium.org/p/chromium/issues/detail?id=650534 and https://bugs.chromium.org/p/chromium/issues/detail?id=775525), which means there are cases that would work in all browsers right now but stop working if Firefox stops inheriting file:// origins into stylesheets but keeps correctly enforcing the CSSOM security checks.

bzbarsky on 21 Nov 2017

> Maybe a clearer phrasing

Got it, thanks!

> Or to IE/Edge, yes? On Windows, Chrome is the only browser that doesn't support this use case.

A very fair point.

> Maybe we could try gathering some telemetry about how much breakage users would actually encounter... It's hard to say with some of the corporate-firewall dark matter out there. :(

Yup. That's a real problem.

What metrics would be helpful to add? I could imagine adding something along the lines of "How many pageviews are on file:?", along with "How many pageviews are on file: and block access to some other file:?". Since Chrome does block those requests, though, I can imagine it wouldn't be representative of usage in other browsers.

> Note that Chrome, for example, doesn't enforce CSSOM origin checks the way the spec says it should (see https://bugs.chromium.org/p/chromium/issues/detail?id=650534 and https://bugs.chromium.org/p/chromium/issues/detail?id=775525), which means there are cases that would work in all browsers right now but stop working if Firefox stops inheriting file:// origins into stylesheets but keeps correctly enforcing the CSSOM security checks.

Thanks for the poke. I've pinged the bug again, let's see if we can get more alignment.

mikewest on 21 Nov 2017

FWIW, we can probably approximate "How many pageviews are file:?" by looking at Chrome's navigation metrics: ~1.98% of "different-page" (i.e. non-fragment, non-pushState) navigations in the last ~month were to file:, which is larger than I'd expected.

mikewest on 21 Nov 2017

As a user, I prefer Firefox's behavior for DOM access/localStorage, because it works better when saving complete Web pages with <iframe>s (like any Tumblr page with a photoset) or localStorage.

As a developer, I often make little utilities for friends and family by giving them an .html file they can use, and localStorage is my go-to for persisting data that way. I don't mind if that storage is unsharable with anything else, but I'd really like to keep using it, scoped to that particular file or such.

Example: a color picker for a friend who wanted a particular behavior (the colors mixed in some application-specific way) that output a pasteable snippet for the software they were using. It remembers their previous combinations for ease of use. localStorage isn't guaranteed long-term, but I feel that's still a fairly important use case.

tigt on 12 Dec 2017

I know this issue hasn't seen any activity for a while but if anybody comes back here to talk about local file access, it might be helpful to add a data point about XHR behavior, since Chrome behaves differently between XHR and fetch.

FWIW, I strongly agree with @tigt, that while you may not see huge numbers of people relying on advanced features from a file: context, it would really be a shame to rely on that data to justify restricting what can be done with "fully offline" code. Today, a browser with some clever HTML/JS is a powerful tool for implementing simple applications in a locked-down corporate environment. Since I've been working in such environments for almost 20 years now, I'd hate to see those powers hobbled just because "nobody really uses them".

thw0rted on 2 May 2019

Firefox 68 and newer treat files as unique origins (https://bugzilla.mozilla.org/show_bug.cgi?id=1500453). Maybe it's time to make the spec change.

mozfreddyb on 18 Jul 2019


Note that when we did this we discovered various places where Chrome does NOT in fact treat different files as different origins. That has been pretty frustrating, with the whole "now we have to reverse-engineer this stuff" business...

bzbarsky on 18 Jul 2019


Chrome seems to have a single global "file://" origin used by a lot of origin code. There's at least one place in the renderer where a file origin is replaced by an opaque origin, but a lot of code uses origins that were not parsed by that code. So localStorage for file URLs is global in Chrome. If you enable network state partitioning in Chrome, all file URLs use a single "file" origin partition, etc.

Also, it looks to me like the Firefox and Safari descriptions here are inaccurate, at least with respect to localStorage (which is all I tested). Firefox looks to have per-directory localStorage. Safari looks to have a global file localStorage (I suspect it just deletes it every so often, but didn't test that).
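To compare the localStorage observations in this thread side by side, here is a hypothetical Python model of how a browser might derive a storage key from a file: URL. The policy names are labels invented for this sketch: `global` matches what is reported for Chrome (and, per this comment, Safari), `per-directory` is one reading of this comment's Firefox result, `blocked` matches the original report for Safari, and `per-file` is the scoping @tigt asked for. None of this is taken from browser source.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def storage_key(url: str, policy: str) -> Optional[str]:
    """Hypothetical localStorage bucket for a file: URL under `policy`.

    Two documents share localStorage exactly when their keys are equal;
    None means localStorage is unavailable.
    """
    path = PurePosixPath(urlparse(url).path)
    if policy == "global":
        return "file://"                # one bucket for every file: URL
    if policy == "per-directory":
        return f"file://{path.parent}"  # files in the same directory share
    if policy == "per-file":
        return f"file://{path}"         # every file is isolated
    if policy == "blocked":
        return None                     # no localStorage at all
    raise ValueError(f"unknown policy: {policy!r}")


def shares_storage(a: str, b: str, policy: str) -> bool:
    ka, kb = storage_key(a, policy), storage_key(b, policy)
    return ka is not None and ka == kb


page = "file:///directory/file.html"
for other in ("file:///directory/other-file.html", "file:///parent-directory.html"):
    for policy in ("global", "per-directory", "per-file", "blocked"):
        print(f"{policy:>13}  {other}: {shares_storage(page, other, policy)}")
```

A strict per-directory key does not reproduce the "same-or-sub directory" sharing the original post describes for Firefox; modelling that would need a path-prefix check rather than key equality.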