This issue is lifting a proposal to prevent font fingerprinting that was discussed in PING, but I think got buried in the longer conversation in https://github.com/w3c/csswg-drafts/issues/4055
What if the standard didn't put any limitations on the fonts that could appear in the set of local fonts, but required local fonts to be _specifically, intentionally_ loaded into the browser, instead of defaulting to any and all fonts the browser can find on the OS.  Browsers would then implement chrome / settings / something to allow users to load fonts into the browser (independent of the fonts the user has added to the OS), and only these fonts would be included in the "local fonts" part of the current algorithm.
To use the helpful taxonomy / organization given by @hsivonen in https://github.com/w3c/csswg-drafts/issues/4055#issuecomment-536169515, this would dramatically improve privacy for users in groups ~1, 2, 3~ 2 and 3, moderately improve [1] privacy for users in groups 4, 5, 6 w/o harming their use cases, and preserve what people in group 7 are trying to do. (edit: no change to users in group 1, of course)
I believe this proposal would cut the knot in issue https://github.com/w3c/csswg-drafts/issues/4055 by completely removing the fingerprinting surface for many (most?) users and improve privacy for remaining users (w/o impacting their goals and needs).
[1] I say moderately because it would reduce the number of fonts identifiable by fingerprinters, and so increase the size of these users' anonymity sets.
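To make the footnote concrete, here is a rough sketch in Python of how hiding fonts from sites can enlarge anonymity sets. The users, font names, and the `default_fonts` set are all made up for illustration; the only idea carried over from the text is that users whose visible font sets are identical cannot be told apart by fonts alone.

```python
from collections import Counter

# Hypothetical installed-font sets for five users (illustrative data only).
users = {
    "u1": {"Arial", "Times New Roman"},
    "u2": {"Arial", "Times New Roman", "FancyScript"},
    "u3": {"Arial", "Times New Roman", "CorpSans"},
    "u4": {"Arial", "Times New Roman", "FancyScript", "CorpSans"},
    "u5": {"Arial", "Times New Roman"},
}

# Hypothetical set of fonts a browser would expose without any opt-in.
default_fonts = {"Arial", "Times New Roman"}


def anonymity_set_sizes(visible):
    """Map each user to the number of users sharing the same visible font set."""
    buckets = Counter(frozenset(fonts) for fonts in visible.values())
    return {user: buckets[frozenset(fonts)] for user, fonts in visible.items()}


# Today: sites can probe every installed font.
before = anonymity_set_sizes(users)

# Under the proposal, with nobody opted in: only default fonts are visible.
after = anonymity_set_sizes({u: f & default_fonts for u, f in users.items()})
```

In this toy data, three of the five users are unique before the change, and all five share one bucket afterwards. Real populations would of course differ in other ways too.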
Thanks for proposing this.
I like that this opt-in behavior doesn't harm the people in group 7, while strongly helping those in groups 1-3.
@r12a ok adding i18n-tracking to this?
Sure. That label is also for you to use to bring our attention to a discussion that you think we should be aware of. So no worries.
@dbaron (for Firefox) @tabatkins (for Chrome) @litherum (for Safari) any thoughts on the desirability and implementability of the proposal by @snyderp ?
The statement in https://github.com/w3c/csswg-drafts/issues/4497#issue-519535119 that:
(edit: no change to users in group 1, of course)
(where group 1 is "Users who never install fonts.") makes me think that an implicit part of the proposal is that browsers are supposed to default to the set of fonts that are installed by default with the OS (rather than defaulting to the empty set as they would if interpreting the proposal literally). That, on its own, isn't trivial -- it requires knowing what that default is across all combinations of OS / OS version / locale / language packs / etc.
If that assumption is correct, and if that somehow takes into account variation across languages in a reasonable way, and it's doable across OSes, then I think it seems like a reasonable suggestion. But I think there are still a bunch of unclear issues related to how to handle users who have support for multiple languages, etc.
I'd be a little hesitant to put it in a spec until it had been demonstrated to be viable in the market, though (i.e., shown that it's shippable without breaking a ton of users).
(On the flip side, it should be entirely allowable for an implementation to do this if it wants. I'd think the spec allows it today, although if that's not the case it should be fixed.)
required local fonts to be specifically, intentionally loaded into the browser
It’s a little unclear to me what exactly this means. What happens if a user doesn’t select any fonts? Does this mean that no text shows up anywhere on the entire web? Presumably “no user action” should be considered the default, and it’s a pretty bad default to break all text on all webpages.
Are you proposing that the installation process of a browser should include font picker UI asking the user which fonts they would like to use? This doesn’t really work for browsers like Safari which are included with the OS, and are not individually installed.
Mitigating fingerprinting in fonts is a good idea, but forcing every user to make decisions they don’t understand or care about before they get to use the product they just bought, would be an unfortunate design.
The CSS Working Group just discussed [css-fonts] limit local fonts to those selected by users in browser settings (or other browser chrome).
The full IRC log of that discussion
<dael> Topic: [css-fonts] limit local fonts to those selected by users in browser settings (or other browser chrome)
<dael> github: https://github.com/w3c/csswg-drafts/issues/4497
<dael> myles: I can't take this
<dael> myles: Not sure I understand the proposal
<dael> Rossen_: And Chris is not on.
<dael> florian: I think TabAtkins will want to push back so we should wait for him
<dael> TabAtkins: I am on now
<dael> TabAtkins: For this topic, no comment right now. I didn't see it earlier. I'll talk to our security people and see what our opinion is.
<dael> Rossen_: That's on Fonts?
<dael> florian: The comment is consistent with Fonts
I'm also concerned this proposal would make fingerprinting _worse_ because users would unintentionally select random different subsets of fonts. At least today, all* people buying a new Mac appear to be in the same bucket. This proposal breaks that.
*for some definition of "all"
@snyderp this topic was on the CSSWG call agenda for this week, but we didn't discuss it because no one on the call was comfortable enough to lead a discussion of this topic. I'd like to invite you to attend the next call where we can discuss this. I'm happy to share the details of the call if you email me at [email protected].
General
It looks like either I'm using the wrong terminology here, or the terms have changed since I last read the specs (I'm betting the first). Using the terms as defined in Section 10, the proposal would not affect web fonts or pre-installed fonts. The proposal would only affect the user-installed fonts, and would require users to tell the browser which user fonts to make accessible to the browser.
@litherum
What happens if a user doesn’t select any fonts? Does this mean that no text shows up anywhere on the entire web?
If no fonts were selected, then the set of fonts available to websites would be only the pre-installed fonts and the web fonts. The only thing that user selection affects is which user-installed fonts sites can access.
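As a minimal sketch of that rule in Python (the function and parameter names are hypothetical, not taken from any spec or browser), the set a page can match against could be computed like this:

```python
def fonts_visible_to_site(preinstalled, web_fonts, user_installed, user_selected):
    """Return the font families a page could match under this reading of the proposal.

    All four arguments are sets of family names. The names are hypothetical
    and only illustrate the three categories discussed in this thread.
    """
    # Only user-installed fonts are gated, and a selection cannot expose
    # a font that is not actually installed.
    opted_in = user_installed & user_selected
    return preinstalled | web_fonts | opted_in


# With no user action, text still renders using pre-installed and web fonts.
no_selection = fonts_visible_to_site(
    preinstalled={"Helvetica"},
    web_fonts={"SiteFont"},
    user_installed={"MyRareFont"},
    user_selected=set(),
)
```

Here `no_selection` contains "Helvetica" and "SiteFont" but not "MyRareFont", which matches the "no user action" default described above.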
…this proposal would make fingerprinting worse
I do not believe this is the case, for the above reason.
I'd like to invite you to attend the next call where we can discuss this…
Sure, that'd be great. I'll email now.
@dbaron
Yes, the proposal would require the browser to distinguish between system-installed fonts and user-installed fonts. I agree it's not trivial, but it sure seems like browsers and standards have solved more difficult problems in the past ;) If it's really just this practical problem, and then figuring out how to standardize behavior, I am sure the WG and PING could work together to solve the problem.
I'd be a little hesitant to put it in a spec until it had been demonstrated to be viable in the market
and
it should be entirely allowable for an implementation to do this if it wants
It seems like Safari has basically shown this to be viable, since their current shipping strategy is even more restrictive than what's proposed above.
More broadly, the goal of the proposal isn't to give privacy-preserving parties a "this is standards compliant" stamp of approval; it's to solve the collective action problem of "how to coordinate many privacy-concerned parties (presumably everyone on this thread) to act in tandem to solve a serious web-scale problem, without leaving one vendor / platform to hold the webcompat bag" :)
the proposal would not affect [...] pre-installed fonts. The proposal would only affect the user-installed fonts
Even though it is a conceptually useful distinction when discussing things and can inform some degree of best practices, I am not convinced there is actually a clear enough distinction between user-installed fonts and pre-installed fonts in a way that can be used in normative text. How sharp or fuzzy that boundary is varies per OS.
I suspect that on macOS, iOS, and Android, it's pretty clear. The OSes have one set of built-in fonts (which does vary per version though), and anything else is user-installed.
On Windows 10, the OS has a core set of fonts that are always installed. But there are also various international fonts which are installed by default only if you install Windows in a particular language. If you installed Windows in a locale where they're not included by default, they can be added by the user by requesting optional Windows components. And so, is "Yu Mincho" to be considered pre-installed on Windows 10 or not? No, because it's optional and some users lack it? Yes, because it's a component of Windows, not third-party software, is shared by all Japanese installs, and is needed to display Japanese properly? Yes on Japanese Windows but no on English Windows? How about on my English language account in a Japanese version of Windows, or the other way around?
Also, many/most Windows computers are sold with some version of MS Office, and the user took no step in putting it on the device. Are fonts bundled with Office pre-installed or user-installed?
And then you have Linux, where it completely falls apart:
All in all, I feel that this is something that some user agents on their own initiative could do on some platforms, based on their own definition of what should be auto-exposed and what should be opt-in, but I don't think this is something we can specify or mandate with any degree of interoperability.
I suspect that on macOS, iOS, and Android, it's pretty clear. The OSes have one set of built-in fonts (which does vary per version though), and anything else is user-installed.
Even on macOS, I'm not sure this is quite so clear. There are a number of fonts that are shipped with macOS but are not exposed in the standard collection seen when applications request "the list of available font families" from the OS. Some of these are associated with particular applications (iWork, iLife), and can probably be ignored here, but there are also a bunch of fonts for "language support" (basically a collection of Noto fonts); should these be auto-exposed? (FWIW, Firefox currently does make the "language support" fonts available in the browser, even though they don't appear in other applications' font lists.)
And then there are downloadable fonts: there are a number of font families (mostly, though not exclusively, CJK fonts) that are not installed by default (at least on my US English configuration), but are known to the OS and can be "enabled" (downloaded) directly within Font Book, without following the usual font installation route. Are these considered "system" or "user"-installed?
On Android, it's also less than clear, I think: in my experience, many device manufacturers or distributors customize the collection of fonts they ship, so that there is not a uniform set of fonts per Android version (unless we interpret "version" here to encompass not only the Android version number but also the specific device on which it's running).
I'm also concerned this proposal would make fingerprinting worse because users would unintentionally select random different subsets of fonts.
Yeah, having users choose a set of fonts makes no sense for the purpose of avoiding fingerprinting.
I suspect that on macOS, iOS, and Android, it's pretty clear.
Sadly, as @jfkthame pointed out, it's not that clear on macOS. However, having the browser block the optionally-downloaded macOS-bundled fonts is likely to be feasible in terms of the resulting user experience. That is, blocking the optional fonts of macOS is probably less likely to result in complaints from users than blocking the Windows 10 and Ubuntu fonts that aren't part of the base set but get installed as a side effect of certain languages.
browsers would have to maintain their own per distribution list, which isn't scalable.
It's not scalable, but it could work for most users to cover Fedora, Ubuntu, and potentially openSUSE (I haven't looked at the openSUSE font situation). Even though Debian is a major distro, the Debian approach of leaving so much of the configuration to the user makes it infeasible for the browser to try to normalize the configuration as it's visible to the Web.
AFAICT, the situation with Fedora is clearer than with Mac: There is one pretty good set of fonts: enough to cover the stylistic needs of what the Web uses generically without offering so much that it takes too much disk space. Ubuntu has the same problem as Windows 10, though: The base set is stylistically a little bit too narrow for some major languages, and "Adding a language" installs fonts such that the set of added languages serves as a fingerprint.
On Android, it's also less than clear, I think: in my experience, many device manufacturers or distributors customize the collection of fonts they ship, so that there is not a uniform set of fonts per Android version
Do any remove any fonts from the set that one would get on a Pixel or Android One phone?
What about - for the sake of privacy - getting vendors, aka Mozilla, Google, Apple, Microsoft, Opera, together and just ship a standard set of fonts where every vendor can 'donate' some fonts so they are available cross platform as well. These fonts would come from a standard repository and be always the same, even on binary level.
Edit: License wise these fonts would go with the browser, not being installed into the OS.
Each vendor 'donating' a font would be responsible for licensing to be compatible for this use case. Each vendor 'donating' a font would enable them to ship their corporate identity much more easily, say shipping the Droid font group, the San Francisco font group etc. So there'd be some benefits.
In addition to this, users could opt in via their control settings to allow adding further locally installed fonts, as suggested here.
What about - for the sake of privacy - getting vendors, aka Mozilla, Google, Apple, Microsoft, Opera, together and just ship a standard set of fonts where every vendor can 'donate' some fonts so they are available cross platform as well.
The disk space footprint of such a set is relatively large: 550 MB unhinted and uncompressed. Unless you can convince Microsoft to adopt an Apple-like font-rendering aesthetic (which would be an awesome achievement regardless of privacy), you need the fonts to be hinted in order for them not to look awful on Windows.
That kind of data size is problematic for browsers that aren't bundled with the OS. As for browsers that _are_ bundled with the OS, it would be equivalent to getting the OS vendor to accept that kind of disk footprint for the default install, since the OS-bundled browser is part of the default install. If you could get Microsoft and Canonical to install a Mac/Fedora/Android/Chrome OS-like set by default, the font set wouldn't need to be for the Web but could be the OS default font set.
(There are so many ways to fingerprint the OS that trying to standardize the set across OSs isn't a particular privacy benefit. However, getting Microsoft and Canonical to install more fonts, even if mutually different, by default would be.)
The proposal would only affect the user-installed fonts
Aha! This makes more sense now. Sorry for misunderstanding.
What about - for the sake of privacy - getting vendors, aka Mozilla, Google, Apple, Microsoft, Opera, together and just ship a standard set of fonts
Putting aside licensing concerns, I wish we could do this, but alas, one of the goals Apple has is to minimize the size of the image that users have to download in order to upgrade their OS. Fonts have been a target for minimization of this image. Indeed, we avoid shipping some large fonts with the OS image to users who are unlikely to ever need those fonts. Realistically, we wouldn't increase the size of our OS image because some other browser, unrelated to us, wanted to include some font. Any such font would, by definition, be a font which we had previously judged as unnecessary for the OS, and which we had already made a decision to omit.
it's not that clear on macOS
Indeed; @frivoal is incorrect in his assumption that all users who buy a particular version of macOS get the same set of preinstalled fonts. As mentioned in the previous paragraph, we will customize the set of preinstalled fonts depending on a user's locale. However, the set of possible buckets a user could be in is quite small - way, way smaller than the amount of entropy that would be exposed to the web if we had not implemented Safari's behavior.
I wrote a "font taxonomy" in fonts level 4, which defined a user-installed font as:
installed by an explicit action by the user, such as clicking an "Install" button or copying a file into a particular directory on their device.
I believe this definition offers one unambiguous way to discern between user-installed fonts or not. (I'm not trying to say that this definition is the best possible definition to be used for the purpose of this GitHub issue; I'm just trying to make an existence proof that it is possible to come up with an unambiguous definition which could be used here.)
I should also make the point that, in order to implement Safari's behavior, Safari has to use non-public SPI to determine whether or not a font is user-installed. If other browsers are interested in using this mechanism, we can start the process of exposing this functionality publicly.
I wonder if this conversation (and the proposal) isn't letting the perfect be the enemy of the good. If the proposal was amended like the following, would that address some of the "there isn't such a clear line between default and system installed fonts" concerns?
- Build lists of default fonts shipping on Windows, macOS, and popular Linux OSs.
- define OS-installed-fonts fonts as the intersection of the above set, and fonts on the user's machine
Indeed, this would be the most realistic way forward if for step 1 you take the _intersection_ of sets that can be installed by default depending on the install-time language on macOS and on Windows 10 and Ubuntu you take the _union_ of such sets. (On macOS, the intersection is already good but the user can install the rest potentially one-by-one, so the union would leave a lot of fingerprinting surface as long as the mechanism mentioned two comments up is private. On Windows 10, the intersection isn't good enough, but the fonts not in the intersection get installed in bundles, which still leaves a large number of combinations, but not as large as installing fonts one-by-one. On Ubuntu, the user could go install fonts that are in the union but not in the intersection one-by-one, but the intersection isn't good enough, so what can you do.)
To improve privacy compared to that on Windows 10 and Ubuntu would involve those systems making their intersection set more Mac/Fedora-like, which is not something that browsers can do for them, but would then allow switching to the intersection instead of the union.
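A small Python sketch of the intersection-versus-union choice for step 1, using placeholder font names and install languages (none of this is real OS data, and `base_set` is a hypothetical helper):

```python
def base_set(per_language_defaults, mode):
    """Combine the default font sets of each install language for one OS.

    per_language_defaults maps an install language to the set of fonts
    that language installs by default (hypothetical data).
    """
    sets = list(per_language_defaults.values())
    if mode == "intersection":
        # Only fonts every install of this OS has, whatever the language.
        return set.intersection(*sets)
    if mode == "union":
        # Any font that some install language brings in by default.
        return set().union(*sets)
    raise ValueError(f"unknown mode: {mode}")


# Placeholder OS where the Japanese install adds one extra font.
defaults = {
    "en": {"Base Sans", "Base Serif"},
    "ja": {"Base Sans", "Base Serif", "JP Mincho"},
}

narrow = base_set(defaults, "intersection")  # same for every user of this OS
wide = base_set(defaults, "union")           # covers Japanese, but varies per user
```

With `narrow`, every user of this placeholder OS exposes the same two fonts, but Japanese text loses its serif face. With `wide`, Japanese installs work as expected, at the cost that a site can tell whether "JP Mincho" is present.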
Sorry, I phrased that poorly; by:
define OS-installed-fonts fonts as the intersection of the above set, and fonts on the user's machine
I meant the fonts from #1 ("the above set") ⋂ the fonts on the user's machine, but managed to write it in Polish-notation ¯\_(ツ)_/¯.  I think @hsivonen managed to (heroically) tease out what was intended, but wanted to restate to make clear :)
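Spelled out as a hedged Python sketch (the function name and the sample data are hypothetical), the restated definition splits the fonts on a machine into those exposed automatically and those that would need an opt-in:

```python
def classify_fonts(default_list, on_machine):
    """Split the fonts on a machine according to the restated definition.

    default_list: fonts from step 1, i.e. the compiled list of OS defaults.
    on_machine:   fonts actually present on the user's machine.
    Returns (os_installed, needs_opt_in).
    """
    os_installed = default_list & on_machine   # exposed without user action
    needs_opt_in = on_machine - default_list   # hidden unless the user opts in
    return os_installed, needs_opt_in


# Illustrative data only: the list contains a font this machine lacks,
# and the machine has one font that is not on the list.
exposed, gated = classify_fonts(
    default_list={"Base Sans", "Base Serif", "JP Mincho"},
    on_machine={"Base Sans", "Base Serif", "MyRareFont"},
)
```

Note that a font on the list but absent from the machine ("JP Mincho" here) is simply not exposed; the list acts as a filter, not as a set of fonts to install.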
The CSS Working Group just discussed [css-fonts] limit local fonts to those selected by users in browser settings (or other browser chrome).
The full IRC log of that discussion
<dael> Topic: [css-fonts] limit local fonts to those selected by users in browser settings (or other browser chrome)
<dael> github: https://github.com/w3c/csswg-drafts/issues/4497
<pes> (howdy :) )
<dael> pes: Right now font fingerprinting by most measurements is highest entropy. Would like to address and solve
<dael> pes: Needs standards level b/c will require some degree of PR or author notification or browser changes so everyone moves in tandem.
<dael> pes: Couple proposals but in general need to limit types of fonts websites can tickle are those that are useful for users
<dael> pes: Most recent prop is let's figure out default font OSs, union those, websites can access those. If users want to opt in other fonts you should be able to do those with a prompt from the browser
<dael> astearns: One things about privacy discussions and opting in to exposing information is the concern about messaging to the user what they are doing. Is there possible a way to opt in to that giving a good story to the user what they're doing?
<myles> q+
<dael> pes: Two thoughts; if users are doing this they already know somewhat what they want. Conveying some of the functionality without the meaning is easy. If there's the desire to convey why I think that's easy to describe. I could put up text. People usually follow defaults. From examples in GH is moderately expert users and that's something two steps out of the mean
<dael> pes: I think it falls nicely where browsers are doing things users don't expect and taking it from default path is a win and allowing users intentionally installing fonts is doable
<astearns> ack myles
<dael> myles: A couple points. 1) Fingerprinting based on set of available fonts is real bad and we philosophically should try to solve.
<jensimmons> q+
<dael> myles: 2) there's a problem here issue is trying to address which isn't stated yet. Safari already disallows user installed fonts so we're similar to prop but don't ask user to allow. There's a problem with that where for some lesser used scripts there may be no system font that can support so can have unreadable pages
<pes> q+
<dael> myles: This proposal has affordances to solve that where for those scripts users can opt in. That's good. But there's a cost which is throwing additional prompts to user they may not understand and adding friction at OS install time is something the entire company tries to minimize
<dael> myles: Realistically us adding a screen at OS install time is difficult to do and generally not something we're comfortable with
<dael> astearns: Clarification on scripts not supported- is Safari not rendering some web pages it might otherwise be able to with the script?
<dael> myles: Yes. Logic is it's a better situation for them to have to use web fonts b/c it's better to require that than for every user to install a font with a name b/c less websites than web users
<chris> q?
<astearns> ack jensimmons
<dael> jensimmons: I was never aware Safari doesn't allow user installed fonts. And I've never heard a web designer talk about that. I agree it's complicated for users to understand a browser setting. That's the kind of thing webdev can't count on. It may be something to think about but doesn't seem core to solving
<chris> q+ to say have seen exactly those notices on web pages
<myles> q+ to ask browser developers on macOS if they want OS-level support to allow for the behavior that Safari has about user-installed fonts
<florian> q+
<dael> jensimmons: Looking at last part of GH issues I see intersection of fonts vs union. Union is when you have all the fonts on both even if they're only on some OS. Some people are saying to do intersection where Arial is on so it's allowed but Avenir is only on Mac so not allowed. Intersection would be a massive problem. I don't think it's a problem to limit to what's shipped in OSs
<florian> q-
<florian> q+
<astearns> ack pes
<dael> jensimmons: If we try to limit to intersection it will break millions of websites. I don't think people are counting on some people might have fancypants font. But they are counting on some people have Avenir but others Roboto
<dael> pes: Clarifying the proposal: Union of all fonts shipped by default by all OSs that can be reasonably compiled. And the ones fed to the website are intersection of installed and system fonts. That is one option on Android, another on OSX, etc.
<dael> pes: Proposal isn't to say at install time let these fonts be accessed. Proposal is for small set of users who expect these fonts to be available b/c region of website allow user to go into advanced and have a drop down of additional fonts. Expectation is relatively few people do this, but communities that need this already install extra fonts so this additional step isn't infeasible
<astearns> ack chris
<Zakim> chris, you wanted to say have seen exactly those notices on web pages
<dael> chris: I wanted to say contrary to jensimmons not seeing it done, I have seen this on some sites, particularly when Indian was worse. South Indian it's common to install locally used fonts. It could be there's a pack that installs a bunch of fonts. I don't want to break web experience for those that have been using it successfully
<jensimmons> +1
<dael> astearns: Is there a way to survey scripts and say these aren't by default but everybody in that region installs these fonts
<dael> chris: It would be a valuable addition
<astearns> ack myles
<Zakim> myles, you wanted to ask browser developers on macOS if they want OS-level support to allow for the behavior that Safari has about user-installed fonts
<dael> myles: one other small things. Mechanically ability to discern between fonts is not a public API on OS. I would love it if a browser programmer wanted this exposed; I'd love to know that
<astearns> ack florian
<myles> s/I would love it if a browser programmer wanted this exposed/I would love to hear if any browser programmers wanted this exposed/
<pes> https://github.com/Valve/fingerprintjs2/blob/master/fingerprint2.js#L557 <— example of fingerprinting via fonts
<dael> florian: I don't see TabAtkins on and I'd like to bring up a point from him. This seems less drastic than others discussed so down side not that bad. If we do the intersection of what's installed it has a fair bit of variability. Besides font fingerprint there are other means. The question to ask is does it actually help. If we don't reduce it enough then we haven't achieved anything even though we decreased it
<dael> florian: Downside isn't that bad if we include the language specific common fonts, but there still is some cost. Are we paying it for something or do we not reduce your uniqueness enough
<jensimmons> There are many other ways of fingerprinting, but many of us out here are knocking them out one by one. I believe we should also do our best to close such security flaws. (in response to florian's comment)
<pes> +1000 for what is being said right now.
<jensimmons> +1 from me as well
<dael> plinss: Let's not let the perfect be the enemy of the good. There's a lot of fingerprinting surface and we need to make small steps in every regard. We have to take small steps where we can and get a cumulative effect where we can
<dael> florian: TabAtkins point is he doesn't think we can ever get there
<astearns> yep, if it makes fingerprinting at all harder, it's progress
<dael> plinss: If we never try we won't get there
<astearns> ack fantasai
<Zakim> fantasai, you wanted to ask about impact on linguistic minority populations
<dael> myles: I don't doubt we can get there so we have one vote on either side so worth discussing
<astearns> ack dbaron
<Zakim> dbaron, you wanted to talk about both exposing Mac OS API and about being willing to expose OS differences rather than intersection or union
<astearns> q+ fantasai
<dael> dbaron: Two comments. myles asked about the API on Mac. it's not something we planned to work on but if we do get to work on this it would be useful. Lack might cause us to have to do a work around. If it's exposed it would make this sort of thing more practical. But we don't have a concrete plan to use it
<pes> +q
<dael> dbaron: Other thing is that there's been a bunch of discussion about intersection and union of fonts across OSs. I'm not convinced we want either. There's an argument that we want to allow vary between OSs. There's a set of fingerprintable information we can hope to obscure but there's bits of entropy we can be okay giving up on and one of those is if a user is on windows or mac.
<myles> dbaron++
<dael> dbaron: Either way of addressing it makes the solution worse. Intersection limits designers, Union is a bunch of fingerprinting exposed if users install fonts that are default on a different OS.
<astearns> ack pes
<dael> pes: Point about fingerprinting nihilism. Ping and those in privacy community are trying to address it in many ways. You chip away as you can. Nature of problem means different wins benefit different people at different times.
<myles> pes++
<dael> pes: Chipping off entropy bits is valuable. Different fingerprint endpoints yield differently as well. So adding noise is an option in some places, but fonts is not one of those places. Figuring out how to reduce problem to a subset seems valuable here. I think every time we take a slice of the problem we're benefiting some people. We're not throwing coins in a well
<astearns> ack fantasai
<plinss> +1 to pes
<dael> fantasai: Two points. 1) If we're going to do this we should document which fonts are allowed on which OSs so all browsers can align with interop. If we're going to do this we should start a registry of allowed fonts so authors and browser vendors can figure it out
<dael> astearns: To that I think yes and no. There should be a rule for what's in and maintain it, but have the list generated from rules. If we do a union the set will only ever grow
<dael> fantasai: Rule in fonts spec, document of fonts that conform to the rule. In an OS API is nice, but an author won't find it easily
<dael> astearns: We aren't saying browsers restricted to this for all time, we're saying here's the set we're aware of and here's the rule to add a new font
<dael> myles: There's a bunch of websites that list the fonts. Do we need to maintain if it's out there?
<dael> AmeliaBR: Not in reliable up to date fashion
<dael> myles: Will ours be?
<dael> dbaron: Need something that reflects worldwide usage. Some of that will need to be us.
<dael> fantasai: I don't want it in fonts spec. A note or a W3C registry that's easily maintained and doesn't need our intervention.
<dael> fantasai: 2) I want to emphasize we need to address impact on minority language population and limiting to default fonts won't cut it. Need to not harm those communities. You can't use web fonts for things like this. Places with minority language are also where downloads are slower and more costly. need to make sure we address that head on
<pes> +q
<dael> astearns: Other points on this topic?
<astearns> ack pes
<jensimmons> +1 to everything fantasai just said. It would be incredibly helpful to Authors to have such a list.
<dael> pes: I'd like to know how Ping or I personally can be most useful to make sure a solution gets into the next part of the spec and get this solved
<pes> i can promise to do that :)
<dael> astearns: I think being active on GH issues that are open and reviewing spec text is most helpful. Anyone else can chime in?
<dael> fantasai: Are we resolving to do this and if yes exactly what. If not, when do we follow up?
<dbaron> I think we're a decent distance away from converging on a particular thing to add to the spec.
<dael> astearns: I was thinking to draft a solution based on GH thread. I'm not hearing objections to trying to solve this. Coming up with something to put into Fonts spec to limit fonts available to web pages
<pes> +q
<dael> AmeliaBR: Have text saying browsers may limit what fonts are exposed. Is the agreement at this point to increase the normative standard of that or be more specific about which you might want to limit or be specific about which fonts are safe?
<dael> AmeliaBR: Have it defined that if a font meets these criteria the browser should make it available and may block access to all others
<pes> -q
<dael> astearns: It would be yes on your first two. Increase to a must requirements and more concretely define the restriction. But we're not trying to come up with list of web safe fonts, we're defining what browser should make available in terms of locally installed fonts.
<dael> astearns: Registry we have won't have all safe, but might be available
<dael> AmeliaBR: Using safe from privacy PoV not author guarantee
<dael> astearns: Yeah. Like dbaron said in IRC we don't have a thing for the spec yet. Just an intent to solve the problem
<dael> myles: One point, I don't think can make a must b/c involves non-CSS pieces of browser.
<fantasai> +1 to myles, this would have to be a SHOULD
<pes> +q
<dael> astearns: Must would be font matching algo. Not about browser that might make it less restrictive
<gsnedders> do some OSes still conditionally install fonts based on locale? what should the behaviour be then?
<dael> fantasai: Should still be a should. There's CSS UAs that don't want to be limited in this way. Like a PDF renderer which is trying to print local documents. This will have to be a should.
<dael> fantasai: Browsers will probably follow b/c it makes sense to them
<astearns> ack pes
<dael> chris: Agree it should be a should for that reason. They might have to opt in but we shouldn't block it.
<dael> pes: In general a should doesn't mean too much when doing privacy review. We're looking to see if a browser properly implements will they be protected. Point taken where there are places where privacy doesn't apply and those should be carefully detailed out.
<dael> pes: The case discussed where needs to be a way to opt in is handled in GH issue
<fantasai> RFC2119: 3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
<fantasai>    may exist valid reasons in particular circumstances to ignore a
<fantasai>    particular item, but the full implications must be understood and
<fantasai>    carefully weighed before choosing a different course.
<dael> pes: I've not written specs but I know in many places functionality is described that hooks into browser chrome so might consider examples like that
<dael> fantasai: I want to point out a couple things. One is we as a WG don't know all UAs that exist or will exist and we shouldn't make a UA non-conformant just b/c it satisfies a non-browser use case.
<dael> fantasai: Technical definition of should is [reads]
<dael> fantasai: I think that's appropriate in this case
<chris> +1 to the RFC2119 definition
<dael> astearns: I think we should go back to GH and hammer out exact proposal and level of requirements. I think there's quite a bit of work before there's something to put in spec, but we should get to that. Maybe checkpoint in a month
<dbaron> "a month" is the face-to-face, btw
<dael> AmeliaBR: Sample spec text drafts would be helpful so we can start comparing
<dael> astearns: dbaron reminds me we have the F2F to hammer out the last details and get it into spec
<dael> chris: Sounds good way forward.
<pes> (terrific :) )
<dael> chris: pes you should keep watching issue and we update EDs daily so you can see evolution of text
<dael> astearns: Anything else on this?
<dael> astearns: Thanks everyone
It was mentioned in the minutes above that some UAs (like print formatters) will likely want to continue to match all installed fonts. I expect non-browser applications built on things like Electron will want to be able to act like a native app and build a full font list as well.
Union of all fonts shipped by default by all OSs that can be reasonably compiled.
I think that's not a reasonable fingerprinting protection. That way, you could fingerprint a Mac that has Microsoft apps that install fonts from the Windows set.
In my previous comment about intersections and unions, I meant that the _Mac-specific_ allow-list would be the intersection of the macOS locale-specific configurations and the _Windows 10-specific_ allow-list would be a union of the Windows 10 locale-specific configurations. (This assumes that you can fingerprint macOS vs. Windows 10 regardless of fonts anyway.)
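A rough sketch of how such per-OS allow-lists could be derived, assuming each OS documents a font set per locale configuration. The type `LocaleFontSets`, the two function names, and any locale tags or font names passed in are hypothetical placeholders, not real OS data.

```typescript
// Hypothetical shape: locale tag -> font families installed by default for that locale.
type LocaleFontSets = Record<string, string[]>;

// Union: every family that ships in at least one locale configuration.
function unionOfLocales(sets: LocaleFontSets): string[] {
  const result: string[] = [];
  for (const locale of Object.keys(sets)) {
    for (const family of sets[locale]) {
      if (result.indexOf(family) === -1) result.push(family);
    }
  }
  return result.sort();
}

// Intersection: only families that ship in every locale configuration.
function intersectionOfLocales(sets: LocaleFontSets): string[] {
  const locales = Object.keys(sets);
  if (locales.length === 0) return [];
  return sets[locales[0]]
    .filter((family, i, first) => first.indexOf(family) === i) // drop duplicates
    .filter((family) => locales.every((l) => sets[l].indexOf(family) !== -1))
    .sort();
}
```

Either result is computed once per OS, so a page would see the same allow-list regardless of which locale bundles a particular user has added.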
I want to emphasize we need to address impact on minority language population and limiting to default fonts won't cut it. Need to not harm those communities. You can't use web fonts for things like this. Places with minority language are also where downloads are slower and more costly. need to make sure we address that head on
This indeed needs a solution. Sadly, logically you can't have both unconditional font fingerprinting protection and support for scripts whose fonts aren't covered by the OS bundled set (either not covered at all or not covered to the satisfaction of the users).
Is there data about what those scripts are these days?
South Indian it's common to install locally used fonts.
Is there up-to-date data about this? (I'm aware that pre-iOS/Android-era windows-1252 fonts with Tamil glyphs were common, but I have been unable to get even Twitter anecdata about whether that's still a relevant issue for sites browsed only from desktop or if the difficulty of installing fonts on phones has forced sites to adapt to not being able to expect users to install such fonts.)
I think that's not a reasonable fingerprinting protection. That way, you could fingerprint a Mac that has Microsoft apps that install fonts from the Windows set.
Sure, if the group is game to split these groups up by OS, that'd be even better. I made my suggestion because it'd be 1) a significant improvement over the status quo, even if far from ideal, and 2) it seemed the group was hesitant to break things up by OS. But I 100% agree, making these lists per-OS would be even better.
Is there data about what those scripts are these days?
Is there up-to-date data about this?
My understanding from the call was that the current idea was:
- Put the algo for deciding what fonts are often loaded in which locales into the spec
- Crawl or telemetry or similar measurements for which fonts the above algo gives
What would the crawl or telemetry measure specifically so that existing font fingerprinting scripts don't confuse the measurement?
The Windows 10 font bundles are documented at https://docs.microsoft.com/en-us/typography/fonts/windows_10_font_list . (Note that the "Pan-European Supplemental Fonts" bundle is not autoinstalled as a side effect of adding an applicable language to Windows. The rest of the bundles are autoinstalled when an applicable language is added to Windows.)
The fonts bundled with macOS Catalina are documented at https://support.apple.com/en-us/HT210192 . You want the first section and the parts of the last section that are in the hidden language support folder (practically the Noto fonts in the last section). (The last section also lists iWork fonts.)
You can find the default font list for a given Linux distro by running fc-list --verbose after doing a clean install. Fedora 31 beta, Ubuntu 19.10 beta English, Ubuntu 19.10 English with all languages added after install.
The base set for Windows 10 is 343 MB and the full set appears to be "Size: 3.73 GB, Size on disk: 3.39 GB".
What would the crawl or telemetry measure specifically so that existing font fingerprinting scripts don't confuse the measurement?
It seems like we could distinguish "real font use" from fingerprinting, by either (easy mode) discarding from the crawl any site requesting a large number of fonts, or (slightly less easy) only looking for fonts that are applied to the document in a way that exceeds some threshold (e.g. X amount of text that persists for Y amount of time). Seems like a solve-able problem :)
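A minimal sketch of that filtering, combining both ideas (the comment above presents them as alternatives). All type names, field names, and thresholds here are hypothetical; the actual values of X and Y would need to be chosen from real crawl data.

```typescript
// Hypothetical record of one font being applied to visible text on a page.
interface FontUse {
  family: string;
  renderedChars: number; // X: amount of text rendered with this family
  visibleMs: number;     // Y: how long that text stayed in the document
}

// Hypothetical per-page crawl observation.
interface PageObservation {
  url: string;
  requestedFamilies: string[]; // every local family name the page asked for
  uses: FontUse[];
}

interface Thresholds {
  maxRequestedFamilies: number;
  minChars: number;
  minVisibleMs: number;
}

// "Easy mode": a page probing a large number of families is treated as a fingerprinter.
function looksLikeFingerprinting(page: PageObservation, t: Thresholds): boolean {
  return page.requestedFamilies.length > t.maxRequestedFamilies;
}

// "Slightly less easy": count only families applied to enough text for long enough.
function genuinelyUsedFamilies(page: PageObservation, t: Thresholds): string[] {
  if (looksLikeFingerprinting(page, t)) return [];
  const used: string[] = [];
  for (const use of page.uses) {
    const substantial = use.renderedChars >= t.minChars && use.visibleMs >= t.minVisibleMs;
    if (substantial && used.indexOf(use.family) === -1) used.push(use.family);
  }
  return used;
}
```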
Hi all, hope everyone is having a good new year. Was very happy and encouraged by the call we had last month. Wanted to get a sense of where things stand after that call, how i can help move things forward, and if there is another call for this month I could attend to continue progress on this.
@snyderp we have a face-to-face meeting next week Wed-Fri. Would you be available on any of those days to call in to a 10:00-19:00 Central European Time meeting?
@astearns sure, that would be no problem. I'll be on London time next week anyway, so availability is pretty flexible during the times you mentioned
If I understand things correctly this proposal (1) prevents fingerprinting for users that go with default browser settings which would only provide access to default system fonts and (2) provides a way for users to give access to additional locally installed fonts they might need for their use cases inside the browser. When going for (2), users consciously drop the safety provided by (1) and become exposed to fingerprinting.
Given the fact that most of the protection provided by the default browser settings is lost when users decide to give access to additional fonts, I was wondering if that couldn't be achieved through a mechanism that is less tedious for users than manually picking the fonts they might need in the browser. For example, what if local font access could be granted per application (domain) and applications could explicitly request that permission through a built-in browser dialog? The dialog would explain to the users the risk they are exposing themselves to. IMHO, the fact that users grant access to local fonts only to the applications they trust brings more safety than adding access to a subset of fonts which any application can access.
The reason I mentioned a mechanism through which users could grant access to all their local fonts to specific applications rather than manually selecting the fonts they want access to is because there are applications that might need this functionality. For example applications that provide advanced text styling, like word processors or design applications.
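A sketch of what such a per-origin grant could look like inside the browser, under the assumption that the permission dialog simply records trusted origins. `FontPolicy` and `visibleFonts` are hypothetical names introduced for illustration and do not correspond to any existing browser API.

```typescript
type Origin = string;

// Hypothetical browser-side state: the OS default set plus the origins the user has trusted.
interface FontPolicy {
  systemDefaults: string[];
  grantedOrigins: Origin[];
}

// Returns the local font families a given origin is allowed to match against.
function visibleFonts(origin: Origin, installed: string[], policy: FontPolicy): string[] {
  if (policy.grantedOrigins.indexOf(origin) !== -1) {
    // Trusted application (e.g. a word processor): sees everything installed.
    return installed.slice();
  }
  // Everyone else: only fonts that shipped with the OS.
  return installed.filter((family) => policy.systemDefaults.indexOf(family) !== -1);
}
```

With this shape, an untrusted origin learns nothing beyond the OS default set, while the fingerprinting exposure of a grant is limited to the origins the user explicitly approved.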
The CSS Working Group just discussed limit local fonts.
The full IRC log of that discussion
<astearns> topic: limit local fonts
<astearns> github: https://github.com/w3c/csswg-drafts/issues/4497
<faceless> pete: is talking about font fingerprinting by identifying the computer based on which fonts are installed on the computer
<faceless> pete on suggestion is to list a document which describes a list of specific onts
<faceless> s/onts/fonts
<faceless> myles asked about what pete wanted to discuss that wasn't on a previous call
<astearns> (from the last discussion: astearns: I think we should go back to GH and hammer out exact proposal and level of requirements. I think there's quite a bit of work before there's something to put in spec, but we should get to that. Maybe checkpoint in a month)
<faceless> dbaron one of the questions was to what extent this would be allowed vs recommended vs mandatory. is comfortable with recommended not sure about mandatory partly because we don't know exactly what we're trying to do. open questions.
<faceless> dbaron thinks we should allow this
<faceless> myles at already does
<faceless> dbaron ... to recommend this, and work on adding detail to the recommendation. when we're comfortable with the level of detail there, we can mandate this, but there are lots of open questions
<faceless> dbaron eg. effects on minorities etc.
<fantasai> s/minorities/linguistic minorities, across OSes,/
<faceless> myles if we don't make mandatory but do make recommended, would be good to hear from all present if we should change behaviour
<faceless> pete webkit is even safer than this, webkit won't load some fonts off disk
<Rossen__> q?
<chris> q?
<faceless> dbaron maybe jonathon can speak more authoritively on this but thinks maybe this might be more difficult to do on some platforms than others
<Rossen__> ack faceless
<Rossen__> ack fantasai
<Zakim> fantasai, you wanted to mention CSS2PDF renderers
<faceless> fantasai wanted to say there are classes of user agents where this makes no sense. eg css-pdf renderers, which need to access all fonts on the system
<fantasai> chrisL: localhost could have access to all of thm
<faceless> svgeesus css to pdf renderers have the ability to opt in to lists of fonts per site, which makes it more possible to opt out
<faceless> (as in opt out of fonts per domain)
<Rossen__> s/dbaron maybe/dbaron: maybe/
<pes> +q
<chris> although I wasn't just speaking of css to pdf renderers
<faceless> myles: as an engineer I am always thinking about how we can test this, but if there's going to be no changes to the file system this will be untestable
<dbaron> s/thinks maybe this might/we support a larger number of platforms and on some of those this might/
<Rossen__> ack pes
<faceless> pete: initial proposal was all this should be dealt with by the browser opting in
<faceless> pete: if the takeaway is that the idea is useful but nothing is required at this point, I don't think that's any change from the status quo
<faceless> fantasai: a should requirement is not a no-op
<faceless> fantasai: it recommends action and it may be appropriate in this case.
<faceless> myles: but if no-one acts on that recommendation what's the point of it?
<faceless> fantasai: user agents don't always act on hard recommendations either.
<Rossen__> q?
<TabAtkins> If necessary I can state this at some point, but I believe Chrome's position is that we extremely want to stop fingerprinting as an identification vector, but I don't think that designing a solution in a committee with this skill set is appropriate. (There are groups in W3C (or elsewhere?) that are more appropriate and contain people with the right set of skills.)
<chris> 3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there    may exist valid reasons in particular circumstances to ignore a    particular item, but the full implications must be understood and    carefully weighed before choosing a different course.
<fantasai> https://tools.ietf.org/html/rfc2119
<faceless> fantasai: should means if you have a good reason not to do it, you don't have to do it. But you need a good reason
<chris> https://tools.ietf.org/html/rfc2119
<faceless> myles: to paraphrase fantasai : we can put a should and say all browsers should do this, or we can make a partition and say some browsers should, some shouldn't
<faceless> myles: pete said first option was a thumbs down
<faceless> pete: if there's an option that is available to the user-agent to do this that's ok
<faceless> fantasai: I would imagine you would object to this being turned on by default. but css-pdf renderers would have to turn this on by default
<faceless> pete: doesn't have to be that way. if you're serving documents off disk, for example, it could be off
<faceless> svgeesus: could you explain the harm of the status quo where someone on their own disk converts a file locally
<faceless> pete: that's not what I mean
<Rossen__> q?
<faceless> floriank: so if we don't mean everyone has to do this, then lets not say everyone
<pes> +q
<Rossen__> ack pes
<Rossen__> s/say some browsers should, some shouldn't/say some browsers must, some must not/
<faceless> pete: it seems like this must not be a new idea - there are cases where apps using HTML renderers have one set of rules, browsers have others
<faceless> heycam: we do
<heycam> different conformance classes for selectors (fast profile)
<faceless> florian: should means you have to do it unless there is a good reason, but good reasons do exist and if you have them you won't be arrested for not doing it
<pes> (I'm very sorry to be tedious, but could people identify themselves when speaking for a bit?)
<pes> +q
<florian> s/for not doing it/for not doing it, but neither would you if we wrote must/
<fantasai> s/if you have them you won't be/if you have them you won't be non-conformant. You won't be/
<faceless> pete: who's planning on doing what?
<fantasai> s/for not doing it/for not doing it under SHOULD, but neither would you be under MUST/
<florian> q+
<faceless> pete: it's pretty important we get this sorted out so we can get the cross-browser expectations to users
<pes> -q
<pes> q+
<faceless> tab: especially given recent info. fingerprinting in side channels is a very tricky thing to do and we don't have the expertise in this committee so i'd object to putting a "must" on this as i don't think we have the ability to do it ourselves
<faceless> tab: while it's very important and needs doing, I don't want to put anything binding on this committee
<Rossen__> ack florian
<hober> q+
<astearns> and a 'should' allows all non-Chrome browsers to do the thing and eventually make it more likely for them to bend
<tantek> note that "print formatters" are also "normal browsers" in Print Preview mode
<Rossen__> ack pes
<faceless> florian: we could put a note on this clarifying the intent to explain why a should recommendation is there and who should follow it
<florian> +1 to tantek
<faceless> pete: surely we do have the expertise
<faceless> tab: I'm talking specifically about the CSSWG - the goal of reducing fingerprinting is 100% our goal, but Chrome doesn't want to bind themselves to a MUST resolution
<faceless> tpete: if you aren't hte people to ask, who is?
<faceless> s/tpete/pete/
<tantek> we really need to capture as much as we can in the issue, and then reach out more broadly than the WG
<faceless> tab: we have engineers who are working on this and have the expertise on this, but none of us in this group have the expertise
<Rossen__> ack hober
<tantek> sounds like this discussion is going in circles
<dbaron> q+
<faceless> hober: you have co-chairs of ping and the privacy cg in this room, and pete is not coming to us as an individual - this is a concern from a number of people in this area. as a member consortium it's the responsibilty of this group that we have people who can speak on these issues. so it's disheartening to hear you don't want to consider this because we don't have the expertise. that's our role
<faceless> tab: yes I understand but this is the only privacy issues on this point, it's not appropriate to invite the security team to be here
<pes> q+
<faceless> tab: i'm on the write person, none of here are.
<TabAtkins> s/on the write/not the right/
<faceless> rossen: calls order
<Rossen__> ack dbaron
<tantek> LOL: one-line S&P section in css-fonts 4: "The system-ui keyword exposes the operating system’s default system UI font to fingerprinting mechanisms."
<TabAtkins> I think the PING/etc are the right venues for this discussion, not the CSSWG.
<myles> did I write that?
<hober> TabAtkins: PING came to us with this!
<myles> s/did I write that?//
<tantek> presumably we are talking about more than just system fonts
<faceless> dbaron: pete asked who are the right people. i think sort of a weird question, given the response we're trying for. I think we are the right people, but the misunderstanding that leads pete to ask this question is that it's not a short process
<TabAtkins> This needs to be "privacy teams, with a font-related engineer on call", not "a bunch of layout/etc engineers, with a privacy engineer on call"
<pes> (i cannot hear anyone speaking…)
<faceless> dbaron: we're trying to make a substantial change to the way this works on the web platform. It's a process that requires proposal, iteration, requirements
<faceless> dbaron: (is more emphatic)
<faceless> dbaron: we're trying to do this thing that requires iteration and refinement of a proposal, and what we're saying is "yes, we're accepting that this is the next stage of the process and it's worth pursuing"
<TabAtkins> hober, Sure, and I'm saying that looking to this group for binding resolutions on this topic isn't appropriate. We own a spec with a feature that will be impacted; that doesn't mean we should be designing the change, just ensuring that it's integrated and well-explained when it's finished.
<faceless> dbaron: but pete is saying that's not the right thing - we need to have a solution now. But we haven't had the conversation that we need to have first. So we're basically saying yes to it, but we have to begin the process
<faceless> dbaron: I think that disconnect is why we're stuck
<dbaron> q+
<faceless> pete: with respect this was filed in june. there's been on counter-proposal since then
<faceless> pete: this is the #1 privacy issues on the web
<faceless> rossen: we understand and we recognise the urgency but the reality is there is a backlog
<Rossen__> q?
<faceless> rossen: the fact it was filed a while back doesn't mean it's not important to us
<pes> s/there's been on counter-proposal/has not been a counter-proposal/g
<TabAtkins> See, for example, how we were just spitballing about how to design a font list and how to segregate it. We don't have the expertise to do that; we can't get "close enough". It has to be done right, and we're not the group to do that.
<faceless> s/pete/pes
<Rossen__> ack pes
<faceless> pes: i want to know what the next steps are. If there's a process, what is it, what is the timeline?
<faceless> rossen: one of the proposals is to resolve with accepting this as a SHOULD statement.
<faceless> alan: the spec has this currently as a MAY?
<dbaron> q-
<faceless> myles: yes, what pete is aiming for is different
<florian> q?
<chris> the current "MYA" has a lot less detail
<florian> q+
<chris> s/MYA/MAY
<faceless> rossen: can we take the resolution now that changing the current definition to a SHOULD and live with that?
<faceless> myles: not unless someone can state what the SHOULD should say.
<tantek> agreed I want to see the full statement here in the minutes
<faceless> florian: agrees with myles
<tantek> +1 myles
<faceless> florian: you asked about next steps, the relevant user agents will attempt to do it once the SHOULD has been framed properly
<faceless> florian: after that, once the user agents implement, we'll get feedback and see what to do then
<pes> q+
<faceless> florian: maybe we will find a line to draw to make a distinction, i.e. user agents loading from the file system. but we don't have that information onw
<faceless> s/onw/now
<Rossen__> q?
<faceless> svgeesus: pete if you're happy to make a first draft of the SHOULD recommendation I'm very happy to work with you on this
<faceless> pes: happy to, but is there a rough timeline, and also the current proposal points to a list maintained elsewhere. Is that the way we want to keep things?
<faceless> rossen: ok first issue. Do we want to stick with a list that is maintained elsewhere
<faceless> dbaron: a list of what?
<TabAtkins> TabAtkins: local fonts that are allowed
<faceless> florian: the current spec is a list of things which are ok, - fonts
<heycam> +1 don't think where the list lives is the first thing to worry about here
<pes> q+
<faceless> florian: i think we should write the list down, put it wherever, once we have figured it out we can worry about where to put it later
<faceless> johnq: it's not clear whether this group maintaining a list is the right approach or whether we should look into which platform APIs exist to determine which fonts are platform installed vs user installed.
<dbaron> s/johnq/jkew/
<faceless> jkew: it seems like maintaining a list is a never-ending nightmare. maybe OS vendors should maintain the list? I'm not sure it's realistic that we maintain it.
<florian> q-
<myles> q+
<faceless> florian: no macOS API will give you that list. We should start with a list and once we've tried it out, we may find it's not the best option
<chris> q?
<faceless> rossen: lets try  to find something actionable
<florian> s/once we've tried it out, , we may find it's not the best option/once we've written it, we can debate the proposal/
<faceless> pes: i understand the reticence against a list and wanting something easier to maintain.
<Rossen__> ack pes
<Rossen__> ack dbaron
<jensimmons> a rubric
<faceless> dbaron: to respond to jkew and pes - list is maybe too specific a term. we should be describing what we want to do and on each platform there may be a different approach - an API, a list, it's the intent that matters.
<tantek> +1 dbaron
<faceless> dbaron: the main thing is that we try this and see what works.
<pes> what is the road to get to the right answer then?
<tantek> pes, where's the proposal? can you link it?
<tantek> start with that
<faceless> dbaron: I don't think we know what the best thing to do is yet. We can't specify this with the right level of detail on each platform, we need to allow for feedback from each platform to find the best solution
<Rossen__> ack myles
<pes> this is not a new issue / problem.  An outcome that is “vendors will look into it”, this is not progress
<faceless> myles: first, responding to florian: feels florian was assuming that there was a single set of fonts common to everyone. we don't do that - we have different sets for different parts of the world.
<faceless> myles: so even just for us, we can't have a single list that is uniform.
<pes> [tantek : initial issue / concern https://github.com/w3c/csswg-drafts/issues/4055, follow up proposal: https://github.com/w3c/csswg-drafts/issues/4497]
<faceless> myles: so we certainly can't across all OSes
<pes> Github: https://github.com/w3c/csswg-drafts/issues/4497
<faceless> florian: I was saying the current proposal specifies a single list, but that's probably not ideal. But that's our start point as it's in the spec.
<Rossen__> q?
<faceless> myles: there is no list for our platforms about what the currently available fonts are - we use an API.
<faceless> rossen: next steps. Pete is going to take a stab at moving the current statement from a MAY to a more strict version of SHOULD
<tantek> pes, I think we're at the point where we need sample spec text proposed in the issue. Just reviewed the proposal bits and looks a bit scattered TBH
<faceless> rossen: and the technical recommendations of how to reference those fonts, dbaron said this well - referring to this as a list is not the full picture. But it is a start
<faceless> rossen: once we have the actual proposal we can try to narrow down the technical soution
<tantek> pes, I'm not disagreeing with the issue. I read through 4497 and the proposal there is more of an outline of desired outcome
<faceless> s/soution/solution/
<faceless> rossen: anything else?
<pes> tantek: this might be closer to what you’re looking for https://github.com/w3c/csswg-drafts/issues/4497#issuecomment-565832611
<faceless> rossen: pete, we're not trying to sandbag this - it's a normal process. we are interested in this and that might not be clear. Bear with us and once you have the actionable definition we'll go from there
<faceless> rossen: I suggest we end this and move on and will come back to it once pete has acted? on the next call?
<faceless> pes: when is that?
<faceless> rossen: probably two weeks
<tantek> pes, that's a very good summary start. Now, where in the spec would you put that, and can you reword it procedurally as a set of steps that browser should/must follow?
<faceless> roseen: thanks for your engagement
<faceless> s/roseen/rossen
<dbaron> The calls are Wednesdays at 9am California time / noon Boston time / etc.
<pes> tantek: on it :)
<tantek> pes, related, you may be interested in contributing to https://github.com/w3c/csswg-drafts/issues/4697
<faceless> rossen: ok, lets get on with it. a few text related topics. clarifying skip ink auto is related to CJK
<faceless> myles: no this was opened by jkew
Thank you for having me this morning and for making time on the agenda.
I want to make sure the other action item we discussed wasn't dropped. @tabatkins and others mentioned their teams are working on solutions. It would be very valuable to know more about those too
@tabatkins can probably provide better links/info, but I do see there are some considerations on restricting local font access here https://github.com/inexorabletash/font-table-access/
@astearns thank you for the link. PING is concerned about that proposal. Any information about how that proposal would address current fingerprint-ability (rather than maintain current fingerprint-ability, but try and restrict additional surface behind a permission) would also be useful
Jeffrey Yaskin and Brad Lassey are both on the PING, and are the people I was referring to when I was talking about people representing our privacy team.
@tabatkins I'm not sure I understand. Last time I asked them about font fingerprinting plans, you and they pointed me at the privacy budget document, which doesn't have any specifics in it and hasn't been updated since it was first published.
Are you saying that's the current latest information, or that I should talk to them about Chrome font plans instead of / in addition to Chrome reps on CSS Fonts?
Not arguing, just trying to figure out what's knowable and who to ask
You should talk to them for further details, yes. They'll either have appropriate information, or can directly refer you to other engineers who will. This is all still under active development, mind; we might not have further details beyond explainers of our newer experiments.
In reference to the meeting log: Clearly, a spec could at most describe qualitatively what kind of installation provenance a font should have to be visible to the Web and possibly provide lists for some well-known systems (but linking to Microsoft's and Apple's own docs seems easier). The allow-list necessarily needs to be a per-OS thing, and the W3C can't say that this is the closed list of operating systems that will ever exist. Also, this whole concept won't make sense for such Linux distros that don't have a broad installed-by-default font set.
@snyderp FYI: https://github.com/jyasskin/font-anti-fingerprinting
One of the things that the repo above made me think of is that we might be able to roll out an anti-fingerprinting response locale-by-locale.
In order to avoid breaking content, we could decide to limit local font access for particular locales where we have researched and vetted that this does not cause a content problem.
For locales where we are not yet sure whether particular local fonts are required to successfully view content, we could continue to allow local font access until sufficient research has been done. The research result should either allow us to limit local font access entirely or identify particular locally-installed fonts that need to be allowed.
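The staged rollout described above could be sketched roughly as follows. The `LocaleStatus` type, the function name, and the research table are all hypothetical; the sketch only assumes that each locale is either not yet researched or researched with a (possibly empty) list of extra fonts that must stay available.

```typescript
// Hypothetical research outcome per locale.
type LocaleStatus =
  | { kind: "unresearched" }                      // keep today's behavior for now
  | { kind: "restricted"; extraAllowed: string[] }; // vetted: OS defaults plus listed extras

function fontsExposedForLocale(
  locale: string,
  installed: string[],
  osDefaults: string[],
  research: Record<string, LocaleStatus>
): string[] {
  const status = research[locale];
  if (!status || status.kind === "unresearched") {
    // Not yet vetted: continue to allow all local fonts to avoid breaking content.
    return installed.slice();
  }
  // Vetted: expose only OS defaults and the fonts research found to be required.
  return installed.filter(
    (family) =>
      osDefaults.indexOf(family) !== -1 || status.extraAllowed.indexOf(family) !== -1
  );
}
```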
I'm worried about the impact of this on minority language users, and also specialists such as egyptologists, and other script researchers. If users can opt to allow certain fonts which are never likely to appear in the browser defaults, it sounds like they expose themselves to fingerprinting, which doesn't make this a great solution for large numbers of people.
If i understand correctly, the problem arises because people serving web pages are able to inspect which fonts are available on a user's system.
Could someone explain for me (in simple terms) why users and implementers need to jump through all the hoops described above rather than simply preventing the web browser from communicating back what fonts are available on the user's system?
@r12a I do not think it is practical to prevent this information leak, if the fonts can be used to lay out web content. As I understand it, the attack is to lay out some text, measure it, change the font, measure again. If the measurement changes then the font is available on the user's system. Repeat these steps 100-10000 times and you get some useful information about their installed font set. We would have to disallow measuring of laid-out content to prevent the leak, which would break a lot of the web.
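For readers unfamiliar with the technique, here is a sketch of the measure-and-compare loop just described. The measurement itself is injected as a function so the logic is independent of any DOM; in a browser it could be backed by something like an off-screen element's width or canvas text metrics, which is an assumption and not shown here. `MeasureWidth`, `detectFonts`, and the probe string are hypothetical names and values.

```typescript
// Hypothetical measurement hook: returns the rendered width of `text`
// when styled with the given CSS font-family list.
type MeasureWidth = (text: string, fontFamilyList: string) => number;

function detectFonts(
  candidates: string[],
  measure: MeasureWidth,
  probeText: string = "mmmmmmmmmmlli"
): string[] {
  const generics = ["monospace", "sans-serif", "serif"];
  // Step 1: measure the text with each generic fallback alone.
  const baselines = generics.map((g) => measure(probeText, g));
  // Step 2: for each candidate, measure again with the candidate listed first.
  // If any width differs from its baseline, the candidate font was used,
  // which means it is installed.
  return candidates.filter((family) =>
    generics.some((g, i) => measure(probeText, `"${family}", ${g}`) !== baselines[i])
  );
}
```

Nothing in this loop uses an API dedicated to fonts, which is why blocking it would mean blocking layout measurement in general.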
One thing I have wondered is whether browsers could continue to allow locally-installed fonts to be used _for font fallback purposes_, even when they're not exposed as family names that font-family can match.
This would mean that if the user has installed a local font to support a Unicode block that the default OS fonts don't cover, content using those Unicode characters would remain readable rather than being rendered as blank boxes or whatever.
A site wanting to fingerprint users would presumably be able to tell that the user has _some_ font that supports the given Unicode block, by detecting a difference in metrics from what "tofu" rendering would give, but would not be able to directly test for specific font-family names.
ISTM this would increase the effort involved in font fingerprinting (it would now require researching the specific Unicode ranges that might be relevant, not just a list of thousands of potential fonts), while at the same time greatly reducing the amount of information that could be gleaned (only the presence of _a_ font for an otherwise-unsupported Unicode range, not a long list of general-purpose font names), which makes the whole area a much less attractive target for would-be trackers.
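A minimal sketch of that split between name matching and fallback, assuming a simplified per-character font selection. `FontInfo`, `covers`, and `pickFont` are hypothetical names, and real font matching involves far more than this (style matching, language tags, cluster handling).

```typescript
// Hypothetical description of an installed font.
interface FontInfo {
  family: string;
  osDefault: boolean;              // shipped with the OS vs. installed by the user
  ranges: Array<[number, number]>; // inclusive code point ranges the font covers
}

function covers(font: FontInfo, codePoint: number): boolean {
  return font.ranges.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

function pickFont(
  requested: string[],
  codePoint: number,
  installed: FontInfo[]
): FontInfo | undefined {
  // Name matching: only OS-default fonts can be selected via font-family.
  for (const name of requested) {
    const match = installed.filter(
      (f) => f.osDefault && f.family === name && covers(f, codePoint)
    )[0];
    if (match) return match;
  }
  // Fallback: prefer OS defaults, then any user-installed font covering the character.
  const defaults = installed.filter((f) => f.osDefault && covers(f, codePoint));
  if (defaults.length > 0) return defaults[0];
  return installed.filter((f) => covers(f, codePoint))[0];
}
```

In this sketch the page gets the same result whether or not it names the user-installed font, so it can learn that some font covers the range but cannot probe for a family by name.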
@r12a wrote
If i understand correctly, the problem arises because people serving web pages are able to inspect which fonts are available on a user's system.
Not directly, and not in so flagrant a way as was possible with Flash (which returned a complete list of all installed fonts, with a single function call). Instead there needs to be text (possibly hidden) styled with a specific font name, and that text is tested to see if the font loaded. In other words, each possible font is tested, one by one, and each test introduces some small delay to overall page rendering. So in practice, privacy testers such as the EFF one test a couple of hundred common fonts.
@r12a wrote:
I'm worried about the impact of this on minority language users, and also specialists such as egyptologists, and other script researchers.
It helps to look at the taxonomy of users suggested by @hsivonen; the egyptologists would fall into group 6
If users can opt to allow certain fonts which are never likely to appear in the browser defaults, it sounds like they expose themselves to fingerprinting, which doesn't make this a great solution for large numbers of people.
Only if some web page has
<p class="foo">𓆓𓂧𓆑𓆓𓂧𓀀𓈖𓏏𓈖𓏥𓂋𓍿𓀀</p>
.foo { font-family: "Segoe UI Historic"; }
and then tests to see that Segoe UI Historic in fact loads. Any other font covering Egyptian would not be tested here.
For example, I have several fonts which are very unusual and should make me uniquely identifiable, if a list of all installed fonts were available. But it isn't; an attack page would need to specifically test, by name, for oddities like:
Sure. I wasn't very specific about what i meant by 'this'. What i'm most concerned about is whether those users will be prevented from using the local fonts, like they are in Safari. Taking egyptologists as an example (of a specialist group), fonts like JSesh are widely used (by egyptologists), and new fonts are in development for support of the new Unicode characters that create quadrats. These fonts are relatively large, due to the nature of the script (so not great if you have to download several as webfonts to use on a single page), and i assume are unlikely to be added to any lists. I'm hoping that we find a solution that ensures that egyptologists can continue to use these fonts from their local system, and a solution that's not overly technical or complicated.
Similar concerns arise for users of 'minority' languages or scripts, especially if there's a backlog of existing content out there that relies on system-based fonts.
Canʼt we simply put a limit on the number of (local) fonts a single site may use?
I'm worried about the impact of this on minority language users, and also specialists such as egyptologists, and other script researchers.
(Non-Linux-using) egyptologists will be fine. macOS, Windows 10, Android, and Chrome OS ship with font support for Egyptian hieroglyphs by default.
Unfortunately, Ubuntu and Fedora don't install font support for Egyptian hieroglyphs by default. I think the correct way forward is convincing them that they should.
If egyptologists want JSesh specifically, the options are including it as a Web font or opting out of privacy protection.
If users can opt to allow certain fonts which are never likely to appear in the browser defaults, it sounds like they expose themselves to fingerprinting,
The alternative seems to be leaving everyone exposed to fingerprinting.
which doesn't make this a great solution for large numbers of people.
I believe the number of people is much smaller than it appears considering that the most popular systems already have even long-dead scripts covered.
Canʼt we simply put a limit on the number of fonts a single site may use?
How would that work? Scripting can change the page over time, and trying to maintain quota state over time makes the quota state itself fingerprintable local state.
(Non-Linux-using) egyptologists will be fine. macOS, Windows 10, Android, and Chrome OS ship with font support for Egyptian hieroglyphs by default.
I can't see limiting people to a single font, and one that they have no control over, is going to work, either for specialised communities or for minority languages. Note for example that:
I'm willing to bet that the answer to all of those questions is no, and that adding support is a hit and miss affair with long lead times, whereas the community has developed or is developing its own fonts in a much more timely manner.
As for using webfonts, the fonts used by egyptologists are large, ranging from around 0.5 MB to almost 6 MB (and they will grow with the repertoire). It's not always ideal bandwidth-wise to require them to be used as webfonts only, especially if you want to use more than one font per page.
I'm not arguing against addressing the fingerprinting issue, i'm just saying please provide an easy opt out when people want to work with certain local fonts. Don't take the Safari route and make it impossible to use anything but what the system offers.
Serving fonts as webfonts may work more easily for many languages/scripts than for Egyptian, but it does still add complications. For instance, if you are experimenting to find the right font for your content it will become tedious to have to create webfonts before you can test any font. And I'm hoping that, as someone previously mentioned, it would at least be fine to use local fonts through localhost.
I believe the number of people is much smaller than it appears considering that the most popular systems already have even long-dead scripts covered.
I think that's over-optimistic. So here's some data. This is a list of whole script blocks (rather than languages or new or missing characters, etc) that are not supported by system fonts on my Mac:
for Europe, Caucasian Albanian, Cyrillic Extended-C, Elbasan, Glagolitic Supplement, Linear A, Old Hungarian, Latin Extended-D & E, Old Permic, Phaistos Disc;
for Africa, Adlam, Bassa Vah, Egyptian format controls, Medefaidrin, Mende Kikakui, Meroitic Cursive, Meroitic Hieroglyphs;
for West Asia, Arabic Extended-A, Chorasmian, Elymaic, Hatran, Nabataean, Old North Arabian, Palmyrene, Psalter Pahlavi, Syriac Supplement, Yezidi;
for Central Asia, Manichaean, Marchen, Mongolian Supplement, Sogdian, Old Sogdian, Soyombo, Zanabazar Square;
for South Asia, Ahom, Bhaiksuki, Devanagari Extended, Dives Akuru, Dogra, Grantha, Gunjala Gondi, Khojki, Khudawadi, Mahajani, Masaram Gondi, Modi, Mro, Multani, Nandinagari, Newa, Sharada, Siddham, Sora Sompeng, Takri, Tamil Supplement, Tirhuta, Vedic Extensions, Wancho, Warang Citi;
for Southeast Asia, Hanifi Rohingya, Pahawh Hmong, Pau Cin Hau;
for Indonesia & Oceania, Makasar;
for East Asia, Bopomofo Extended, CJK Compatibility Ideographs Supplement, Ideographic Symbols & Punctuation, Hangul Jamo (large proportion of), Hangul Jamo Extended-A & B, Kana Extended-A, Kana Supplement, Small Kana Extension, Khitan Small Script, Miao, Nushu, Tangut;
for the Americas, Mayan Numerals, Nyiakeng Puachue Hmong, Osage, UCAS Extended.
I'll stop there, rather than continuing with categories such as Notational systems, Alphanumeric Symbols, Technical Symbols, Numbers & Digits, Arrows, Mathematical Symbols, Emoji & Pictographs, Game Symbols, and Other Symbols, each of which i expect to include at least one block without coverage.
Some of the above are for archaic languages, but others are for minority languages that have finally been given a way to put their content online.
I'm also concerned, however, that your comment implies that as long as there is some font available, then that's fine. I'd argue that that's not usually the case. For example, this morning i was working on Syriac. Sure you can see characters in Syriac using a system font, but for the Mac that appears to mean the Noto Sans Syriac Eastern font only. That is not helpful if you are writing in a Western Syriac language, or working with religious or archaic content that needs the Estrangela font. I tried changing the language tag, but it didn't help. But even then, the Noto font doesn't really provide what people would expect to see. Here's a sentence in the 3 Noto fonts, Estrangela, Eastern, and Western:

And here's the same text using 3 fonts that are much closer to what you'd normally see when writing these 3 varieties of Syriac, each with their own distinctive characteristics, rather than with the harmonisation applied by Noto.

I'm not saying that you can't get around this by using webfonts, or for the time being local fonts, in your CSS. I'm just trying to make the point that a one-size-fits-all font, like Noto, is likely to strip out important local cultural aspects of the text, and if it's the only game in town, can cause significant problems in languages where there are writing style variants, such as the 3 Syriac varieties just mentioned, or looped vs unlooped Thai letters, or Naskh vs Nastaliq vs Kano vs Maghrebi etc styles in Arabic, slanted vs upright vs rounded in Khmer and other scripts, etc.
hth
I'm not arguing against addressing the fingerprinting issue, i'm just saying please provide an easy opt out when people want to work with certain local fonts.
Clearly, there needs to be an opt-out of protection to address the needs of users whose scripts don't have a system-bundled font.
Don't take the Safari route and make it impossible to use anything but what the system offers.
FWIW, I'm not aware of anyone advocating that kind of outcome for Firefox.
I believe the number of people is much smaller than it appears considering that the most popular systems already have even long-dead scripts covered.
I think that's over-optimistic. So here's some data.
My point was that already the system-bundled font coverage is so broad that in fact egyptology isn't in the category of wholly-unsupported use cases. I.e. if it appeared that egyptologists would have to opt out of protection, it's not necessarily so, hence "smaller than it appears". (I'm not claiming that no one would need to opt out. It seems well established that there's existence proof of cases that would need an opt-out.)
I'm also concerned, however, that your comment implies that as long as there is some font available, then that's fine.
There are different levels of adequacy. If there's no font at all, nothing works, and use cases like being able to input text into a global site (e.g. to write a tweet) can't rely on a Web font. However, when the text to be published is known, Web fonts can be relied upon for stylistic variability. Moreover, for user-installed fonts to satisfy stylistic variability, the authors need to know what fonts the users have, which doesn't scale well. For that reason, I don't think the notion that Web publishers want stylistic variability is an argument against (waivable) privacy protection by default.
the repertoire for Egyptian hieroglyphs is expected to expand from time to time, is there a guarantee that available system fonts will promptly deliver changes?
Chances are that the outcome depends on how active the user community is with bug filing.
does the system font you have in mind provide the coloured glyphs that egyptologists sometimes need, or the alternate styles they sometimes use?
do those fonts support the new Unicode formatting characters for arranging egyptian glyphs in quadrats?
These seem to be things that can be addressed by Web fonts when publishing documents and that aren't total blockers for usage in the context of systems where the person providing the text doesn't control the fonts of the system that shows the text (e.g. Twitter).
That is not helpful if you are writing in a Western Syriac language, or working with religious or archaic content that needs the Estrangela font
Do users of this language currently have that font installed such that Web authors are presently relying on it being installed?
(In general, with examples like this it's hard to understand the severity. That is, I don't know how to map the example to a spectrum from rendering Italian with a fraktur font by default to rendering Polish with a French-typical acute accent angle.)
I'm just trying to make the point that a one-size-fits-all font, like Noto, is likely to strip out important local cultural aspects of the text,
To understand to what extent what's being proposed here would affect this, one would need to know to what extent sites currently rely on the user community having specific non-default fonts installed for expressing these cultural aspects (as opposed to using Web fonts). Considering how hard it is for the user to install fonts on Android, it seems unlikely that in parts of the world where Web access skews very heavily towards mobile devices Web authors could rely on user-installed fonts for this kind of cultural expression at present. That is, it seems that Web fonts are already needed for this level of control.
and if it's the only game in town, can cause significant problems in languages where there are writing style variants, such as the 3 Syriac varieties just mentioned, or looped vs unlooped Thai letters, or Naskh vs Nastaliq vs Kano vs Maghrebi etc styles in Arabic, slanted vs upright vs rounded in Khmer and other scripts, etc.
There for sure are cases where limiting local fonts to system-bundled ones would restrict stylistic richness along those lines, but loopy vs. non-loopy Thai and Naskh vs. Nastaliq don't appear to be distinctions that limiting local fonts to the installed-by-default system fonts would break.
I think we're mainly thinking along similar lines.
To understand to what extent what's being proposed here would affect this, one would need to know to what extent sites currently rely on the user community having specific non-default fonts installed for expressing these cultural aspects (as opposed to using Web fonts).
I've been wondering the same thing, but i don't have a definitive answer. Anecdotally, however, my interests and work with the Unicode Editorial Committee lead to me, on a very regular basis, hunting down fonts for minority or archaic scripts. The almost universal pattern is that you find a webpage that allows you to download a Unicode font (or if you're lucky more than one) (or for non-Unicode fonts often a package including keyboard). They don't usually provide WOFF fonts, they just expect you to download the font to your system. But there are often only a couple or at most a small handful of viable fonts for many of these scripts (and that situation typically lasts for several years). I _imagine_ this has by now led to the build up of a (relatively speaking) fair amount of legacy content that will break unless the user opts out.
Note also that people download these fonts for use with non-Web applications, too, such as Word, so that's another reason they download the font file to their computer.
Note, btw, that Hanifi Rohingya is one of the scripts that is not supported by a system font per my tests above, and there are currently very few Hanifi Rohingya Unicode fonts. Although Rohingya text probably isn't tested for in the more general fingerprinting patterns, if you actually wanted to identify Rohingya people and targeted your fingerprinting specifically to that, then it sounds like it would be fairly easy. On the one hand, it's worth pushing for system fonts to cover this and other missing minority scripts, but on the other, it's not a great solution for the user if there's only one font available. So i think that actively encouraging a culture that provides webfonts is probably something we should try to do, as well as working on the technical side.
I'm grateful for the conversation here, and encouraged by the proposals above. I want to suggest a slightly different approach, that (I hope) will maintain the following use cases:
If there are concerns or use cases not covered in the above, please let me know. I'm mostly trying to generalize-w/o-loss over some of the concerns shared above.
My proposal is:
(font, locale, os) tuples, for additional fonts the browser can load off disk iff they are on the disk (e.g. if they're not on the disk, the browser doesn't magically provide / load them). I don't have a good name for these, so i'll just call them vendor-local-fonts. How vendors create these sets of vendor-local-fonts is up to the vendor, but I imagine this is a place that resources could be shared, and this working group / W3C could be very useful in maintaining a useful set.
default setting (websites can access OS-provided fonts, web fonts, and vendor-local-fonts)
custom setting, that shows users the fonts the browser knows about on disk that are not OS-provided fonts, and users can opt those in to the set of fonts sites can access
This is similar to the proposal in the issue text at the top of this thread, but a) provides a path for vendors to define better defaults for users per-locale, b) still maintains the needed use cases, c) would require no changes for most users
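As a sketch of how the default and custom settings could combine, reading the tuples as keyed by font, locale, and OS (that reading is an assumption): every name below (`DiskFont`, `VendorEntry`, `exposedLocalFonts`, the sample fonts and locale tag) is hypothetical, and web fonts are left out because they are not loaded from disk.

```typescript
interface DiskFont { family: string; osProvided: boolean; }
interface VendorEntry { family: string; locale: string; os: string; }

// Local fonts a site may use: OS-provided fonts, vendor-local-fonts matching
// the user's locale and OS, and any fonts the user opted in via the custom
// setting. Only fonts actually present on disk can ever be returned.
function exposedLocalFonts(
  onDisk: DiskFont[],
  vendorList: VendorEntry[],
  userLocale: string,
  userOS: string,
  userOptIns: string[]
): string[] {
  const allowedByVendor = vendorList
    .filter((e) => e.locale === userLocale && e.os === userOS)
    .map((e) => e.family);
  return onDisk
    .filter((f) =>
      f.osProvided ||
      allowedByVendor.indexOf(f.family) !== -1 ||
      userOptIns.indexOf(f.family) !== -1)
    .map((f) => f.family);
}

// Hypothetical machine and vendor list.
const diskFonts: DiskFont[] = [
  { family: "Helvetica", osProvided: true },
  { family: "Noto Sans Adlam", osProvided: false },
  { family: "JSesh", osProvided: false },
];
const vendorEntries: VendorEntry[] = [
  { family: "Noto Sans Adlam", locale: "ff-Adlm", os: "macos" },
  { family: "Noto Sans Osage", locale: "ff-Adlm", os: "macos" }, // listed, but not on disk
];

// Default setting: no user opt-ins.
const defaultSet = exposedLocalFonts(diskFonts, vendorEntries, "ff-Adlm", "macos", []);
// Custom setting: the user additionally opts in JSesh.
const customSet = exposedLocalFonts(diskFonts, vendorEntries, "ff-Adlm", "macos", ["JSesh"]);
```

With this data the default set is Helvetica plus Noto Sans Adlam, and the custom set additionally contains JSesh.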
I18N discussed this in our most recent teleconference and I was actioned to write this.
In general our thoughts are as follows:
So: our tendency is to recommend that CSS guard against fingerprinting by default, but provide clear normative guidance to allow customer-installed fonts to be used, perhaps via some whitelisting mechanism.
Hi @aphillips Thank you for the update. From reading through the 4 points above, I'd be very curious on your reaction to the proposal in https://github.com/w3cping/font-anti-fingerprinting/pull/6#issuecomment-599359211 which, I believe, satisfies each of the 4 points and the "So: …" summary at the bottom
Adding a link to latest proposals:
Font Anti-Fingerprinting
Font Fingerprinting Protection Through Better Defaults and Measurement Hinting
Also related: https://github.com/tabatkins/proposal-local-font-access
Academics studying the Seal Script regularly need to use multiple fonts for getting the exact shape from an exact version of Shuowen Jiezi. Each font is upwards of 10 MB, and a collection of the four main books will be upwards of 40 MB. This cannot be practically delivered via web font technologies, and it would be very inconvenient for every user to have to enable it in the browser through settings. Also, an on/off setting for all sites means forcing researchers to choose between extremely high fingerprintability vs being able to actually do their work.
What if instead the website is responsible for requesting to use a particular local font, limited to e.g. 5 requests (each of 10 font families) per site (eTLD+1)? The user will have to explicitly grant access to a particular font for the specific origin, reducing additional fingerprinting to sites the user trusts.
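A minimal sketch of that budget, with the limits taken from the numbers above. `LocalFontPermissions`, `request`, `canUse`, and the `askUser` callback (standing in for a browser permission prompt) are hypothetical names. For simplicity grants are keyed by the same site string, assumed to already be the eTLD+1, whereas the suggestion above grants per origin. An over-long request is truncated to its first 10 families, which is one possible policy among several.

```typescript
const MAX_REQUESTS_PER_SITE = 5;
const MAX_FAMILIES_PER_REQUEST = 10;

interface SiteState { requestsMade: number; granted: string[]; }

class LocalFontPermissions {
  private sites: { [site: string]: SiteState } = {};

  // Returns the families from this request that the user approved.
  request(site: string, families: string[], askUser: (family: string) => boolean): string[] {
    let state = this.sites[site];
    if (state === undefined) {
      state = { requestsMade: 0, granted: [] };
      this.sites[site] = state;
    }
    if (state.requestsMade >= MAX_REQUESTS_PER_SITE) return []; // budget exhausted
    state.requestsMade += 1;
    const approved = families
      .slice(0, MAX_FAMILIES_PER_REQUEST)
      .filter((family) => askUser(family));
    for (const family of approved) {
      if (state.granted.indexOf(family) === -1) state.granted.push(family);
    }
    return approved;
  }

  // A font is usable only on the site it was granted to.
  canUse(site: string, family: string): boolean {
    const state = this.sites[site];
    return state !== undefined && state.granted.indexOf(family) !== -1;
  }
}

const perms = new LocalFontPermissions();
// Simulated user who approves JSesh and nothing else.
const grantedNow = perms.request("example.org", ["JSesh", "Other Font"], (family) => family === "JSesh");
```

Note that the per-site counter is itself local state, so this sketch does not answer the earlier objection that quota bookkeeping can become fingerprintable.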
@hfhchan Thank you for the comment! I'm not familiar with Seal Script; could you share more information about it if you have links?
Regarding the specific problem you mention though, I think, in a sense, that's the easier problem, where there is an expectation that sites can be updated. @tabatkins had a great suggestion for a way that sites could explicitly request fonts off disk with user involvement, and I think it would cover the use case you describe.
In a sense, all the complexity in the above discussion is mostly about cases where we can't expect sites to update, and how to provide privacy to users of sites that require non-common fonts there.
The fonts based on the Tenghuaxie version submitted for ISO10646 standardization of Shuowen Small Seal are 10 MB in total.
Only considering the Small Seal in Shuowenjiezi and its different versions, in WG2 N4716 page 6 there are five versions of Shuowen that are of interest. If you look at WG2 N5117 page 8, there are even more versions of Shuowen and commercial adaptations.
Not to mention actual Small Seal carvings are different from those in Shuowenjiezi, illustrated in this article "目前坊间的小篆字体有什么区别?" ("What's the difference between the different released small seal fonts?") https://www.zhihu.com/question/41780292. There is still no consensus on how these differences should be handled in Unicode, and researchers mainly rely on PUA or putting the glyphs on top of their corresponding CJK Unified Ideographs.
Regarding the specific problem you mention though, I think, in a sense, that's the easier problem, where there is an expectation that sites can be updated. @tabatkins had a great suggestion for a way that sites could explicitly request fonts off disk with user involvement, and I think it would cover the use case you describe.
That sounds great. For the case of legacy sites where content can't be updated, besides having a global setting, it might be easier for users to install a certain WebExtension which dynamically injects the necessary JavaScript to request access to fonts for required sites.
@hfhchan thanks very much for the above details! I will dig into them!
I think the web extension idea is also neat, though i think requiring a browser extension to visit a site might be prohibitive, especially for users on lower power, older or otherwise not-necessarily-extension-supporting-browsers. But it's a neat idea, would be very interested to know what others think too!
In the case of CJK-related scripts research where there isn't a standard encoding model yet, the common practice is to use PUA, which entails installing a bunch of fonts and proprietary IMEs. So, comparatively, installing a WebExtension is both easier (no admin rights needed) and safer (code is sandboxed; sites which the extension can access are declared upfront in the manifest), and would be the least problem for researchers.
See also: https://github.com/w3c/csswg-drafts/issues/5421 (which directly contradicts the OP of this thread).