Rust: Protect privacy by mangling the path string of source files in the generated binaries

Created on 15 Mar 2017  ·  14 Comments  ·  Source: rust-lang/rust

According to #40374, some private information is leaked through the path strings embedded in compiled binaries. To protect privacy while still helping debugging, I think we can let rustc mangle these paths. Here is my proposal:

Basically, we can hide the 'insignificant' part of the path (which usually contains private and/or unrelated information) and leave the 'significant' part untouched. So what is the 'significant' part of a path? Here is an example. Assume we have a project foobar in the user's home directory (I use a Windows path here; on *nix things work similarly):

C:\Users\username\Documents\foobar

In this case, the useful part is the crate name and everything after it, i.e.

\foobar\lib.rs

Assuming we have a module called 'somemod', then after mangling, its path looks like:

[crate]\foobar\somemod\mod.rs

which not only preserves the relationship between source files, but also protects the user's privacy (since the user name and absolute path no longer appear)!

The next question is how to process all paths under this rule. From what I know, all compiled code of a crate comes from 4 sources:

  • crates.io, a mirror of crates.io, or an independent third-party registry; in each case the code is cached in the cargo cache directory
  • remote git repositories
  • the local filesystem
  • the crate itself (which may be located anywhere)

We could specify different root names for these sources to indicate their origin:

  • For code from crates.io or mirrors, the root name is [crates.io]. Example:
    C:\Users\username\.cargo\registry\src\github.com-1ecc6299db9ec823\winapi-0.2.8\ will be mangled to [crates.io]\winapi-0.2.8\
  • For code from a remote git repository, the root name is [username@git server name]. Example: https://github.com/rust-lang/rust will be mangled to [[email protected]]/rust
  • For code from the local filesystem, the root name is [local]. Example: D:\workspaces\foobarng\ will be mangled to [local]\foobarng\
  • For code from the crate itself, the root name is [crate]. Example: C:\Users\username\Documents\foobar\ will be mangled to [crate]\foobar\
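The rules above can be sketched in Rust roughly as follows. This is only an illustration of the proposal, not how rustc works: the `Origin` enum, the `mangle` function, and the idea of passing the prefix to hide explicitly are all assumptions made for the example, and paths are treated as plain strings so that Windows-style paths behave the same on any host.

```rust
/// Hypothetical classification of where a source file came from,
/// mirroring the four sources listed in the proposal.
#[derive(Debug, Clone, PartialEq)]
pub enum Origin {
    /// crates.io or a mirror, cached in the cargo registry directory.
    Registry,
    /// A remote git repository; `user` and `host` form the root name.
    Git { user: String, host: String },
    /// A dependency somewhere else on the local filesystem.
    Local,
    /// The crate currently being compiled.
    CurrentCrate,
}

impl Origin {
    /// Root name that replaces the hidden part of the path.
    fn root_name(&self) -> String {
        match self {
            Origin::Registry => "[crates.io]".to_string(),
            Origin::Git { user, host } => format!("[{}@{}]", user, host),
            Origin::Local => "[local]".to_string(),
            Origin::CurrentCrate => "[crate]".to_string(),
        }
    }
}

/// Replaces `hidden_prefix` (the directory that contains the package
/// directory) with the root name for `origin`.
///
/// Returns `None` if `path` is not located under `hidden_prefix`.
pub fn mangle(path: &str, hidden_prefix: &str, origin: &Origin) -> Option<String> {
    let is_sep = |c: char| c == '\\' || c == '/';
    // Accept prefixes written with or without a trailing separator.
    let prefix = hidden_prefix.trim_end_matches(is_sep);
    let rest = path.strip_prefix(prefix)?;
    // Only match on a component boundary, so that `C:\work\foo` does not
    // count as a prefix of `C:\work\foobar2\lib.rs`.
    if !rest.starts_with(is_sep) {
        return None;
    }
    Some(format!("{}{}", origin.root_name(), rest))
}
```

For example, `mangle(r"C:\Users\username\Documents\foobar\somemod\mod.rs", r"C:\Users\username\Documents", &Origin::CurrentCrate)` yields `[crate]\foobar\somemod\mod.rs`, matching the example given earlier.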

To help debuggers, paths in debugging information won't be modified. Users can thus still locate the code being debugged, and need not worry about leaking private information as long as they strip debugging information before packaging on *nix, or don't distribute .pdb files on Windows.

C-feature-request T-compiler

All 14 comments

Related to #38322 & #39130 (which are for debuginfo). Likely makes sense to use the same mapping mechanism for filenames in panics too.

(edit: linking to #40492 as it is the PR for #40374)

We may need to record path mappings in a file if we also want to process paths in debug information.

> Related to #38322 & #39130 (which are for debuginfo). Likely makes sense to use the same mapping mechanism for filenames in panics too.

--remap-path-prefix will also remap panic messages.

Seems like we now have a working and stable solution. Closed.

So what's the current solution to not include the path in the exe (when building with cargo)?

Try something like RUSTFLAGS=--remap-path-prefix=<your-src-dir>=src cargo build.
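One way to see which path the compiler actually embedded, and hence whether a remapping took effect, is to have the program report it. The sketch below uses only the standard `file!()` macro and `std::panic::Location`; the function names are made up for the example. The paths they return are the ones panic messages would show.

```rust
use std::panic::Location;

/// Returns the source path that rustc recorded for the caller of this
/// function. Panic messages report locations in the same form.
#[track_caller]
pub fn embedded_caller_path() -> &'static str {
    Location::caller().file()
}

/// Returns the source path that rustc recorded for this file via `file!()`.
pub fn embedded_file_path() -> &'static str {
    file!()
}
```

Printing either value from a build made with and without the flag shows whether the prefix was rewritten for the crate being compiled. Paths coming from dependencies or from the standard library live under other directories and would need their own prefixes remapped.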

Can we reopen this to track having rustc do this by default? You shouldn't have to know both that rustc does this and this obscure mechanism for changing it to protect privacy. It should just do it.

I think this would need an RFC to come up with a solid solution that doesn't break debugging (which relies on these paths being contained in debuginfo). cc @rust-lang/core

@michaelwoerister But it should be stripped automatically from --release builds!

An RFC would need to cover both debug and release builds.

> Try something like RUSTFLAGS=--remap-path-prefix=<your-src-dir>=src cargo build.

@michaelwoerister This didn't work for me: a username and full path were still present, and yes, I also used the --release flag. I also second @Boscop that this should be automatic for release builds. Why does the RFC need to include debug when we are talking about release specifically? @jimmycuadra is also right: normal users shouldn't have to know this exists or be expected to specify the options manually every time, since it's so obscure.

If it helps: including user IDs like this violates GDPR (https://gdpr.eu/eu-gdpr-personal-data/), so this should be addressed by the Rust team. In 2020 people care about privacy, and this can be off-putting, as in https://github.com/rust-lang/mdBook/issues/847, where people actively moved away from the project over its disrespect for user privacy.
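A report like this can be checked mechanically by searching the built binary for the string that should not be there. The following is a minimal sketch; `contains_bytes` and `binary_contains` are hypothetical helpers, and the check only finds the string in its plain UTF-8 byte form, so it would miss UTF-16 or otherwise encoded copies.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if `needle` occurs anywhere in `haystack`.
/// An empty needle is treated as "not found".
pub fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Reads the file at `binary` and reports whether it contains `needle`,
/// for example a home directory or user name.
pub fn binary_contains(binary: &Path, needle: &str) -> io::Result<bool> {
    let bytes = fs::read(binary)?;
    Ok(contains_bytes(&bytes, needle.as_bytes()))
}
```

Running `binary_contains` on the release executable with the home directory as the needle, before and after setting the flag, would show whether any copies of the path survive.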

cc @sneak & @aral who might also have some words about this

To be honest, I expected that release binaries do not contain information like this. Would this be an option to cut the string in these cases?

> To be honest, I expected that release binaries do not contain information like this. Would this be an option to cut the string in these cases?

@dns2utf8

Not sure if part of the message got dropped. What is the option you are suggesting? The provided RUSTFLAGS don't actually work. It seems the Rust team thinks this is acceptable for release binaries for some reason. Perhaps more attention on the issue may help, given the push for privacy in 2020.

Unrelated, but this also causes an issue with reproducible builds, since the strings and usernames will differ from person to person.

It seems to me that two different people building two identical programs with identical build args on different (but same-architecture) systems should receive the same output binary regardless of the name in $USER or the string in $HOME, for all build types (but especially release).

I'm not going to weigh in on the privacy issue (I think that if you're privacy conscious, $USER should be anonymous or user or something already), just the principle of least astonishment: what I would expect, not knowing the tooling, is that the same thing built on different systems would result in the same output. Deviation from this would surprise me, given what I know about the generally excellent caliber of Rust stewardship (and the well-known gargantuan task of stripping out this unnecessarily nondeterministic stuff from other distros/packages in the pursuit of deterministic builds). Tool designers should probably not be throwing more rocks into their path.

This probably means stripping any mention of local environment, build time, and file paths before the root/prefix of the build.
