Hello. I'm a JS developer who wants to learn a compiled language to build some WASM libraries for my web software. One of the reasons I like C++ is that the compiled code can't easily be reverse-engineered, so your work is somewhat protected, isn't it? But as far as I have read, WASM is pretty much copy+paste, and you get access to everything that was compiled for free?
When you compile a program to Wasm (e.g. C++), it will be a .wasm file, which is just raw bytecode.
It is impossible to retrieve the original C++ files from the .wasm file (because it is raw bytecode, it doesn't contain that sort of information).
However, the .wasm may include optional debugging information (e.g. DWARF, source maps, etc.) which would allow for retrieving the original C++ files. So you just need to make sure that the debugging information is not included in release mode.
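As a rough way to check whether a build still carries debugging information, you can list the custom sections of the `.wasm` file. The sketch below is a minimal illustration, not an official tool. It assumes the usual conventions: a `name` section for function and local names, DWARF sections whose names start with `.debug_`, and a `sourceMappingURL` section pointing at a source map. The helper names and the tiny hand-built demo module are made up for the example.

```python
from __future__ import annotations


def read_leb128_u32(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def custom_section_names(wasm: bytes) -> list[str]:
    """Return the names of all custom sections (section id 0) in a module."""
    if wasm[:4] != b"\x00asm":
        raise ValueError("not a WebAssembly binary")
    names = []
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(wasm):
        section_id = wasm[pos]
        size, body_start = read_leb128_u32(wasm, pos + 1)
        if section_id == 0:
            # A custom section body starts with its name (length + UTF-8 bytes).
            name_len, name_start = read_leb128_u32(wasm, body_start)
            names.append(wasm[name_start:name_start + name_len].decode("utf-8"))
        pos = body_start + size
    return names


def looks_like_debug_info(name: str) -> bool:
    """Assumed naming conventions for debug-related custom sections."""
    return name in ("name", "sourceMappingURL") or name.startswith(".debug_")


def make_custom_section(name: str, payload: bytes) -> bytes:
    """Hypothetical helper that builds a small custom section for the demo."""
    encoded = name.encode("utf-8")
    body = bytes([len(encoded)]) + encoded + payload
    assert len(body) < 128  # keeps the size a single LEB128 byte
    return bytes([0, len(body)]) + body


# Hand-built demo module; the payloads are placeholders, not real debug data.
header = b"\x00asm\x01\x00\x00\x00"
empty_type_section = bytes([1, 1, 0])
demo_module = (
    header
    + make_custom_section("name", b"")
    + empty_type_section
    + make_custom_section(".debug_info", b"\x00\x01")
    + make_custom_section("producers", b"\x00")
)

for section in custom_section_names(demo_module):
    flag = "debug info?" if looks_like_debug_info(section) else "other"
    print(f"{section}: {flag}")
```

For a real file, you would pass the result of `open("module.wasm", "rb").read()` (the path is a placeholder) to `custom_section_names`.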
Of course, you can't really stop people from reverse engineering your code, you can only make it more difficult. A determined person can reverse engineer your code no matter what (even with non-Wasm machine code).
Where did you read that Wasm allows for easily retrieving the original source code?
> However, the .wasm may include optional debugging information (e.g. DWARF, source maps, etc.) which would allow for retrieving the original C++ files.
Not quite. This would let you associate names with disassembled code and see memory references as structs, but it is still very far from the original C++, especially for optimized code.
But yes, any sufficiently persistent individual can disassemble your code and read it. This is the same whether the code is native (e.g. x86) or targets a high-level VM (e.g. the JVM). Wasm is low-level enough that it is probably closer to x86 than to the JVM in terms of how "readable" disassembled or decompiled code is.
You should generally never rely on your executable format helping you "protect" against reverse-engineering your code. It won't work anyway.
There are things that we might consider, though, that would help. For example, we might allow a wasm module to be marked as 'undebuggable'.
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/WebAssembly/design/issues/1293?email_source=notifications&email_token=AAQAXUDH2YVJXN2SR527UTLQCNLNZA5CNFSM4IIELBAKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3L73XA#issuecomment-517471708,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAQAXUEUDGQKTR77YUJ2FVLQCNLNZANCNFSM4IIELBAA
.
--
Francis McCabe
SWE
> There are things that we might consider though, that would help. For example, we might allow a wasm module to be marked as 'undebuggable'
And what will that do? Will wasm2wat, wasm-objdump and friends refuse to disassemble these? And will bad actors promise not to build these tools from source and modify them? ;)
Understood. It's all about increasing the cost. If you can debug a module
as it is executing in its intended environment, then it is normally a lot
easier to figure out what the code is supposed to be. If you can't then you
have a more challenging problem.
There is a cottage industry of folk writing obfuscators for JS. It is not
perfect, but in practice it is quite effective (albeit at a potentially high
cost in time, space, and money).
@aardappel Source maps are intentionally designed to allow you to access the original source code. They have been used with JS for years.
For example, if you compile a language like Clojure to JS, you can see the original Clojure source code while debugging in the browser, and even set breakpoints in the Clojure code, even though the compiled code is 100% JS. This isn't specific to Clojure; it works with any language that compiles to JS.
As another example, you can minify JS code (such as using UglifyJS), and with the power of source maps you can see the original unminified code, exactly as it was before being minified, including whitespace formatting, comments, names, etc.
Similarly, things like DWARF contain a lot of information, including information about inlined functions, allowing the debugger to display the functions and call stack even though the functions had been optimized away in the binary.
The debugging repo is where such things are discussed.
Yes, there are tools (source maps + DWARF) to aid debugging that make reconstructing source information trivial. To avoid that, don't ship debug info in release builds.
The point, though, is that stripping debug info alone isn't sufficient to protect against a motivated reverse-engineering effort.
Other tricks the obfuscators employ include shape-shifting code: the code is
transformed into 1000 (or however many you want) variants that have the
same semantics but look different. So, if you disassemble one, you have not
made much progress. Similarly, you can put an expiry date on your module by
baking into the code the allowable range of dates that it can execute in.
Shades of game code protection...
@Pauan source maps _refer_ to the original source code, which is very different from being able to _re-create_ the original code from a .wasm if the original code is not available (which obviously it isn't).
@aardappel Source maps contain a "sourcesContent" field which contains the actual full source code as a string.
So yes, you can in fact embed the original source code inside of a custom section inside of a .wasm file, or inside of a separate .wasm.map file (which would be automatically downloaded and used when debugging). This is common behavior.
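To make the "sourcesContent" point concrete, here is a small sketch that reads a source map (version 3 JSON) and reports which original files are embedded in it, plus a variant that drops the embedded sources. The sample map and the function names are invented for illustration; a real map would come from your toolchain's output.

```python
from __future__ import annotations

import json


def embedded_sources(source_map_text: str) -> dict[str, str]:
    """Map each source path to its embedded text, if the map carries any."""
    source_map = json.loads(source_map_text)
    paths = source_map.get("sources", [])
    contents = source_map.get("sourcesContent") or []
    # Entries may be null when a particular file's content was not embedded.
    return {path: text for path, text in zip(paths, contents) if text is not None}


def without_sources_content(source_map_text: str) -> str:
    """Return the same map with the embedded source text removed."""
    source_map = json.loads(source_map_text)
    source_map.pop("sourcesContent", None)
    return json.dumps(source_map)


# Made-up example map; "add.cpp" and its content are placeholders.
sample_map = json.dumps({
    "version": 3,
    "sources": ["add.cpp"],
    "sourcesContent": ["int add(int a, int b) { return a + b; }\n"],
    "names": [],
    "mappings": "",
})

for path, text in embedded_sources(sample_map).items():
    print(f"{path}: {len(text)} characters of original source embedded")

stripped_map = without_sources_content(sample_map)
print(embedded_sources(stripped_map))
```

Note that removing "sourcesContent" still leaves the file names and mappings in place, so for a release build the simpler option is usually not to publish the map at all.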
So what's the right way to avoid including debug content and make the binaries more theft protected?
@sebolio It depends on the compiler. Emscripten doesn't emit debug content by default, so all internal wasm names are minified etc. (and in -O3 it will also minify external names, i.e. wasm imports and exports).
For other compilers, if they emit debug info you can strip it out, for example using `wasm-opt --strip-debug input.wasm -o output.wasm`.
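To show roughly what stripping debug info means at the binary level, here is a sketch that copies a module while dropping debug-related custom sections. It is only an illustration and not a replacement for `wasm-opt`; the set of section names it removes (`name`, `sourceMappingURL`, anything starting with `.debug_`) is an assumption based on common conventions, and the demo module is hand-built with placeholder payloads.

```python
from __future__ import annotations


def read_leb128_u32(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def is_debug_section(name: str) -> bool:
    """Assumed naming conventions for debug-related custom sections."""
    return name in ("name", "sourceMappingURL") or name.startswith(".debug_")


def strip_debug_sections(wasm: bytes) -> bytes:
    """Copy the module, skipping custom sections that look like debug info."""
    if wasm[:4] != b"\x00asm":
        raise ValueError("not a WebAssembly binary")
    out = bytearray(wasm[:8])  # keep magic and version
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, body_start = read_leb128_u32(wasm, pos + 1)
        end = body_start + size
        keep = True
        if section_id == 0:  # custom section: body starts with its name
            name_len, name_start = read_leb128_u32(wasm, body_start)
            name = wasm[name_start:name_start + name_len].decode("utf-8")
            keep = not is_debug_section(name)
        if keep:
            out += wasm[pos:end]  # copy id, size, and body unchanged
        pos = end
    return bytes(out)


def make_custom_section(name: str, payload: bytes) -> bytes:
    """Hypothetical helper that builds a small custom section for the demo."""
    encoded = name.encode("utf-8")
    body = bytes([len(encoded)]) + encoded + payload
    assert len(body) < 128  # keeps the size a single LEB128 byte
    return bytes([0, len(body)]) + body


header = b"\x00asm\x01\x00\x00\x00"
empty_type_section = bytes([1, 1, 0])
producers_section = make_custom_section("producers", b"\x00")
demo_module = (
    header
    + make_custom_section("name", b"")
    + empty_type_section
    + make_custom_section(".debug_line", b"\x00\x01\x02")
    + producers_section
)

stripped = strip_debug_sections(demo_module)
print(len(demo_module), "bytes before,", len(stripped), "bytes after")
```

As the rest of the thread points out, this only removes the names and mappings that make the code easy to read. The instructions themselves remain and can still be disassembled.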
I inferred (perhaps incorrectly) that the question was referring to WASM being designed to support disassembly to a human-readable assembly language (WAT).
This is not unique to WASM, except that disassembly is imperative for Web code, while it's just something other assembly languages can do implicitly.
I think people are confused by the purpose of WAT (basically, making Web code available to everyone for study, even if the licenses on any IP are restrictive).