Julia: RFC / Discussion: Security and Julia

Created on 12 Jan 2015  ·  22 comments  ·  Source: JuliaLang/julia

Julia's ease of use and speed advantages over other languages make it a very good candidate for general-purpose programming in addition to scientific analysis. A few interesting issues have come up that indicate that a deeper exploration of Julia in a potentially untrusted/multi-user environment is warranted (_cf._ #9147 and the recent discussion on julia-users about include_string() here: https://groups.google.com/d/msg/julia-users/fcB3z2shu1M/5gojVAbeUk8J).

I think several questions need to be answered prior to engaging in an exploration of Julian security:

1) Is Julia explicitly designed to be used in a multi-user / untrusted environment? Alternately, is there any design goal for Julia that would preclude its use in such environments?
2) Are there known issues with Julia that would expose data or systems to inadvertent code execution, disclosure, or alteration of data?
3) Has data integrity and confidentiality in Julia been the subject of prior analysis?
4) As a follow-on to 2), what mechanisms exist in Julia that could be considered security-centric?

There are likely other questions that need addressing as well. Is there interest in examining this topic?

Label: speculative

Most helpful comment

It could be good to have some kind of "sealing" mechanism to protect key parts of the system from changes.

All 22 comments

I think there is certainly interest to explore our security, as more and more people want to deploy Julia in production environments.

Perhaps we should have a security label.

Julia code can do anything C code can (e.g. you can work with raw unchecked pointers if you want to), so it is equivalent to C in terms of security; i.e., you should not run untrusted Julia code unless you use OS-level sandboxing. This isn't really much different from Python, for that matter.
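To make the comparison concrete, here is a minimal sketch of the kind of raw, unchecked pointer access Julia permits (illustrative only; the out-of-bounds read is undefined behavior):

```julia
# Julia exposes C-style raw pointers; once you drop to the pointer
# level there are no bounds or type checks.
a = [1, 2, 3]
p = pointer(a)

unsafe_load(p, 1)   # in-bounds read: returns 1
unsafe_load(p, 10)  # out-of-bounds read: undefined behavior; may
                    # return stale memory or crash the process
```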

Language-level sandboxing seems like a waste of time for a small language, honestly, especially one for which web applets are not a primary focus. Even Java gave up on sandboxing because it proved unworkable in practice to close all the security holes. JavaScript is only able to maintain a reasonable level of sandbox security because there is a huge industry devoted to continually vetting and patching its implementations.

@stevengj that's a great point, but there are a few differences from Python, I think, and an evaluation of Julian security might go beyond "what can a malicious actor do with code" into "what unintended consequences exist?":

One of the big differences I see when comparing security models is what happens by default; that is, is the default state something that does not lend itself to violations of CIA? In Python, to use an example that might quickly become tiresome, arrays and other data structures are initialized (via the default allocation mechanisms) to zero and are therefore safe to reference pre-assignment. In Julia, right now, that's not the case, and it may be the case that the initial contents of allocated arrays contain sensitive information from elsewhere in the system / process. Regardless of whether or not Julia persists with this allocation method, it has security implications and should be explicitly checked for whenever you're doing something that involves multi-user environments. In this example, the cost of allocating zeroed memory needs to be weighed against the increased complexity surrounding software assurance in these environments.
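For concreteness, a minimal sketch of the two allocation styles (written with the `undef` constructor from later Julia versions; the 0.3-era spelling was `Array(Float64, n)`):

```julia
# Uninitialized allocation: the contents are whatever bytes happened to
# occupy that memory, possibly stale data from elsewhere in the process.
dirty = Vector{Float64}(undef, 4)   # e.g. [6.9e-310, 0.0, 2.1e-314, 0.0]

# Zeroed allocation: safe to read before assignment, at the cost of
# writing the memory up front.
clean = zeros(Float64, 4)           # [0.0, 0.0, 0.0, 0.0]
```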

This is just one of several possible security discussions. The reason I posted this RFC is because at this point my list of potential security "gotchas" consists of 1) a couple of knowns, 2) a couple of known unknowns, and 3) a bunch of unknown unknowns. I'd like to move items up that stack, but getting from 3) to 2) will depend on the expertise of the core group and others.

Uninitialized arrays can be allocated in Python, too (e.g. by numpy.empty). That being said, zero-ing memory in default constructors is something we are discussing in #9147 so we shouldn't duplicate that discussion here.

@stevengj they CAN be, but they aren't _by default_. I'm involved in the discussion in #9147, but I intended this RFC to examine whether there are other Julian behaviors that might have a security impact. I'm hoping to learn more about other potential "gotchas" that seem intuitive to Julia experts but may not be apparent to novices (like myself) who might assume a different behavior.

My difficulty here is that it seems like three entirely different issues are being conflated:

  • Can you safely run untrusted Julia code? (No, and probably never outside of OS sandboxes.)
  • Can you write buggy code in Julia? (Yes.)
  • Does Julia make it easy to write code that securely deals with sensitive information? (Not yet.)

Now you seem to be focusing on the second question, but the label "security" does not help much with that discussion. Of course, any bug can potentially be the source of a security hole, but security is only one of many problems created by bugs. Wherever possible, we want Julia to be helpful in detecting common errors (e.g. buffer overruns), as long as the performance hit is not too great. We try to avoid introducing "gotchas". But it's hard to have a general discussion of this. If you have specific cases where you think Julia could do a better job of catching common programmer errors, it would be more helpful to file specific issues on those topics (if they do not exist already) than to have a single issue that tries to list all potentially bug-prone Julia constructs.

The third question is specific enough that some progress could be made. Right now, Julia does very little for you in helping you to securely handle sensitive information. You really want access to things like mlock or something like .NET's SecureString, it seems, although this is not my area of expertise.
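To illustrate the shape of such a facility, here is a hypothetical sketch (the type SecureBuffer and the function wipe! are invented for this example, not an existing Julia API; written with the later finalizer(f, x) argument order):

```julia
# Hypothetical SecureBuffer: hold secret bytes and zero them when the
# object is garbage-collected, so the plaintext does not linger in
# freed memory. A real implementation would also want mlock(2) to keep
# the pages out of swap.
mutable struct SecureBuffer
    data::Vector{UInt8}
    function SecureBuffer(s::AbstractString)
        buf = new(Vector{UInt8}(codeunits(s)))
        finalizer(b -> fill!(b.data, 0x00), buf)
        return buf
    end
end

# Explicit wipe for callers who want deterministic cleanup.
wipe!(b::SecureBuffer) = fill!(b.data, 0x00)
```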

I agree with @ViralBShah about adding a "security" label (on GitHub) for issues that touch specifically on security concerns.

Sorry if this is confusing; it's a big topic, and I guess it's necessary to put some boundaries around it to have any meaningful discussion. Here are my thoughts:

1) what constitutes "untrusted"? I assert that we run untrusted code all the time, largely to no ill effect (I'm doing it right now with Julia!). I'm less interested in malicious code generation - there's no practical way of insulating a system against a deliberate attack by a competent application _author_. Note that this is very different from insulating a system against a deliberate attack by a competent (or incompetent!) application _user_, which is where one of my interests lies. That is, under what circumstances might it be possible for a user of an application to effect a violation of CIA that is unintended by the author?

2) I'm less interested in whether code is buggy (again, ambiguity in definition?) than whether it can be used to perform activities that the author didn't intend.

3) This is the most interesting issue for me. Why do you say that?

@sbromberger, all bugs are activities that the author didn't intend. I don't think this is a helpful distinction.

If you have a Julia program working with CIA data (or any other sensitive information), that goes back to my point (3): I think that adding facilities to work with secure data in Julia, ala SecureString, is a well-defined and worthwhile topic to explore in a github issue. But I would prefer that an issue be opened _specifically_ on that topic.

Sorry for the acronym overload: CIA is a security acronym denoting Confidentiality, Integrity, and Availability. It is not a reference to the three letter agency.

@stevengj

I agree that the ability to run untrusted code is not a goal that we should pursue. As you say, sandboxing is pretty much a fool's errand at this point.

However, I think there is some distance between your points 2 and 3. The way I look at it is: "Is Julia safe with untrusted input when correctly written?" I think that is a guarantee that a language runtime should provide.

So the idea is: if I have Julia code running, can it take input from the external world safely? Of course, this requires that the program be written safely, according to a certain set of rules; e.g., don't use eval on an input string.

So the question is twofold: a) what is the set of rules that constitutes a safe Julia program; and b) a guarantee by the runtime that following those rules ensures that untrusted input cannot cause untrusted code to run.
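As a concrete instance of rule (a), a sketch contrasting an unsafe and a safe way to read a number from untrusted input (Meta.parse is the later spelling; 0.3-era code used parse):

```julia
# UNSAFE: evaluating untrusted text executes arbitrary code.
# Input such as "run(`rm -rf /`)" would be executed verbatim.
read_number_unsafe(input::AbstractString) = eval(Meta.parse(input))

# SAFE: tryparse can only produce a Float64 or nothing; malformed or
# malicious input cannot trigger code execution.
function read_number_safe(input::AbstractString)
    x = tryparse(Float64, input)
    x === nothing && error("not a number: $input")
    return x
end
```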

Of course, none of this is to say that any of this should be prioritised ahead of everything else that needs doing in Julia in the short term.

I was writing another response trying to clarify, but @aviks managed to get across my point in a much better way.

Just to add, though: I'm not suggesting we FIX anything right now, as there's nothing to fix (except for #9147, perhaps). What I'm trying to do is gather information about things that might deserve deeper scrutiny in a production / multiuser environment. Things like allowing users to manipulate symbols _might_ (I don't know enough yet) fall into that category as a class of activities that should be discouraged in such environments.

My view is that we should do reasonable and inexpensive (in terms of performance and effort) standard security-improving things, such as zero-initializing by default (which I now support). Julia is already a memory-safe language, which helps a lot. And we should certainly follow best practices in our C and C++ code.

However I don't think it will be a priority to provide security perimeters within the language. No guarantees if you eval a string. And at this time I would say features like protecting confidential information flow are not a core language priority. Now I see I am seconding what @aviks says.

@aviks, if we assume that all of the C libraries called by Julia are bug-free (ha), and the Julia base library is bug-free (ha), then aren't the only Base functions that can execute arbitrary code eval, include_string, and include (and things like require that call include)? Plus ccall on arbitrary pointers, of course.

Symbols are just a special kind of string and don't pose any security risk in themselves (unless there is some bug in the Symbol code of base Julia).

(Not that any of this, or anything we are likely to do, will satisfy someone who wants some kind of formal security certification ala TEMPEST.)

Arbitrary code execution is just one piece of this pie. Please see initial question 2 for examples of other areas of concern.

Alteration of data: aside from explicit calls to unsafe_store! (or passing invalid pointers to external C libraries), I can't think of any way to alter arbitrary memory. Of course, there are lots of ways to write buggy programs that write to data structures in cases where they weren't supposed to, but I don't think that's what you're asking.
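For reference, a minimal sketch of what unsafe_store! permits (illustrative; the out-of-bounds write is undefined behavior):

```julia
# unsafe_store! writes through a raw pointer with no bounds or type
# checks; an attacker-controlled offset could overwrite unrelated
# process memory.
a = zeros(Int, 3)
p = pointer(a)

unsafe_store!(p, 42, 2)     # in-bounds: a is now [0, 42, 0]
unsafe_store!(p, 42, 10^6)  # out-of-bounds: corrupts memory or segfaults
```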

The second paragraph of my answer should be interpreted expansively. What I mean is that _all_ we currently intend to do is adopt reasonable, standard measures like zero-filling when they are suggested, and eliminate buffer overflows from our C code, etc.

There have been no major security analyses of julia, and currently none are planned, but this is mostly for lack of person-hours and/or funding. Important to consider are

  • Julia tends to be permissive (e.g. any code can write to the fields of any mutable object)
  • Julia does lots of C interop with ccall and unsafe_load/unsafe_store!

A very close approximation to julia's security stance is "like C, but without most memory-safety-related problems in normal code" (buffer overflows, type-punning, uninitialized data (to come)). Julia is type safe as well, but this is enforced at run time. For many security purposes that's good enough, since it still prevents certain bad things from happening (though your program might also stop with an exception).
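A small sketch of what run-time enforcement looks like in practice: the bad operation is stopped, but by an exception rather than by a compile-time rejection.

```julia
f(x::Int) = x + 1

f(1)        # returns 2
f("oops")   # throws MethodError at run time: no method matching f(::String)

# Type assertions are likewise checked at run time:
g(x) = (x::Int) + 1
g("oops")   # throws TypeError instead of reinterpreting the bytes
```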

I've only skimmed the topic here, but since the inference stage is written in julia, you can probably convince the compiler to emit just about anything you want by corrupting any of the functions it depends upon.

Parsing code should always be safe (finite, terminating, no side effects), but I don't think eval can ever be considered "safe".

"Safe" is a relative term, but one might imagine a safe_eval with a timeout and a verification step to ensure that it doesn't modify the system.
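A rough sketch of what such a safe_eval might look like (the name, the whitelist, and the timeout handling are all hypothetical; this is the shape of the idea, not a hardened sandbox):

```julia
# Hypothetical safe_eval: parse, reject anything that is not a numeric
# literal or basic arithmetic, then evaluate with a wall-clock cap.
const ALLOWED_CALLS = (:+, :-, :*, :/)

function check(ex)
    ex isa Number && return
    ex isa Expr || error("disallowed syntax: $(repr(ex))")
    (ex.head === :call && ex.args[1] in ALLOWED_CALLS) ||
        error("disallowed expression: $(repr(ex))")
    foreach(check, ex.args[2:end])
end

function safe_eval(s::AbstractString; timeout::Real = 1.0)
    ex = Meta.parse(s)
    check(ex)
    t = @async eval(ex)
    # Note: Julia tasks cannot be forcibly killed, so a timed-out
    # computation keeps running in the background.
    timedwait(() -> istaskdone(t), timeout) === :ok ||
        error("evaluation timed out")
    return fetch(t)
end

safe_eval("1 + 2 * 3")      # 7
# safe_eval("run(`ls`)")    # throws: disallowed expression
```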

It could be good to have some kind of "sealing" mechanism to protect key parts of the system from changes.

Just to add another possible example: in https://groups.google.com/forum/#!topic/julia-users/A0DGzPVfiAI there's a discussion about how intentional mutation of immutables may cause unintended consequences.

Doesn't seem to be anything actionable here.
