Adding a TLS option to client-to-silo and inter-silo communication would make a strong end-to-end encryption promise when combined with encryption between silo and storage. It looks like using TLS on socket connections involves adding a configuration option to the settings and then applying it to the socket connections. Encryption between silos and storage is less clear to me, other than that it may require more than simply changing a connection string (for instance, here's background material for Azure Table Storage).
To get this going, what would be needed to add encryption?
How does one add TLS to C# sockets? Is it the StreamSocket?
You can wrap the connected Socket in a NetworkStream object, and then create and configure an SslStream, which will have the NetworkStream object as its inner stream.
There is an OSS project, http://www.supersocket.net/, which has every piece that we need. I'm not saying we should take a dependency on it, but take a look: it has TLS/SSL support implemented on top of the classes above.
I hope it helps.
@attilah That's a point to consider at some stage, especially thinking of xplat and CoreCLR. This touches upon #307, so I'll cross-link. I don't know if the right solution would be to add TLS as a lightweight, one-off option and refactor later, as configuration and storage communications likely remain the same.
This would be great for the Azure Web Apps case, where Web Apps cannot currently communicate with Cloud Services without creating a point-to-site VPN. If we adopt this approach, I'd like the ability to customize client certificate validation.
I would stay away from SuperSocket :)
I added TLS support there several months ago, and when SSL got that bleed breach I added support for TLS 1.2 and 1.3, but they never updated the code. At first the project looks nice overall, but it has a lot of problems and the project owner wasn't willing to fix them in a timely manner, so I decided to move away.
TLS support is pretty simple: you just use the SslStream class instead of a regular Stream and, optionally, provide a callback on the server if you want to validate client certificates, or a callback on the client if you want to validate the server certificate. By default the server certificate is validated automatically by .NET's internal code, so if the machine connecting to the server doesn't trust any part of the certificate chain, the connection is aborted. If you add the validation callback on the client, you can bypass that check by returning true for development/self-signed scenarios.
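To make the SslStream approach described above concrete, here is a minimal sketch. The class and method names (`TlsSocketSketch`, `WrapClient`, `WrapServer`, `IsAcceptable`) and the `allowUntrusted` switch are hypothetical and not part of Orleans; only the `Socket`, `NetworkStream` and `SslStream` calls are real .NET APIs.

```csharp
using System;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper, for illustration only.
public static class TlsSocketSketch
{
    // Client side: wrap an already-connected Socket and authenticate the server.
    public static SslStream WrapClient(Socket connected, string serverName, bool allowUntrusted)
    {
        var network = new NetworkStream(connected, true); // the stream owns the socket
        var ssl = new SslStream(
            network,
            false, // dispose the inner stream together with the SslStream
            (sender, certificate, chain, errors) => IsAcceptable(errors, allowUntrusted));
        ssl.AuthenticateAsClient(serverName);
        return ssl;
    }

    // Server side: wrap an accepted Socket and present the server certificate.
    // No validation callback is supplied here, so a required client certificate
    // must be trusted by the machine's certificate store.
    public static SslStream WrapServer(Socket accepted, X509Certificate2 serverCertificate, bool requireClientCertificate)
    {
        var network = new NetworkStream(accepted, true);
        var ssl = new SslStream(network, false);
        ssl.AuthenticateAsServer(serverCertificate, requireClientCertificate, SslProtocols.Tls12, false);
        return ssl;
    }

    // Accept when the platform found no problems, or when the caller has
    // explicitly opted out of validation (development/self-signed only).
    public static bool IsAcceptable(SslPolicyErrors errors, bool allowUntrusted)
        => errors == SslPolicyErrors.None || allowUntrusted;
}
```

After wrapping, the rest of the code reads and writes the returned SslStream instead of the raw socket. Returning true unconditionally from the callback disables server authentication, so `allowUntrusted` should stay off outside of development.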
Look at this other project that I contributed to: https://github.com/rdavisau/sockets-for-pcl
It uses a simple "bite before bait" approach for sockets on multiple platforms, and I added PCL support as well. That would be helpful for client/mobile applications.
@galvesribeiro You mean _teh_ bait-and-switch trick? :)
Yeah yeah, that's it hehehe :)
I want this feature, so I'm willing to put some work in to make it happen.
So if anyone wants input on the design, this is a good place for it.
My personal preference is for self-signed certificates with support for mutual, certificate-based authentication, since everyone can use those.
I'm considering adding the following properties to NodeConfiguration:
``` c#
/// <summary>
/// Gets or sets the thumbprint of the certificate which the gateway should use to identify itself.
/// </summary>
public string ProxyGatewayServerCertificateThumbprint { get; set; }

/// <summary>
/// Gets or sets the comma-separated list of accepted client certificates.
/// </summary>
public string ProxyGatewayClientCertificateThumbprints { get; set; }

/// <summary>
/// Gets or sets a value indicating whether or not client certificates are required to be valid.
/// </summary>
public bool ProxyGatewayValidateClientCertificate { get; set; }
```
In the XML, I intend this to look something like:
``` xml
<ProxyingGateway
    Address="localhost"
    Port="40000"
    ServerCertificateThumbprint="xxx"
    ClientCertificateThumbprints="aaa,bbb,ccc"
    ValidateClientCertificate="false"/>
```
I will add inverse properties to ClientConfiguration.
Accepting multiple server certificates based on thumbprint allows for certificate roll-over while still constraining the certificate within a set of known-good values. I prefer this over simply trusting that the certificate is valid according to the current machine's cert list and checking the CN.
ValidateClientCertificate and the corresponding ValidateServerCertificate allow us to add support for CNs later without breaking users who are using self-signed certificates. They will be forced to set that value to false or otherwise install & trust the client certificate on the silo host.
Feedback is appreciated. I can rename "Server" to "Silo" if that's preferable.
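As a rough illustration of the thumbprint allow-list proposed above, the following sketch parses a comma-separated list and checks a presented certificate against it. `ThumbprintValidator` and its members are hypothetical names, not an Orleans API; the `Validate` method merely has the shape of .NET's `RemoteCertificateValidationCallback`, and the `requireValidChain` flag stands in for the proposed ValidateClientCertificate setting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper, for illustration only.
public sealed class ThumbprintValidator
{
    private readonly HashSet<string> accepted;
    private readonly bool requireValidChain;

    public ThumbprintValidator(string commaSeparatedThumbprints, bool requireValidChain)
    {
        // Thumbprints are hex strings, so compare them case-insensitively.
        accepted = new HashSet<string>(
            (commaSeparatedThumbprints ?? string.Empty)
                .Split(',')
                .Select(t => t.Trim())
                .Where(t => t.Length > 0),
            StringComparer.OrdinalIgnoreCase);
        this.requireValidChain = requireValidChain;
    }

    public bool IsAcceptedThumbprint(string thumbprint)
        => thumbprint != null && accepted.Contains(thumbprint.Trim());

    // Same parameter list as RemoteCertificateValidationCallback, so it can be
    // passed to the SslStream constructor.
    public bool Validate(object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors errors)
    {
        if (certificate == null) return false;

        // With requireValidChain = false, self-signed certificates pass as long
        // as their thumbprint is on the list.
        if (requireValidChain && errors != SslPolicyErrors.None) return false;

        // GetCertHashString() returns the SHA-1 hash as a hex string, which is
        // what certificate tools display as the thumbprint.
        return IsAcceptedThumbprint(certificate.GetCertHashString());
    }
}
```

Listing both the old and the new thumbprint during a roll-over lets either certificate connect, which is the roll-over behaviour described above.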
Just beginning to look at implementing an Actor-based system, and being able to encrypt inter-process communication is a huge deal for us. Was wondering if this work was in progress. Based on the last message it seemed possible.
@kkatsma I believe there is no progress other than what you see here. There was some discussion in Gitter. @ReubenBond and plenty of people hang out there too, so you might want to pop in and ask.
I think I personally need this for future endeavours I'd like to materialize. Likely @ReubenBond too, but he's a busy startup guy. If you have the inclination, i.e. time and skills or willingness to learn as you go (we are here to help, naturally), it'd be great to get tyres rolling. :)
I will be bringing this up at the roadmap meeting, which is the next Orleans Virtual Meetup.
@veikkoeeva I'm still needing to complete POC work, but assuming that goes well, I would be willing to try. Or, as Yoda says - "do or do not, there is no try."
@kkatsma Sounds good. What I wanted to say is that any work is useful, and if other priorities come up for you, we are still better off. So no pressure. :)
Are there any plans to make progress in this area? We are seriously considering using Orleans for one of our projects, but lack of security in Orleans transport layer (both silo-to-silo and client-to-silo) is a show stopper considering our security requirements.
@alukyan what is your scenario? Also, what are your security requirements? I used to work with Orleans in the banking and financial transaction industry, which requires PCI compliance, so maybe I can help you work around this and meet your security requirements while this issue is dealt with. If you want, jump into Gitter and we can chat about it.
Is there any progress regarding this issue?
No progress. Gigya open-sourced their Microdot framework (https://github.com/gigya/microdot) that I believe can be used to address some of the concerns by securing the http endpoints.
@sergeybykov Not sure how it can address transport layer security for internal Orleans RPC calls, which I understand use plain sockets, not HTTP. If Microdot addresses security of HTTP frontends, then it is not related to this issue.
Have you considered using gRPC as the transport layer, which supports TLS out of the box? According to its authors, due to efficient binary serialization, streaming support and TLS channel multiplexing, the protocol overhead is comparable to raw sockets.
Has there been any movement on this? I've been tasked with adding/implementing this in Orleans as we need it for our project. Has anyone on the team thought about this, have an intended approach, or is available for a brain dump?
@berdon, @jason-bragg made a prototype of this change using DotNetty as our transport layer and I believe this work is here
However, Project Bedrock and its abstractions would be useful here once it gets released somewhere.
Thanks @galvesribeiro!
@jason-bragg Where did you leave off on this? Was there any discussion of pulling these changes in?
@berdon, The DotNetty prototype was an exploration of replacing our networking layer, not just the socket layer, so it's a bit more involved than adding TLS. Replacing our networking layer with an existing and well-maintained networking layer would, in theory, have allowed for secure communications, based on the assumption that a well-maintained networking layer would have such support.
This initial prototype was not followed up on, as other work took priority.
Max Gortman from the DotNetty team assisted, and was a great help.
Findings - From time of prototype, code is quite out of date now
TL;DR
Prototype went well, but was insufficient to make an informed call on replacing networking with DotNetty. Despite the inconclusive results of the prototyping, I remain optimistic of the potential, and am of the opinion that a second pass at the prototype is warranted.
Suggested next phase
Detailed Findings
Error handling - As our current networking stack is mostly monolithic and synchronous while DotNetty's is modular and asynchronous (a good thing), we will need to take care to ensure that our error handling is preserved during the port. This is especially true in regards to preserving message order. There are two difficulties here: one is in identifying the expected behaviors, as they are not all documented; the other is testing that they are in fact preserved, as we've no test scenarios built around the low-level networking behaviors. This is an Orleans issue, not a DotNetty one, as we would encounter this problem moving to most any other networking technology, but it needs to be called out. So far, the only difficulty I've encountered (other than divining from the code which behaviors are expected) was handling message serialization errors. For the prototype I ignored serialization errors.
Serialization into networking buffers - For performance reasons, Orleans' existing networking layer utilizes a buffering layer that is integrated with Orleans serialization. DotNetty utilizes a buffering layer for performance reasons as well. For the prototype, I used Orleans' serialization into Orleans' buffers, then copied those buffers into DotNetty's buffers. This is inefficient. If we choose to go forward with DotNetty, we will either need to refactor Orleans' serialization to serialize objects into DotNetty buffers, or replace Orleans' serialization with an existing serialization technology which is already integrated with DotNetty. In any case, this inefficiency should be considered when the prototype performance results are evaluated.
Handshake - Upon connection from a silo or client, Orleans performs a minimal handshake. Using the existing DotNetty architecture, the approach I took for handling this is for connections to be initialized with a handshake handler and for the handler to be swapped out once the handshake is complete. There were issues with this functionality in the version of DotNetty that I used, so I worked around this with a temporary kludge. The DotNetty handshake issues I encountered have been fixed, so this capability should now be supported. While the need for this functionality should be noted, I'm not convinced this capability needs to be vetted as part of the prototype.
Bind vs. Listen - We use the local address of the silo-to-silo listener's socket as part of the siloID. As the siloID is needed very early in the silo initialization, this necessitates binding the listener socket early in the silo initialization. DotNetty performs the bind and listen at the same time, meaning we either start listening before the silo is ready (bad) or find another way to identify the silo. In general, this is not a problem, because the silo socket listener address is already well defined enough for the listen address and the bound socket address to be the same, but this is, currently, not always the case.
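The bind-versus-listen distinction in the last finding can be sketched with plain .NET sockets, which keep the two steps separate. This is only an illustration of the ordering and not Orleans or DotNetty code; `BindThenListenSketch` and its method names are hypothetical.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, for illustration only.
public static class BindThenListenSketch
{
    // Step 1 (early in start-up): bind only. Nothing can connect yet.
    public static Socket BindOnly(IPAddress address, int port)
    {
        var listener = new Socket(address.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(address, port)); // port 0 lets the OS pick a free port
        return listener;
    }

    // The bound address is known right after Bind, so it can already be used
    // to derive an identity for the process.
    public static IPEndPoint BoundEndPoint(Socket listener)
        => (IPEndPoint)listener.LocalEndPoint;

    // Step 2 (once start-up has finished): begin accepting connections.
    public static void StartListening(Socket listener, int backlog)
        => listener.Listen(backlog);
}
```

A transport that only exposes a combined bind-and-listen call cannot offer the gap between step 1 and step 2, which is the problem the finding describes.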
Thus far, I've spent some time abstracting out the Socket* pieces from Orleans into a Transport abstraction. It's loosely based off the work you did some time ago, with less effort put into adding "protocol" abstractions. The default implementation should, ideally, do exactly what is currently being done so as to not have any impact with the change. Been swamped elsewhere but hoping to get to testing the abstraction work soon. TLS, at that point, should hopefully come easily with a hidden layer that sits on top of the underlying socket implementation, likely just in some OrleansContrib.TlsTransport package or something.
To dig in a little:
ITransport (ie. Socket)
ITransportFactory
Socket -> SocketTransport
IncomingMessageAcceptor ->
SocketManager ->
For the most part it's been a lot of code shuffling and replacing Socket with ITransport.
> The default implementation should, ideally, do exactly what is currently being done so as to not have any impact with the change.

@berdon do you think we will be able to use other non-socket transports if we use these abstractions as they are today? I think this abstraction should only care about reading and writing to buffers, and the implementation would care about the proper details of _transporting_ it.
Also, are you thinking of adding TLS (through those abstractions) only silo-to-silo, or do you have an idea of how to do it for the client as well?
Thanks! Good work!
@galvesribeiro That's my hope. I had originally wanted to mimic the Bedrock APIs and IPipe stuff but it seemed needlessly complex (for now).
The current ITransport is:
```c#
public interface ITransport
{
    bool Connected { get; }
    bool IsListening { get; }
    object UserToken { get; set; }
    EndPoint LocalEndPoint { get; }
    EndPoint RemoteEndPoint { get; }

    void Connect(EndPoint endpoint, TimeSpan connectionTimeout);
    void ListenAsync(ITransportListener listener);
    void RestartListenAsync(ITransportListener listener);
    void Close();

    void ReceiveAsync(ITransportReceiver receiver);
    int Receive(byte[] buffer, int offset, int length);
    int Receive(IList<ArraySegment<byte>> buffer);

    int Send(List<ArraySegment<byte>> buffer);
    int Send(byte[] buffer);
}
```
There are some obvious remnants of the Socket APIs left in but for a couple of reasons.
The above will probably shift some, as there are some loose threads I still want to think about and/or address.
As for TLS - these changes were focused on eliminating references to Socket completely from the code. My hope is that if I do that, Silo to Silo and Silo to Client will both just "magically" work. I found some paths that seemed to diverge depending on being a Silo or a Client, and those will be the paths most likely to cause me some issues while testing.
Humm... Well, I was looking at the networking code now... I _think_ it is possible to abstract the sockets with a few interfaces and register the implementation of it thru DI without having to change much of the code.
IncomingMessageAcceptor, OutgoingMessageSender and SocketManager are the only things that must be abstracted.
The socket implementation can even be moved to a separate NuGet package like Orleans.Core.Networking, and the abstractions moved into Orleans.Core.Abstractions.
I'll play a bit with it here and let you know my findings...
That's essentially what I've done. I'll push to a remote branch and you can poke around. It isn't in a runnable state yet as I'm going through and reconciling the changes.
Great @berdon! Just looked at the code and it looks pretty much like what I was looking for.
I know it is not a review, but I wonder if, instead of using the EndPoint type to address the transport, we should use Uri as Bedrock and SignalR do. That way we still have the hostname and port and, at the same time, the protocol information. The Socket implementation of the transport would then translate the Uri to an EndPoint. That would be useful to other implementations that use other types of protocols.
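A minimal sketch of the Uri-to-EndPoint translation suggested above, as a socket-based transport might do it. `TransportAddressSketch.ToEndPoint` is a hypothetical helper, and the `tcp://` and `tls://` schemes used in the usage note are assumptions for illustration, not an agreed convention.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
public static class TransportAddressSketch
{
    // Translates a transport address such as tcp://10.0.0.5:11111 into the
    // EndPoint type that the Socket APIs expect.
    public static EndPoint ToEndPoint(Uri address)
    {
        if (address == null) throw new ArgumentNullException(nameof(address));

        // Uri.Port is -1 when the scheme is unknown and no port was given.
        if (address.Port < 0)
            throw new ArgumentException("The URI must include a port.", nameof(address));

        // Uri.Host keeps the square brackets around IPv6 literals; strip them.
        var host = address.Host.Trim('[', ']');

        if (IPAddress.TryParse(host, out var ip))
            return new IPEndPoint(ip, address.Port);

        // Host names stay unresolved until the socket connects.
        return new DnsEndPoint(host, address.Port);
    }
}
```

For example, `ToEndPoint(new Uri("tcp://127.0.0.1:11111"))` yields an IPEndPoint, while `ToEndPoint(new Uri("tls://silo.example:30000"))` yields a DnsEndPoint. The scheme itself (`address.Scheme`) remains available to whichever component selects the transport implementation.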
So, it looks like the work was completed... When can we expect a PR so others can jump in to review it?
Also, yes, I agree with your commit comments that at least for now, TLS (and other implementations) should go to OrleansContrib.
I agree with the Uri stuff - I had hoped to get away from EndPoint.
I've been moved to some other things for right now :(. I still need to get it running, run the unit tests, and fix the issues that come up.
Ok. I tried to build it and run the tests but it is not compiling yet. Let me know if I can help with anything while you are busy. I'm always on Gitter.
Great work @Berdon!
Prior to putting out a full PR with this, would you mind posting the abstractions, either on an issue or as the first PR? I'm requesting this because the abstractions will be increasing our public surface and introducing a new extensibility point to Orleans, so that, imo, is the most critical aspect of this change. Vetting the abstractions prior to seeing the implementation details will help narrow our focus onto that aspect and help prevent the code reviews from turning into design reviews.
Yeah, I plan on it. I was holding off on that and a PR until I have time to test and clean up the changes.
Interested to see how this develops. Hoping a few abstractions will shake out after we see a bunch of these different networking layers. I'm in the process of unifying the abstractions from kestrel and signalr. We're also putting pipelines in CoreFx!
In the process of doing this, there'll be a concept of connection middleware where things like TLS sit. Very similar to what netty does but built on different primitives. SignalR is an example of a connection middleware that can sit on top of the abstraction (so it runs on TCP or WebSockets or any of the other http based transport implementations). In a sense what we're building is very similar to the idea of OWIN, there are 2 sides, frameworks on top of the abstraction and transports that expose the ability to wire up connection middleware.
I saw that pipelines is about to hit CoreFx, in time for .NET Core 2.1, very exciting!
Do you think we'll have a robust TLS middleware for pipelines by that time, maybe based on some of @Drawaes' work?
Kestrel uses SslStream right now, and @Drawaes' work will be compatible with pipelines. Not sure if it'll be "official" but it'll definitely be more performant.
Great, thanks. Maybe we can offer some kind of integration for SslStream & for Leto and let users decide.
Looking forward to this feature being implemented. Now that GDPR has arrived it's become more relevant than ever. The lack of TLS between clients and silos and inter silo communication has pushed us toward implementing simple message property encryption in the latest project I worked on. It's inefficient and not perfect in any way but for the time being it's the best alternative.
Good to see that this issue has been tagged with P2.
I hope this receives more attention soon. I've been prototyping with Orleans on various projects, and the lack of client -> silo encryption means Orleans becomes a weak spot in the security posture of any project's stack.
I wanted to chime in here and provide some extraneous use cases that might help prioritise this work. I might be able to help with some of the work, depending on the specifics. Currently these are targeted use cases that I'm actively working on leveraging Orleans for, but wouldn't potentially see production for about a year.
This use case is where an Orleans cluster will be responsible for pipelining highly regulated data from source X to destination Y. The data would range from PII/PCI/HIPAA data to highly classified data for US Agencies. In this case, Orleans would need a comprehensive transport security interface so you could secure silos through TLS or other protocols. This helps keep silo and grain communications in line with various data regulations and allows the cluster to be certified for those data workloads.
This use case is a little more complicated: the silos are part of the same cluster but sit on differently regulated networks. A Green network would be a secure, internal network where all traffic is considered trusted. A Blue network would be a mostly secure but external network, and grain calls into this network are guaranteed to be open to inspection, either by firewalls or through generalised packet captures by external malicious actors. Silos deployed in the Blue network would be able to make grain calls into the Green network, but Blue -> Green traffic is still an external -> internal transition and Green -> Blue traffic an internal -> external one, so both are subject to the same security concerns. Comprehensive TLS security would be very important to make sure this traffic is secured behind encryption.
Those are the two that come to mind, but I can likely think of others where Orleans would need more comprehensive transport security.
I'm getting back to this issue.
General question for interested parties (@ReubenBond, @davidfowl, @jason-bragg, @sergeybykov) for socket refactoring:
Do we want to see Pipes rolled in and passed around instead of sockets? Or a more generic abstraction, as originally done on that PoC branch above, where there's an ITransport thing and implementers can do whatever?
I've got around 64 hours of "time" slated for this work, so I'm hoping I can work with you guys to come up with something everyone is happy with.
I'm actively working on this at the moment. The current implementation uses pipes and is based on Bedrock APIs for the most part, so ConnectionContext is the core type.
My branch is here: https://github.com/ReubenBond/orleans/tree/feature/networking-replat
It will change significantly in the next week while I stabilize it & your input is appreciated. Currently I've worked only on the core networking and have not put thought into an actual TLS implementation other than it would be based upon an SslStream adapted to work with those Pipes via a middleware. If you'd like to flesh it out more on top of the existing code in the branch then that would be very helpful.
@ReubenBond How best can I help aside from digging into a TLS middleware impl?
@ReubenBond not sure I can help with the implementation, but I'm happy to be a guinea pig (as time permits) for end-user experience.
@ReubenBond quite a few implementations in the code base currently have hard dependencies on the System.Net or even System.Net.Sockets assemblies. For example, the SiloAddress class exposes a property of type IPEndPoint. But what if I want to deploy an alternative messaging transport that is not based on TCP sockets? It would be great if we could use named endpoints, for example, where a unique string can be used to address a silo. The dependency runs all the way up the chain to EndpointOptions, which also has hard ties to the System.Net.Sockets assembly.
Are you planning to tackle this as well in this branch?
@tedvanderveen This would be a nice refactoring. Maybe worth entertaining together with https://github.com/dotnet/orleans/issues/1121 and https://github.com/dotnet/orleans/issues/3049, as both consider changes to addressing, albeit of grains and their state rather than silos.
As a tangent to perhaps stimulate minds, I wrote in those about making use of the IPFS addressing scheme, so maybe in this case one could see a silo as an IPFS directory and integrate with systems such as https://www.sandbox.game/ a bit more closely. I.e. run something fast in Orleans, but make it easy to conceptually talk about addressing state in other clusters hosted by others too, and reconcile things over another consensus. I think @sergeybykov has just appropriately declared he is going to GDC too. :)
It sounds interesting to register some grains on public, secured ports, while others are not secured and are exposed only on the internal network. That would allow a natural way to grow our game across the globe until we get the resources to put HTTPS in front of each such endpoint.
Another alternative would be to have one cluster where part of the grains listen on another, unsecured port, so that these could be exposed to the public as secured by k8s/istio/envoy. Again, I would like to avoid having 2 clusters.
Is there any existing mechanism I can hook into to serve different ports for different grains?
The major reason why you shouldn't put Orleans on a public network is that it doesn't have any protection against the usual attacks like buffer overflows etc., which is different from public-facing servers like Kestrel, for example. Also, the messaging between silos and with clients has no security validation whatsoever, so an untrusted source can just connect to your silo gateway port and send packets that may seem valid protocol-wise for Orleans but can carry malicious code. It doesn't matter how you add transport security to it. Yes, you can put a firewall/proxy in front to avoid DDoS and other attacks, but you are still vulnerable to poison messages.
Now that #5436 is merged, you will soon be able to have TLS encryption on it.
The only way I see public gateway could be open, is with the following:
If that were possible, then we would win a lot, because Orleans would gain a whole new breed of interoperability with other technologies...
Perhaps it is a discussion for another issue...
cc: @ReubenBond @sergeybykov @jason-bragg
@galvesribeiro Maybe the Kestrel bits cover the options you mention, thus opening a Kestrel in cluster. I remember @blowdart mentioning it'll be replacing IIS (maybe at https://coder-coacher.github.io/NDC-Conferences/Security-in-ASPNET-Core-20-Barry-Dorrans-TfmBOzuPFLQ.html or somewhere else).
Makes me personally nervous, though. One should note the increased attack surface on data in the cluster and in memory that easily enjoys legal protection, in addition to plainly overloading a cluster. Personally I would be happier to have network isolation, proxies etc. to offload (and absorb) some attacks and keep breathing room if something happens. Also streaming/sending data without transformations to the cluster so it doesn't need to be decoded and encoded (this also adds an attack surface that could, or should, be mitigated).
Edit: To be clear, something like this could perhaps get some internal LOB apps off the ground; some interesting cases to evaluate could be gateways in sensor fields and maybe Cloudflare Workers.
A further thought is whether one should use TLS when running Orleans in Azure. Is it about trusting one's cloud provider? Though encrypting all traffic is mandated in some enterprise data centers.
For now, it seems we will try to open our Orleans to streaming-observer clients via https://shadowsocks.org/en/spec/Protocol.html .

> The major reason why you shouldn't put Orleans to a public network is mostly because it doesn't have any protection against usual attacks like buffer overflow etc. Which is different if you look to public-facing servers like Kestrel for example.

Just a quick note to point out that the ASP.NET Core team frequently states that Kestrel is not a public-facing server implementation.
This isn't just a "public-facing" question -- lack of TLS just brought my fledgling enterprise-usage scenarios to a screeching halt. Even on our internal networks it's mandated for absolutely everything, no exceptions. FinTech is like that. :)
Does the 3.0 milestone mean this is actually planned for 3.0? (Is there a roadmap somewhere?)
It's not just fintech but banking in general, also healthcare, and with GDPR almost everything. As a side note, Kestrel is now allowed to be public facing (aka an edge server); it was only the early versions that had no hardening.
https://docs.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel?view=aspnetcore-2.2
This has been discussed elsewhere already (e.g. at https://gitter.im/dotnet/orleans?at=56fed826d9b73e635f68704a), but to add more weight to this consideration: it can also be manufacturing execution systems (MES), which are often air-gapped from the Internet and could make use of Orleans-like systems. I mean, those systems will not be run in the cloud but are distributed. Various infrastructure systems are also such systems; they don't even need to be critical infrastructure. It's not possible to monitor all places where there is traffic against eavesdropping, tampering etc., so all sorts of mitigations are added and even mandated, TLS being one of them.
If I may, I'll link to https://github.com/dotnet/orleans/issues/1524 (I added something to consider towards the end, at https://github.com/dotnet/orleans/issues/1524#issuecomment-193002012), since @rrector makes a very good point in that thread about timing-related problems that can be catastrophic in the cases just mentioned. Another one is "hidden buffers", whatever they may be, such as something queuing up uncontrollably in a threading context.
But then also, when running TLS one may run out of entropy. That is a real problem too, discussed at https://gitter.im/dotnet/orleans?at=55de1300b0c2ec8705e72a6e. If tracking Gitter discussions is troublesome, https://dev.solita.fi/2015/11/11/raiders-of-the-lost-entropy.html is a good post on the issue too.
@Drawaes Tim thanks for pointing that out about Kestrel, I missed that change. Very interesting.
> Does the 3.0 milestone mean this is actually planned for 3.0?

Yes it is planned for 3.0