Jaeger: Allow secure communication between components

Created on 6 Oct 2017 · 25 Comments · Source: jaegertracing/jaeger

Update 2019-09-20: replaced by #1718

Document and/or implement secure communication channels[1] between components, like:

[x] Tracer -> Collector
[x] Agent -> Collector: #1310
[x] Collector -> data store (Cassandra, Elasticsearch)
[x] Query -> data store (Cassandra, Elasticsearch)
[ ] User -> UI
[ ] UI -> Query
[ ] Tracer -> Agent

This is related to #404.

1 - TLS for HTTP, but not sure how it would work with thrift

security

jpkrohling

👍3

All 25 comments

Current state:

Jaeger relies on other libraries to handle communication with remote components, such as storage access (Cassandra / Elasticsearch). Its HTTP endpoints are not encrypted, which could (and probably should) be solved by placing a reverse proxy in front of the component to be protected. The tracer components are able to send data via HTTPS to a remote collector server. The Agent is not yet able to send data to the collector over a secure communication channel.

jpkrohling on 6 Oct 2017

FYI my use case:

we have various k8s clusters in various locations (datacenters) in the US (though in the future likely in other continents) and using various cloud providers
in each k8s cluster we have various services generating spans and complete traces (currently a single trace always comes from 1 location only, we don't have requests that span across multiple locations, though that may come later)
we want 1 central jaeger deployment in our 'ops' cluster, it has a multi-node cassandra cluster in that cluster, the cluster does not extend into the other locations across the internet. everything (cassandra etc) is contained within that one location
i honestly don't care much whether to run the collector centrally in the ops cluster, or run collectors in each location and then those collectors talk to the central ops cassandra over the internet (though that does sound a bit weird). given that collectors can be scaled linearly (at least that's what it looks like) it seems better to run them centrally, and then have agents in each location talk over the internet to the collectors (that's also what people seem to be recommending)
obviously we don't want people to be able to sniff our jaeger traffic, because it contains confidential information, and we don't want anyone to be able to send crap into our environment, so we need encryption+authentication.
this prompted me to enquire with my infra folks about a vpn/secure tunnel between the locations and the central ops cluster. they informed me of various limitations ("some of our cloud providers do not provide a layer2 network, so it is not possible to add custom routes", "in some locations the nodes/pods in each cluster can't connect out"), so they're asking instead for an application level solution such as https with auth.
i'm not sure if going from tchannel to http(s) is a big performance downgrade, a secure encrypted/authenticated tchannel would also work for me I suppose.
simple solutions are good solutions, maybe it's just a matter of terminating ssl and basic auth via a kubernetes ingress

Dieterbe on 3 Nov 2017

👍3

Agent to collector path is using tchannel for legacy reasons. I would much rather use grpc, which will have standard support for https.

yurishkuro on 3 Nov 2017

Hi @Dieterbe, we had much the same use case, with one variation:

we're deliberately not including confidential info (PII, customer data) in our traces: we want the traces to be accessible to all the teams and not have to try to do masking on a per-span basis!

So what we've done is deploy Cassandra and the query service centrally, put an agent on every node via a DaemonSet (to avoid the per-pod overhead of sidecars), and run a collector HA pair for the whole k8s cluster. We then used TLS client certs to secure the collector -> Cassandra traffic and the user -> query traffic.

We had to improve some bits of Jaeger to permit this, but I think they have all been merged now, though I haven't fully verified the dependency job change in prod for us (soon though).

We aren't worried about sniffing of agent -> collector traffic w/in our k8s clusters, and the rest is secured (or localhost only).

rbtcollins on 11 Jan 2018

Cf #773 for gRPC work.

One question I have about using HTTPS is what's the accepted practice for certificates? Are we ok to use some internally generated certificate for the servers in the collectors? If someone has a link to a blog post discussing this it would be appreciated.

yurishkuro on 5 May 2018

@yurishkuro I'm using ES operator (https://github.com/upmc-enterprises/elasticsearch-operator) to manage ES clusters on Kubernetes. The operator can set up Kibana and Cerebro at same time while enabling secured communication over HTTPS.

They are using an opaque secret to store the different files related to certs:

Name:         es-certs-elasticsearch-cluster
Namespace:    logging
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
kibana.pem:         1631 bytes
cerebro-key.pem:    1675 bytes
kibana-key.pem:     1679 bytes
cerebro.pem:        1631 bytes
node-key.pem:       1679 bytes
node-keystore.jks:  3506 bytes
node.pem:           1631 bytes
truststore.jks:     1032 bytes
ca-key.pem:         1679 bytes
ca.pem:             1367 bytes

Then they are mounting a volume at /elasticsearch/config/certs that Kibana/Cerebro can use.

I'm not sure you were expecting this kind of information, or if it's the best but that's a possible way to secure Jaeger <-> ESCluster 😃

sneko on 6 May 2018

One question I have about using HTTPS is what's the accepted practice for certificates?

IMO, it's sufficient for us to just add a couple of configuration options:

what cert to offer to clients
what the key is to decrypt the content the client is sending
which CA cert to use when trusting server certs
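As a rough illustration of how small that surface could be, here is a minimal Go sketch that maps the three options onto the standard library's `tls.Config`. The helper names `tlsConfigFromPEM` and `demoKeyPair` are hypothetical and not part of Jaeger; the demo generates a throwaway self-signed certificate only so that the example runs without any files on disk.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"log"
	"math/big"
	"time"
)

// tlsConfigFromPEM is a hypothetical helper (not Jaeger code). It takes the
// cert to offer, its private key, and the CA used to trust the peer.
// The cert/key pair and the CA are each optional.
func tlsConfigFromPEM(certPEM, keyPEM, caPEM []byte) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if len(certPEM) > 0 || len(keyPEM) > 0 {
		pair, err := tls.X509KeyPair(certPEM, keyPEM)
		if err != nil {
			return nil, fmt.Errorf("loading key pair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{pair}
	}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, errors.New("no certificates found in CA PEM data")
		}
		cfg.RootCAs = pool
	}
	return cfg, nil
}

// demoKeyPair creates a short-lived self-signed certificate for the demo only.
func demoKeyPair() ([]byte, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo.invalid"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

func main() {
	certPEM, keyPEM, err := demoKeyPair()
	if err != nil {
		log.Fatal(err)
	}
	// The self-signed cert doubles as its own CA in this demo.
	cfg, err := tlsConfigFromPEM(certPEM, keyPEM, certPEM)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("certificates loaded: %d, custom CA pool set: %t\n",
		len(cfg.Certificates), cfg.RootCAs != nil)
}
```

In a real deployment the PEM bytes would presumably come from files mounted by the platform, which keeps certificate generation and rotation outside the application.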

Platforms like OpenShift and Kubernetes are able to generate certs on demand via an internal CA, as well as rotate the certs/keys based on certain rules. This is not the kind of knowledge we want within our code.

jpkrohling on 7 May 2018

👍1

@yurishkuro The key behavioural decisions deployers will be making are:

what CA to issue client certs from
- often becomes a custom trust root on the server side
what CA to issue server certs from (they can be different)
- if a private CA will be a trust root on the clients

This translates into the config options that @jpkrohling mentioned, though that list is incomplete.

The full set for a single direction of authentication is:

public side of cert
cert to present (aka private key)
trust chain for verification of certs (sometimes delivered in the public side of the server cert, but logically separate)
trust root (to be installed on the component verifying the creds). Note that for client certs this is typically supplied explicitly, even if it is a sub-CA, because you don't want any cert from the root CA to be considered a valid client.

So there are up to 8 unique config values in the most complex case of having two private CAs.

rbtcollins on 7 May 2018

The full set for a single direction of authentication is:

We are talking about different things here. You are probably talking about Mutual TLS authentication, whereas I'm talking about only encrypting the communication channel.

I still believe that auth should be handled at the infra layer. Mutual TLS Auth fits this scenario and can be easily accomplished by tools like Istio. At most, we should allow clients to send auth data (basic HTTP auth, bearer tokens), but that's it.

To allow secure communications, on the other hand, all we need to do is pass the cert data to the underlying handler, so, there's minimal code on our side.

jpkrohling on 8 May 2018

👍1

@jpkrohling If it's just the channel that needs encrypting, OE can be used without any certificate authority at all: I believe you're really talking about authenticating the well known endpoint and encrypting the channel, otherwise no CA would be involved in the discussion.

There's minimal code on our side for handling client certificates as well: it's really quite straightforward. I think that we should either say 'deploy all our components behind a service-mesh or similar layer, running only on localhost and using an outbound proxy', or support things fully. Doing half-a-TLS support is worse than none IMO because it leads folk into a setup that cannot grow with them.

rbtcollins on 2 Jun 2018

If it's just the channel that needs encrypting, OE can be used without any certificate authority at all: I believe you're really talking about authenticating the well known endpoint and encrypting the channel, otherwise no CA would be involved in the discussion.

The Certificate Authority is to tell the client side of the communication that the cert being offered by the server is to be trusted. Otherwise, there could be a man in the middle intercepting the traffic. It's particularly relevant if the server certificate was generated by an internal CA like Kubernetes' Service CA.

(I think I should know what OE is about, but I'm currently having a blank...)

There's minimal code on our side for handling client certificates as well: it's really quite straightforward

If we are just delegating CLI options to the underlying library, I'm all for it. But it should not be a feature of Jaeger.

Doing half-a-TLS support is worse than none IMO because it leads folk into a setup that cannot grow with them

Client Auth Cert is quite different and significantly more complex than just encrypting a pipe using TLS. I don't think we should mix this issue with auth at all.

jpkrohling on 11 Jun 2018

If we are just delegating CLI options to the underlying library, I'm all for it. But it should not be a feature of Jaeger.

I mean something like what is being requested by #678

jpkrohling on 11 Jun 2018

Looking at this too, for an initial small deployment on servers in Scaleway.

It seems like Jaeger Query doesn't (at present) have any support for clients wanting to access it via HTTPS/TLS.

That part should be fairly straightforward to implement, as (in the simplest case) it's just a slightly different Go library call: http.ListenAndServeTLS() instead of http.ListenAndServe().

The TLS version of the call just needs a certificate file and key file supplied.

For our use case, they'd be generated by LetsEncrypt. The cert and key files would be passed via command line, or config file argument. Something like:

--query.certificate-file string Path to the TLS certificate file
--query.certificate-key-file string Path to the key file for the TLS certificate
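A minimal Go sketch of that idea follows, assuming the two flag names proposed above (they are a suggestion in this thread, not existing Jaeger flags). The `--listen` flag, the `serveMode` helper, and the placeholder handler are hypothetical and exist only to make the example self-contained.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"log"
	"net/http"
)

// serveMode decides between plain HTTP and HTTPS. The certificate and key
// must be supplied together; supplying only one is treated as a config error.
func serveMode(certFile, keyFile string) (string, error) {
	switch {
	case certFile == "" && keyFile == "":
		return "http", nil
	case certFile != "" && keyFile != "":
		return "https", nil
	default:
		return "", errors.New("certificate file and key file must be set together")
	}
}

func main() {
	// Flag names follow the proposal in this comment; --listen is illustrative.
	certFile := flag.String("query.certificate-file", "", "Path to the TLS certificate file")
	keyFile := flag.String("query.certificate-key-file", "", "Path to the key file for the TLS certificate")
	addr := flag.String("listen", "127.0.0.1:16686", "Address to listen on (hypothetical flag)")
	flag.Parse()

	mode, err := serveMode(*certFile, *keyFile)
	if err != nil {
		log.Fatal(err)
	}

	// Placeholder handler standing in for the real query service.
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	log.Printf("serving %s on %s", mode, *addr)
	if mode == "https" {
		// Same server, but the listener is wrapped in TLS.
		log.Fatal(http.ListenAndServeTLS(*addr, *certFile, *keyFile, mux))
	}
	log.Fatal(http.ListenAndServe(*addr, mux))
}
```

Rejecting a half-configured pair up front avoids silently falling back to plaintext when only one of the two files was provided.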

Does that sound reasonable? 😃

justinclift on 2 Jan 2019

We have precedent for TLS for storage, so should be using consistent flag names, e.g.

      --cassandra.tls                                   Enable TLS
      --cassandra.tls.ca string                         Path to TLS CA file
      --cassandra.tls.cert string                       Path to TLS certificate file
      --cassandra.tls.key string                        Path to TLS key file
      --cassandra.tls.server-name string                Override the TLS server name
      --cassandra.tls.verify-host                       Enable (or disable) host key verification (default true)

      --es.tls                                     Enable TLS
      --es.tls.ca string                           Path to TLS CA file
      --es.tls.cert string                         Path to TLS certificate file
      --es.tls.key string                          Path to TLS key file

yurishkuro on 2 Jan 2019

Ahhh. So more like this?

--query.tls.cert string   Path to TLS certificate file
--query.tls.key string    Path to TLS key file

justinclift on 2 Jan 2019

Hmmm, it should be possible to provide a query.tls.ca option as well, but I'd have to look into it more. Pretty sure it just means the TLS setup needs to be done a bit differently first, but that's from dodgy memory and it's been ages since I wrote TLS specific handling code. 🤷

justinclift on 2 Jan 2019

An HTTP server typically sets only a cert (chain) and a key. The cert chain would include the CA that was used to sign the server's own cert and all upstream CAs.

jpkrohling on 15 Jan 2019

TLS option is good for collector's http as well.

Use case:
I am trying to report from AWS Lambda, which usually runs outside of an AWS VPC, so the collector has to accept requests from the internet. To keep the bearer token secret, it would be nice to have a TLS connection for tracer->collector communication.

iori-yja on 28 Feb 2019

👍1

To keep the bearer token secret, it would be nice to have a TLS connection for tracer->collector communication.

On the backend side, a reverse proxy could be used for this purpose. On the client side, the env var JAEGER_ENDPOINT can be used with some clients, where an HTTPS URL would be specified.
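Since a bearer token sent over plain HTTP would be readable on the wire, an application could refuse to start unless the configured endpoint is HTTPS. The Go sketch below is only an illustration of such a guard; `requireHTTPS` is a hypothetical helper, not part of any Jaeger client, and it only inspects the `JAEGER_ENDPOINT` value mentioned above.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
	"os"
)

// requireHTTPS parses raw and rejects anything that is not an absolute
// https:// URL, so credentials are never sent over an unencrypted channel.
func requireHTTPS(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parsing endpoint: %w", err)
	}
	if u.Scheme != "https" || u.Host == "" {
		return nil, fmt.Errorf("endpoint %q is not an absolute https URL", raw)
	}
	return u, nil
}

func main() {
	raw := os.Getenv("JAEGER_ENDPOINT")
	if raw == "" {
		fmt.Println("JAEGER_ENDPOINT is not set; nothing to check")
		return
	}
	u, err := requireHTTPS(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("spans would be sent over TLS to", u.Host)
}
```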

jpkrohling on 2 Apr 2019

With the inclusion of gRPC between the agent and the collector, I think this item is complete. What's still missing is official documentation about securing the UI/Query and about the communication between the client and agent.

jpkrohling on 2 Apr 2019

The existing gRPC TLS code doesn't support authenticating the clients. In TLS terms, the normal thing to do is allow the clients to present a key/cert, and have the server verify that against a CA.

I've taken the liberty of putting together a PR, https://github.com/jaegertracing/jaeger/pull/1591