Jaeger: How to support plugins

Created on 22 Sep 2017  ·  51 Comments  ·  Source: jaegertracing/jaeger

We continue to be asked if we can "support X as storage backend" (e.g. #331, #421). Provided that the authors are willing to contribute, maintain, and support such backend implementations, we still have an open question of whether we want to accept those contributions into the main jaeger repository. I could be wrong, but my initial reaction is that we should keep them in separate "contrib" style repositories, for the following reasons:

  1. having half a dozen implementations is going to bloat the size of the binary, increase compile / testing time
  2. having them in core repo suggests that they are officially supported, same as Cassandra/ES, but we don't have expertise in all those different storage solutions, and cannot be on the hook to support them

If we do keep them in the contrib repos, however, we need an approach to allow end users to use those implementations without having to rebuild the backend from source.

One such approach is using Go plugins (https://golang.org/pkg/plugin/), for example as done in Kanali. I think it is feasible to package plugins as individual containers and mount them into a shared volume where the main Jaeger binaries can locate and load them.

cc @pavolloffay @black-adder - any thoughts?

Update 12/28/2017

An alternative approach mentioned in the comments below is the sidecar plugin model (e.g. https://github.com/hashicorp/go-plugin) where the plugin runs as a separate process and the main binary communicates with it via RPC, e.g. gRPC streams. It's worth noting, however, that this approach is still a special case of in-process plugin model, so we need to start there and answer the questions below. For each type of plugin we can support a built-in "sidecar" variant.

Update 8/2/2018

Per https://github.com/jaegertracing/jaeger/issues/422#issuecomment-410129850, I think the following is a reasonable, actionable, and realistic plan. If someone wants to volunteer, ping me.

  • [ ] define protobuf version of SpanWriter and SpanReader interfaces
  • [ ] implement gRPC client and server, where server delegates to the respective storage.SpanReader/Writer interfaces, and client implements them.
  • [ ] extend storage factory with two new types, e.g. h-plugin (h for hashicorp) and g-plugin (plain gRPC). The h-plugin should support a CLI flag for the name of the plugin executable. The g-plugin should support a CLI flag for the gRPC server host:port.
  • [ ] implement in-memory storage as g-plugin using gRPC client/server defined above
  • [ ] implement one of the other storage types (Cassandra or ES) as h-plugin as a template.
  • [ ] TBD: how to pass configuration to the h-plugins. Because plugins are plain executables, they can use viper just like the main binaries, and the cli flag with the plugin command line might be a long string (or the user can pass params via env vars). We probably should provide a template for main, so that the actual main for a plugin is very short.
  • [ ] update documentation with example of building an h-plugin.
  • [ ] replace Cassandra with in-memory shared service in crossdock integration test.
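
To make the first two items of the plan concrete, here is a minimal sketch of the "server delegates, client implements" shape. It is not the real design: it uses the standard library's net/rpc over an in-memory pipe as a stand-in for gRPC, and the `Span` type, the two simplified interfaces, and every other name in it are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"sync"
)

// Span is a tiny stand-in for Jaeger's span model (assumption).
type Span struct {
	TraceID   string
	Operation string
}

// SpanWriter and SpanReader are simplified, hypothetical storage interfaces.
type SpanWriter interface {
	WriteSpan(span Span) error
}

type SpanReader interface {
	GetTrace(traceID string) ([]Span, error)
}

// memoryStore is the actual backend that lives on the plugin side.
type memoryStore struct {
	mu     sync.Mutex
	traces map[string][]Span
}

func newMemoryStore() *memoryStore {
	return &memoryStore{traces: make(map[string][]Span)}
}

func (m *memoryStore) WriteSpan(span Span) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.traces[span.TraceID] = append(m.traces[span.TraceID], span)
	return nil
}

func (m *memoryStore) GetTrace(traceID string) ([]Span, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]Span(nil), m.traces[traceID]...), nil
}

// StorageServer is the RPC-facing wrapper; it only delegates to the interfaces.
type StorageServer struct {
	writer SpanWriter
	reader SpanReader
}

func (s *StorageServer) WriteSpan(span Span, ack *bool) error {
	*ack = true
	return s.writer.WriteSpan(span)
}

func (s *StorageServer) GetTrace(traceID string, out *[]Span) error {
	spans, err := s.reader.GetTrace(traceID)
	*out = spans
	return err
}

// StorageClient implements SpanWriter and SpanReader by calling the server.
type StorageClient struct {
	conn *rpc.Client
}

func (c *StorageClient) WriteSpan(span Span) error {
	var ack bool
	return c.conn.Call("StorageServer.WriteSpan", span, &ack)
}

func (c *StorageClient) GetTrace(traceID string) ([]Span, error) {
	var spans []Span
	err := c.conn.Call("StorageServer.GetTrace", traceID, &spans)
	return spans, err
}

// connect wires a client to a server over an in-memory pipe.
func connect(backend *memoryStore) (*StorageClient, error) {
	server := rpc.NewServer()
	if err := server.Register(&StorageServer{writer: backend, reader: backend}); err != nil {
		return nil, err
	}
	clientConn, serverConn := net.Pipe()
	go server.ServeConn(serverConn)
	return &StorageClient{conn: rpc.NewClient(clientConn)}, nil
}

func main() {
	client, err := connect(newMemoryStore())
	if err != nil {
		panic(err)
	}
	if err := client.WriteSpan(Span{TraceID: "t1", Operation: "GET /"}); err != nil {
		panic(err)
	}
	spans, err := client.GetTrace("t1")
	fmt.Println(len(spans), err)
}
```

The point of the shape is that the rest of the code only ever sees `SpanWriter` and `SpanReader`, so swapping the in-memory pipe for a child process or a remote host:port would not change the callers.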

Update 9/4/2018

Someone pointed out that Go's pkg/plugin now supports macOS and Linux. This removes a significant development hurdle with using native plugins and makes them a viable option, probably simpler to implement than the gRPC-based hashicorp model.

enhancement help wanted roadmap

All 51 comments

I think Kubernetes is also planning on supporting plugins for external storage. I'm not familiar with how it works, but it would perhaps be worth checking if it would make sense for us to follow the same path.

I think K8s plugins will be runnable containers, i.e. integration is out of process. A lot of our use cases require in-process integration for performance reasons, e.g. span sanitizers, filters.

I believe out of process _integration_ (effectively middleware/brokers) would be practical for storage. Of course, for things like sanitizers, filters, et al., that is a different story.

What I found developing Kanali is that plugins are amazing for allowing users to add their own custom functionality. For example, if people design plugins for Kanali, the only requirement is that they implement a certain interface. One side effect is that the .so files aren't small, so while the main program's binary remains small, the Docker image is larger.

I think I agree with @omeid that storage should be out of process. For things like these, I'd propose using gRPC to communicate with out of process integration. For in-process items, such as encrypting tag values, +1 for golang plugins.

I've been thinking this past week about how to support plugins in Jaeger. Some considerations:

  • the plugins themselves must be configurable, e.g. Cassandra storage has a bunch of parameters

    • configuring via env variables is the easiest

    • configuring via cmd line arguments (as we do now) is desired (see #608)

  • there must be different types of plugins, e.g. storage vs. span adjusters
  • there should be an easy story of deploying plugins without rebuilding docker images

    • the easiest way seems to be to point the main binary to a directory that stores plugins in sub-directories. The main binary will iterate and load all of them. Each plugin could be self-describing, e.g. by implementing one of the supported plugin interfaces like StoragePlugin, AdjusterPlugin, etc.

    • I don't have a good strategy yet how plugins can be loaded from other docker images, open to suggestions.
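
A rough sketch of the directory-scanning idea, to show what "iterate and load all of them" could look like with Go's pkg/plugin. The interfaces `StoragePlugin` and `AdjusterPlugin` are reduced to one made-up method each, the exported symbol name `Plugin` and the `/plugins` default are assumptions, and `noopStorage` exists only to exercise the type switch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"plugin"
)

// StoragePlugin and AdjusterPlugin are hypothetical plugin interfaces.
type StoragePlugin interface {
	StorageName() string
}

type AdjusterPlugin interface {
	AdjusterName() string
}

// noopStorage is a placeholder implementation used only for illustration.
type noopStorage struct{}

func (noopStorage) StorageName() string { return "noop" }

// discover returns every .so file one level below root (root/<plugin>/<file>.so).
func discover(root string) ([]string, error) {
	return filepath.Glob(filepath.Join(root, "*", "*.so"))
}

// classify reports which supported plugin interface a loaded symbol implements.
func classify(sym interface{}) string {
	switch sym.(type) {
	case StoragePlugin:
		return "storage"
	case AdjusterPlugin:
		return "adjuster"
	default:
		return "unknown"
	}
}

// load opens a shared object and inspects its exported symbol named "Plugin"
// (the symbol name is an assumption of this sketch).
func load(path string) (string, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return "", err
	}
	sym, err := p.Lookup("Plugin")
	if err != nil {
		return "", err
	}
	return classify(sym), nil
}

func main() {
	root := "/plugins" // hypothetical default location
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	paths, err := discover(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "discover:", err)
		os.Exit(1)
	}
	for _, path := range paths {
		kind, err := load(path)
		if err != nil {
			fmt.Fprintf(os.Stderr, "skipping %s: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: %s plugin\n", path, kind)
	}
}
```

Because the main binary only checks which interface the symbol satisfies, a plugin needs no registration step of its own, which is what "self-describing" would mean in practice.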

Addressing your last point, when [email protected] is released, I think I'm going to do the following to load plugins. With this approach, the kanali pod does not have to be changed when new plugins are required or changed.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kanali
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: kanali
    spec:
      serviceAccountName: kanali
      containers:
      - name: kanali
        image: northwesternmutual/kanali
        command: ["/kanali", "start"]
        volumeMounts:
        - name: plugins
          mountPath: /plugins
          readOnly: true
      initContainers:
      - name: kanali-plugin-apikey
        image: fbgrecojr/kanali-plugin-apikey:local
        imagePullPolicy: IfNotPresent
        command: ["cp", "/apiKey_v2.0.0-rc.1.so", "/plugins/"]
        volumeMounts:
        - name: plugins
          mountPath: /plugins
      volumes:
      - name: plugins
        emptyDir: {}

Addressing your first point, making these plugins configurable is tricky. ENV variables will work, but I haven't found a solution for dynamically changing the CLI interface. The problem lies in the fact that a Go plugin's init function is called when the plugin is opened. This of course happens after the init functions of the code doing the opening have executed.
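
For the ENV route, a plugin could read its own settings at the moment it is opened, which sidesteps the ordering problem because nothing has to be registered with the host's CLI. A small sketch follows; the `APIKEY_` prefix and the function name are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// optionsFromEnv collects the variables that start with prefix and returns
// them keyed by the lower-cased remainder, e.g. APIKEY_HEADER -> header.
func optionsFromEnv(prefix string, environ []string) map[string]string {
	opts := make(map[string]string)
	for _, kv := range environ {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 || !strings.HasPrefix(parts[0], prefix) {
			continue
		}
		key := strings.ToLower(strings.TrimPrefix(parts[0], prefix))
		opts[key] = parts[1]
	}
	return opts
}

func main() {
	// A plugin could call this from its own init function, which runs
	// when the host opens the plugin.
	fmt.Println(optionsFromEnv("APIKEY_", os.Environ()))
}
```

The environment is passed in as a slice rather than read inside the function so the parsing can be exercised without touching the real process environment.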

`command: ["cp", "/apiKey_v2.0.0-rc.1.so", "/plugins/"]`

I was thinking of something very similar, thanks for posting!

Re dynamic CLI, I have a proposal in #608.

Food for thought - HashiCorp Terraform also does pluggable backends. AFAIR, they use protobuf for inter-process communication. Might be worth having a look at how they do it.

@yurishkuro Ran into a nasty issue relating to plugins with Kanali that I thought I'd share.

In theory, plugins should be able to be dynamically loaded without the need to modify Jaeger's container image. In the context of Kubernetes, the recommended approach would be to use init containers. The init container would move the compiled plugin into the location where the Jaeger binary is configured to look for them.

_However_, there is a nasty bug that prevents this from being the case until at least Go v1.11. Evidence of this can be tracked in the following issues:

The root cause of the issue stems from the fact that unless a plugin is compiled using the exact same vendor/ instance as Kanali, bad things will happen. For example, global variables will be duplicated, init functions will execute twice, etc.

The current workaround that I use entails building your custom plugin within the same Dockerfile as Kanali. The goal is to take the superset of dependencies that Kanali and every plugin requires and build them all using this aggregated dependency tree. This will only work if all intersecting dependencies utilize the exact same revision.

Thanks for pointing those out, @frankgreco. It seems like a complete show stopper to me until there's a better fix from Go. Even if the plugin is built with the same vendor as Jaeger core, there's going to be tight coupling between Jaeger version and plugin build, since the next Jaeger version might have upgrades to the deps. I actually want to give the hashicorp gRPC based plugin framework a try.

@yurishkuro I agree! I'm going to checkout their gRPC framework as well.

@yurishkuro Curious if there have been any new developments here, as I am interested in working on a storage plugin. Is hashicorp gRPC still being evaluated?

Now that the Protobuf model has been merged, the development of gRPC-based plugins can go ahead without blocking on the rest of #773. We have not evaluated the hashicorp plugin system, however. I am not sure what it provides that's not achieved by plain gRPC.

I just had a look at hashicorp/go-plugin. If I understand it correctly, even though it uses gRPC (can also use net/rpc), it spawns the plugin as a child process and uses a Unix domain socket as the comms channel. This sounds good for implementing plugins for real storage. It somewhat derails another plan I had to build in-memory storage as a shared gRPC service, which would be useful for our crossdock integration tests (instead of always spawning Cassandra), but it won't work with hashicorp/go-plugin since we need a single shared storage process, not two separate child processes for collector and query. However, hashicorp/go-plugin requires us to define a gRPC service and implement client and server anyway, so maybe we can reuse those.

I will update the top of the ticket with what I think is an actionable and realistic plan.

Hi @yurishkuro, is it possible that your in-memory storage issue could be worked around with some atomic handling of plugin start? It is possible to define a grpc server with a custom listen address, go-plugin seems to support reattaching: https://github.com/hashicorp/go-plugin/blob/master/client.go#L117.

If plugins were responsible for their own configuration, they could be given an explicit listen address for the grpc server, if they were not able to bind, they would simply exit.
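
A sketch of that "bind or exit" behaviour, using only the standard library. The `-listen` flag and the function name are hypothetical, and a plain TCP listener stands in for the gRPC server that would sit on top of it.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"os"
)

// tryBind attempts to claim the listen address. It fails if another
// process (for example an earlier plugin instance) already holds it.
func tryBind(addr string) (net.Listener, error) {
	return net.Listen("tcp", addr)
}

func main() {
	// Hypothetical flag; port 0 lets the OS pick a free port for the demo.
	addr := flag.String("listen", "127.0.0.1:0", "listen address for the plugin server")
	flag.Parse()

	lis, err := tryBind(*addr)
	if err != nil {
		// The address is taken or invalid: step aside instead of retrying.
		fmt.Fprintf(os.Stderr, "cannot bind %s, exiting: %v\n", *addr, err)
		os.Exit(0)
	}
	defer lis.Close()

	fmt.Println("plugin listening on", lis.Addr())
	// A real plugin would now hand lis to its gRPC server and block.
}
```

The second instance started with the same address would fall into the error branch, which is the "plugin instance starting and stopping immediately" cost of this approach.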

What do you think?

We at https://github.com/Klarrio use Jaeger for some of the interesting bits on our platform. We are interested in having storage plugins implemented and we would happily contribute.

I don't think lock files are relevant here. If I understand correctly, @yurishkuro means that Hashicorp plugins are always spawned as a subprocess, so cannot be shared among multiple services. Collector and query need to access the same database in order for the tests to succeed.

However, I don't see any problem with implementing a storage plugin interface in plain grpc without Hashicorp plugins. That should resolve everything.

@isaachier one could write this from scratch, but Hashicorp plugins support reattaching. The only inconvenience is that there is a plugin instance starting and stopping immediately due to not being able to bind. Why write everything from scratch?

@radekg if Hashicorp uses subprocesses exclusively, how can they be shared across docker containers, where the separate services are viewed as if they are running on separate machines?

Hashicorp plugin framework is perfectly fine for running 3rd party code in Jaeger, e.g. if someone wants to support a new type of storage like DynamoDB which we don't want to support explicitly (and compile) in the main code base. But the issue with in-memory storage is that it needs to run as a standalone process so that collector & query service can both access it. That service would need an external/remote API, and Hashicorp plugin framework offers no help here. However, since Hashicorp plugin framework is based on gRPC, we can reuse the same gRPC service definition / binding.

FWIW, this issue is no longer blocked, if someone wants to take a stab at the storage layer IDL in protobuf (using the already defined Span model), then it would be possible to implement a plugin.

Ah, so shared in-memory storage is actually running in a separate container? So how is that different than, say, shared redis?

Mostly it's different because it's data model aware and can answer the query service API calls (in a naive way by scanning through all traces). Doing the same with redis would require retrieving all data into the query service first.
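
To illustrate what "data model aware" buys, here is a deliberately naive sketch of a store that answers a query by scanning every trace it holds. The `Span` fields and method names are simplified assumptions, not Jaeger's actual model or API.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a reduced stand-in for the real span model (assumption).
type Span struct {
	TraceID   string
	Service   string
	Operation string
}

// memoryStore keeps all spans grouped by trace ID.
type memoryStore struct {
	traces map[string][]Span
}

func newMemoryStore() *memoryStore {
	return &memoryStore{traces: make(map[string][]Span)}
}

func (m *memoryStore) WriteSpan(s Span) {
	m.traces[s.TraceID] = append(m.traces[s.TraceID], s)
}

// FindTraceIDs scans every stored trace and returns, in sorted order, the IDs
// of traces with at least one span matching the service and operation.
// An empty operation matches any operation.
func (m *memoryStore) FindTraceIDs(service, operation string) []string {
	var ids []string
	for id, spans := range m.traces {
		for _, s := range spans {
			if s.Service == service && (operation == "" || s.Operation == operation) {
				ids = append(ids, id)
				break // one matching span is enough for this trace
			}
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	store := newMemoryStore()
	store.WriteSpan(Span{TraceID: "a", Service: "frontend", Operation: "GET /"})
	store.WriteSpan(Span{TraceID: "b", Service: "db", Operation: "SELECT"})
	fmt.Println(store.FindTraceIDs("frontend", ""))
}
```

The scan runs inside the storage process, so only matching results cross the wire. With a generic key-value store the query service would have to pull the data over first and filter it itself.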

@yurishkuro Sure, but from the connectivity point of view, it's no different than any other storage backend. One needs to know the address to connect to and the backend has to be up. I'm not sure what's preventing the use of https://github.com/hashicorp/go-plugin.

I am happy to be proven wrong, but afaik hashicorp go-plugin does not give you "the address to connect to". Instead, you give it a binary, and it runs the binary as a child process and communicates with it via a Unix socket, which is private to the parent process, i.e. cannot be shared with other processes.
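
For context on why the address stays with the parent: as far as I understand the go-plugin documentation, the child announces its endpoint by printing a single pipe-delimited handshake line on its stdout, which only the parent reads. The parser below is a sketch based on that understanding; the field layout (core version, app version, network, address, optional protocol) and the `netrpc` default are my reading of the docs, not something verified against the library's source, and the socket path is invented.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// handshake holds the fields a plugin child is assumed to announce on stdout.
type handshake struct {
	CoreVersion int
	AppVersion  int
	Network     string // e.g. "unix" or "tcp"
	Addr        string
	Protocol    string // e.g. "netrpc" or "grpc"
}

// parseHandshake splits a line such as "1|1|unix|/path/to.sock|grpc".
func parseHandshake(line string) (handshake, error) {
	parts := strings.Split(strings.TrimSpace(line), "|")
	if len(parts) < 4 {
		return handshake{}, fmt.Errorf("unrecognized handshake line %q", line)
	}
	core, err := strconv.Atoi(parts[0])
	if err != nil {
		return handshake{}, fmt.Errorf("bad core protocol version: %w", err)
	}
	app, err := strconv.Atoi(parts[1])
	if err != nil {
		return handshake{}, fmt.Errorf("bad app protocol version: %w", err)
	}
	h := handshake{
		CoreVersion: core,
		AppVersion:  app,
		Network:     parts[2],
		Addr:        parts[3],
		Protocol:    "netrpc", // assumed default when the field is absent
	}
	if len(parts) >= 5 {
		h.Protocol = parts[4]
	}
	return h, nil
}

func main() {
	// Simulated child stdout; a real parent would read the child's pipe.
	stdout := bufio.NewScanner(strings.NewReader("1|1|unix|/tmp/plugin-example.sock|grpc\n"))
	if stdout.Scan() {
		h, err := parseHandshake(stdout.Text())
		fmt.Println(h, err)
	}
}
```

Since the line travels over the child's stdout pipe, a second Jaeger component has no way to learn the address unless the parent republishes it, which is the sharing problem described above.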

Similar discussion in OpenCensus about Go plugins: https://github.com/census-instrumentation/opencensus-service/pull/70

@yurishkuro

I started working on a plugin framework that leverages go-plugin. Could you confirm that the following interfaces are the ones that need to be implemented by the plugin provider? I'm new to jaeger's codebase but I have been working with grpc and protobuf for the last 3 years.

dependencystore/Writer
dependencystore/Reader
samplingstore/Store
spanstore/Writer
spanstore/Reader

Thanks

@olivierboucher yes, they are, however, for the proof of concept, I would suggest only focusing on the spanstore reader and writer, as they are the most stable/mature/important. The sampling store is the least important (even master does not implement it for anything but Cassandra, and the adaptive sampling code that uses it is not fully oss yet).

Quick update on the progress I made so far:

  • I cannot re-use the protos from model/model.proto since they use gogoproto and this would only allow the creation of go plugins. I created a new set of plain protos that can be mapped to jaeger's models. Taking this route will allow plugins to be written in any language supported by grpc.

  • I'm currently in the process of mapping the new proto types to the existing models

  • Plugins written in Go will be first-class citizens since they will be able to import a shared package containing the plugin's interface

I will be committing before writing tests so that you can approve the direction I took

I cannot re-use the protos from model/model.proto since they use gogoproto and this would only allow the creation of go plugins

@olivierboucher the gogoproto annotations in model/model.proto are optional, however they do create issues when compiling for other languages since you need to have those definitions downloaded. See #1213.

@yurishkuro thanks for the insight. I think it's best I go on with the separate protos for the POC and we can figure something out after to gain efficiency.

I think it would also be best to have a different set of protos for the plugins since any change to the internal protos would most likely break plugins. Having a different set allows us to adapt the client if breaking changes happen. Again, protos are very flexible so I don't know if that is a concern yet.

We also need to think about the distribution of the protos. Having a separate set could allow us to make it public under jaegertracing/storage-plugin-proto. Plugin maintainers would simply have it as a submodule, and it would be free of any internal protos that there might be in the current files.

The plan is to eventually move protos to https://github.com/jaegertracing/jaeger-idl, but we'll need to figure out a few issues like #1213 before we do that.

Understood. I just wrapped up the POC. Ran integration tests just fine with the sampling strategy test commented out.

Here are steps I followed:

  • Install the memory-grpc-plugin: `go install github.com/jaegertracing/jaeger/cmd/memory-grpc-plugin`
  • Start the all-in-one instance with our new storage type: `SPAN_STORAGE_TYPE=grpc-plugin go run -tags ui ./cmd/all-in-one/main.go --grpc-plugin.binary=memory-grpc-plugin`
  • Comment out L61 `getSamplingStrategy(t)` in cmd/all-in-one/all_in_one_test.go
  • Run the integration test: `make integration-test`
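
For readers following along, the way the storage type and the plugin binary fit together in the steps above can be sketched roughly as follows. This is not the code from the POC: it uses the standard library's flag package in place of whatever the real binaries use, and `storageConfig` and `loadConfig` are made-up names. Only `SPAN_STORAGE_TYPE`, `grpc-plugin`, and `--grpc-plugin.binary` come from the steps themselves.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// storageConfig is a hypothetical holder for the two settings used above.
type storageConfig struct {
	Type         string // value of SPAN_STORAGE_TYPE
	PluginBinary string // value of --grpc-plugin.binary
}

// loadConfig combines the storage type with the plugin flag and rejects
// the combination that cannot work: grpc-plugin without a binary.
func loadConfig(storageType string, args []string) (storageConfig, error) {
	fs := flag.NewFlagSet("all-in-one", flag.ContinueOnError)
	binary := fs.String("grpc-plugin.binary", "", "name or path of the storage plugin executable")
	if err := fs.Parse(args); err != nil {
		return storageConfig{}, err
	}
	cfg := storageConfig{Type: storageType, PluginBinary: *binary}
	if cfg.Type == "grpc-plugin" && cfg.PluginBinary == "" {
		return cfg, errors.New("--grpc-plugin.binary is required when SPAN_STORAGE_TYPE=grpc-plugin")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv("SPAN_STORAGE_TYPE"), os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("storage type %q, plugin binary %q\n", cfg.Type, cfg.PluginBinary)
}
```

Validating the pair up front means a missing binary name fails at startup rather than when the first span is written.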

What are the next steps? Here's what's on my TODO list so far:

  • Figure out how to pass the configuration to the plugins
  • Write tests for the plugin framework
  • Benchmark memory-grpc-plugin vs the actual memory backend

Can you create a WIP PR, to see the code changes?

I think a benchmark would be interesting to see, to validate the viability of this approach. Then we can think about configuration options. What you have now actually looks good to me. I was originally thinking of having something more general, similar to @ledor473 's built-in plugins (#1050), which allows loading multiple plugins at once, for different purposes. But we can do that later, storage plugin is the most important one to unblock.

Is there any prior art about passing configuration to the plugin binaries?

btw, just re-read this thread, and remembered that memory storage is not the best candidate for a hashicorp plugin, since the sub-process cannot be shared by other Jaeger components (https://github.com/jaegertracing/jaeger/issues/422#issuecomment-416800910). But I think it's fine as a test, especially when using all-in-one that does not have the sharing problem.

Here you go #1214

I will look at the options available for passing configuration.

great, I will make sure to re-read the whole thread as well as #1050

@olivierboucher cool, hashicorp plugins appear to work!

Curious alternative approach (https://github.com/mholt/caddy/wiki): adding the plugins to the source code and building a custom binary, i.e. does not involve distributing pre-built binaries for core server + plugins.

adding the plugins to the source code and building a custom binary

A company that needs a custom plugin would then have to build Jaeger from the sources themselves?

Yes, that's why I don't like that solution. They actually have a website where you can select the plugins in the UI, and it builds the binary for you and lets you download it, which is even more odd for automatic deployment and raises security concerns.

It sounds cool, but I probably wouldn't want to use it as a customer, nor support it as a vendor :-)

Indeed, hence I labeled it curious rather than useful.

@yurishkuro It would be a tremendous help if this feature were finished and merged. It would let end users plug in their own storage without pulling down the binaries for other backends, and make it easier to add backends without tying them to this repo. Do you have a timeline for this feature?

@etsangsplk we discussed it last Friday and I think the consensus is to move forward with https://github.com/hashicorp/go-plugin, with InfluxDB being the first implementation (#272). No promises about the timeline, but we are closer to getting it done now than we were one week ago :-)

Hello, we'd (github.com/exaring) like to provide plugin support for storage backend with Go's pkg/plugin support. Later on we'd like to build a storage plugin for DynamoDB. What is the current state of the discussion/progress here?

The problem with Go's pkg/plugin approach is that it (currently) requires your plugins to be built against a very specific revision/tag of Jaeger. Whenever you update your Jaeger instance, you'll need to recompile your plugins, or you risk introducing some bugs that are nasty and hard to detect/diagnose.

@jpkrohling Is there a weekly meeting group for the jaegertracing backend (or a page that records the meeting notes)?

@yurishkuro @jpkrohling
I can't make the next two project meetings. Can you share the status of this issue? I'm not aware of a branch where external plugin work is happening, would love to poke around if something exists.

@jacobmarble we usually don't use feature branches. The work is happening on some PRs linked to this issue, e.g. #1323, #1214.

1461 is merged 🎉 🎉 🎉 🎈 🎈 🎈

Many thanks to @olivierboucher and @chvck 👏 👏 👏

Remaining task: add documentation #1518.

@yurishkuro can we close this one?
