EDIT Discussion moved to: https://github.com/elastic/logstash/issues/9521
EDIT: This proposal was revised on 2018-03-26. I'm just editing in place to spare people who are new to this issue.
This is a part of https://github.com/elastic/logstash/issues/9215 , with a specific focus not on the programmatic plugin interface, but on how we manage artifacts and dependencies, class loading and other isolation concerns, and compatibility guarantees.
Individual jar files
Does this mean that each plugin will need to shade its dependencies, or is there some other means of handling dependencies?
(assuming yes to shading)
There are some security concerns around shading. Any time you shade, you break the signatures of the dependent jars, make vulnerabilities more difficult to detect (a jar in a jar essentially hides the inner jar, or code scans make it look like your code is bad), and can end up with a JAR that is difficult to license.
For example, I don't think you can even shade JCE providers like Bouncy Castle : https://stackoverflow.com/questions/13721579/jce-cannot-authenticate-the-provider-bc-in-java-swing-application
Since there will be a custom class loader anyway, can we use a .zip format and have the class loader load all of the dependencies from the extracted contents? With most projects it is usually just as easy to create a jar + zip as it is a shaded jar.
For some reason I had assumed so far that we'd keep the same ruby gems distribution mechanism, just have it package JARs and any extra files needed in the gem. Is that even a possibility?
Using rubygems for the distribution of Java plugins is a possibility, but undesirable for a number of reasons. The first that come to mind for me (perhaps other people have other reasons) are:
@jakelandis that's a good point re: zip files. I think it probably does make sense to use separate zip files instead of jar files. We'd unzip each plugin to its own dir and point the classloader at that directory.
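To make the "unzip each plugin to its own dir and point the classloader at that directory" idea concrete, here is a minimal sketch. It assumes the zip has already been extracted and that dependencies sit as `*.jar` files directly in the plugin directory; the class name `PluginDirectoryLoader` and that layout are illustrative assumptions, not anything Logstash defines.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one class loader per extracted plugin directory.
public final class PluginDirectoryLoader {
    private PluginDirectoryLoader() {}

    /** Returns the plugin directory itself plus every *.jar directly inside it. */
    public static List<URL> classpathOf(Path pluginDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        // For an existing directory, toUri() ends with '/', which URLClassLoader
        // treats as a directory of loose classes and resources.
        urls.add(pluginDir.toUri().toURL());
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls;
    }

    /** Creates a class loader that only sees this plugin's files (plus the parent). */
    public static URLClassLoader loaderFor(Path pluginDir, ClassLoader parent) throws IOException {
        return new URLClassLoader(classpathOf(pluginDir).toArray(new URL[0]), parent);
    }
}
```

Because the jars are left unmodified on disk, their signatures stay intact, which is the property the shading concern above is about.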
@tsg that's a fair point. I think the main thing is that if we want to provide a positive experience for java developers it makes sense to stay within the java ecosystem. Rubygems.org is very much outside that system :).
Whether it's better or not than rubygems.org is important to consider, but secondary to that I think. I can definitely see the "Publish to rubygems" step confusing java developers.
Playing the devil's advocate: We could have the plugin generator (I assume there will be something like that) create the right gradle tasks so that the plugin author just needs to do gradle publish and we abstract the whole gems thing. I know that eventually they will have to troubleshoot ruby gems and probably hate us.
I'm not going to fight for it, but I wanted to have the rubygems option on the table because maintaining two sets of distribution mechanisms sounds like a lot of maintenance work.
@tsg that's a good point. I too do not relish the idea of us having multiple systems. The plugin manager would have to be patched to work with both. Currently rubygems.org is the namespace for plugins. We'd have to figure out a way to distinguish between the two.
One other thought. I do wonder, at some point would rubygems.org be upset by us publishing non-ruby code to it?
If we DO decide to have two systems, then that's an argument toward having a central hub website we'd maintain that is system agnostic for the source of truth. Of course, this blows up the scope quite a bit.
I would also ask that we keep offline environments in mind (see these ERs). Maven would work, since many/most offline customers I've encountered have, or could have, a private Artifactory/Nexus repo to rehost. Failing that, we should continue to have a simple way to download and sneaker-net plugins.
@mbarretta yes. I agree that's important. Since these will just be fat jars the offline story should be much simpler than it is today with rubygems.
I second @tsg's concerns here. I don't see how the packaging & publishing part of the Java plugin will significantly break the Java developer experience, especially if these are wrapped in Gradle tasks. I also think that maintaining two sets of plugin mechanisms will be extra maintenance work.
Our current plugin infrastructure is not perfect, we know that. I also hear you @mbarretta, and we did talk about that in Tahoe and found a potential solution for the offline installation. I believe we should have this discussion separately.
The thing here is that we do have an (imperfect) infrastructure that works today and could support pure Java plugins too. Adding a new packaging and distribution mechanism seems risky for a few reasons:
These are a few concerns that come to mind. Some or all are certainly technically resolvable but I would strongly suggest we look deeper in the cost/benefit analysis for moving forward with this proposal.
I also want to point out that this strategy seems to contradict the recent downscoping of the Java API to help move forward faster. This is of course based on my assumption that nailing a dual-system plugin infrastructure will probably be hard and complex.
A suggestion would be to list all moving parts related to the plugin infrastructure and for each describe the strategy for handling a dual system and list the potential problems we can anticipate. Maybe this could help understand the scope of what we are looking at to support such a dual system?
@colinsurprenant I think making that list is a good idea. Are you opposed to us spending a week or so spiking this concept? Having had some informal conversations around the topic the scope doesn't seem too bad to me. Why don't we try it out? We can always stop work if it looks too bad.
Also, WRT the de-scoping question, we de-scoped one portion of the project that was guaranteed to be high effort after we spent some time spiking it. This is a different area. We should make the same type of decision but only after taking a look.
I think the scope of work will become clear once the problem has been analyzed/unravelled and an implementation plan is drawn from that. This is what I would do to get a better grasp of what needs to be done, but that may not be necessary for others who have been thinking about this already and are ready to jump into PoC mode. I think the reason we are having this discussion is that this scope has not really been shared, so it is really hard to judge from this side.
Yes, let's work on that list Monday. I agree with you that anyone who wants to work on a PoC shouldn't feel blocked however.
For an experienced Java developer, the biggest obstacle to the Ruby world is not the programming language, but the toolchain, frameworks, conventions, idioms, idiosyncrasies, and common failure modes of the ecosystem. It's fairly easy to figure out the basics of the programming language, but a Java dev will spend a lot more time dealing with the inevitable error conditions and special cases in the Ruby tooling and environment that are more difficult to resolve even if we provide Gradle invocations for that tooling. If we're creating a Java-native option for our users, it's time well-spent to create one that utilizes an all-Java toolchain that's friendlier to Java devs and doesn't require mixing Java and Ruby conventions. From some of the earlier discussions on Maven plugin distribution, some of the simplifications that would allow would also lower the expected effort for the Java plugin distribution infrastructure, so I expect that many of the scope-related concerns will become moot.
I totally get the rationale for aiming at a java-only tooling & packaging and I agree it would be the ideal scenario if we can make it work correctly. OTOH from a practical standpoint I also believe that using the Rubygems packaging would be an acceptable alternative and would not actually deter any Java dev from contributing plugins, given documentation & guidelines. I also think it is worth thinking about the dual-system added complexity, development & maintenance cost versus the developers experience cost of an imperfect Java experience.
Here are a few items on top of my head that should be looked into:
How would a Java plugin depend on another Ruby plugin, like a default codec for example? How would such a dependency be defined and resolved? For example, looking at current "hybrid" plugins: the kafka input depends on both the json and plain codecs as runtime dependencies. Another example: the data filter depends on the generator input, the json codec and the null output (as development dependencies).
How would any Java/Ruby cross dependency work when installing or updating a plugin using the plugin manager?
@colinsurprenant Could be referenced by name. Codecs.get("plain"). No need to have the actual classes.
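As a rough illustration of looking a codec up by name instead of linking against its classes, here is a minimal sketch. `CodecRegistry`, the `Codec` interface, and `register` are all hypothetical names invented for this example; the thread only mentions something shaped like `Codecs.get("plain")`, and nothing here is the real Logstash API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical name-based registry; callers never reference a codec class directly.
public final class CodecRegistry {
    /** Hypothetical minimal codec contract, for illustration only. */
    public interface Codec {
        String encode(String message);
    }

    private static final Map<String, Supplier<Codec>> FACTORIES = new ConcurrentHashMap<>();

    private CodecRegistry() {}

    /** The host (or a codec plugin) registers a factory under a plain string name. */
    public static void register(String name, Supplier<Codec> factory) {
        FACTORIES.put(name, factory);
    }

    /** A plugin asks for a codec by name and gets a fresh instance back. */
    public static Codec get(String name) {
        Supplier<Codec> factory = FACTORIES.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown codec: " + name);
        }
        return factory.get();
    }
}
```

With this shape, a Java plugin that wants `plain` as its default only carries the string `"plain"`; whether the codec behind that name is implemented in Ruby or Java is the host's concern.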
WRT cross dependencies, we can finally clean that all up by packaging all deps into the Jar and using a dedicated class loader. Can you walk me through some scenarios where this approach would not work? As a reminder, if someone truly needed to do that sort of thing they could do the undocumented Ruby/Java kludge we currently do, but that should be an extreme exception.
@andrewvc not sure I am following. Do you mean that Ruby plugin dependencies would be packaged into a Java plugin jar?
Maybe I am missing something, but I'm not sure I see how the mechanics will work for actually resolving cross-dependencies (sorry if this is obvious or not an issue/resolved on your end).
Assume we have a way to specify somewhere that your Java-only plugin depends on some Ruby-only plugin/codec. Currently this is encoded in the Ruby plugin gemspec with something like:
add_runtime_dependency 'logstash-codec-json'
1- Dependencies need to be resolved so that when we package LS it fetches and installs the right versions of all dependencies without conflicts. Currently this is resolved with Bundler: Bundler goes through the Gemfile, resolves the dependencies, and then installs all required gems. How would Bundler have visibility into a Java plugin's possible Ruby dependencies?
2- Similar issue as (1) but when using the plugin manager to install or update a Java plugin.
@colinsurprenant Java plugins would only be able to depend on other Java plugins. They would be able to set any plugin as a default codec of course. Is there a use case where the plugin would actually need to instantiate a JSON codec specifically? In most cases where we specify it as a runtime dep, like the ES input or the kafka output, it's superfluous.
Is there a reason we need that feature? AFAIK it's an anti-pattern for a Logstash plugin to depend on a specific feature of a codec or messing with a specific codec subclass. The codec should be configured outside the plugin.
@colinsurprenant One other thing, it feels like we're moving past the rubygems.org discussion rather quickly. @danhermann 's point above comes from the perspective of an actual Java developer. It's been +1'd by @jakelandis @robbavey who are much closer to the target audience of this feature.
IMHO, their experience and opinion are more relevant than mine or yours, since we have been deep inside the Ruby world for years. Of course rubygems.org is easier for me to wrap my head around than Maven. I've used it nearly daily for over a decade (which is scary to say).
Doesn't it make sense to more heavily weight the opinions of people who understand that experience on a personal level?
IMO this conversation has been sidetracked and I'd like to refocus on its current proposal, which has three bullet points about packaging, the API, and how to achieve dependency isolation for the Java plugins. From my POV, I have no issue with any of these ideas.
Can we create a new issue with a proposal on how, from a UX standpoint, a java plugin will be managed and how that will interact with the current ruby plugin ecosystem?
Seems like there are a few things to weigh up here:
Personally, I think having a completely native, Java-idiomatic development environment would make plugin development a lot more welcoming to experienced Java developers - @danhermann makes some great points around needing to learn the Ruby ecosystem. If we can do that, without overly impacting the other two areas, then that feels like a win.
All right then - seems like the holistic view of the dual system is well understood and that most cases have been looked into. Happy to help where I can.
I know that this discussion is nearing its end but I wanted to weigh in with my opinion on the notion of "completely native, Java-idiomatic development environment".
Most frameworks evolve towards perfection, with initial releases requiring more under-the-hood knowledge than later releases. Early adopters are required to understand more about the underlying ecosystem than late adopters. Frameworks like Rails and IntelliJ come to mind. I'm sure that Gradle has become more seamless and easier to use as it has evolved.
Even though Logstash is not a v1 framework, Java plugins are - so I think it's not unreasonable to require some Ruby hoop jumping (wrapped as Gradle tasks) for early adopters as long as we clearly communicate our plans to steadily remove these hoops over the near term.
Along the continuum of "what we have now" to "the completely native, Java-idiomatic development environment" we can and should discuss where do we release v1 and what steps do we take to move along the continuum - especially if it seems that some of us differ on where that point should be.
@guyboertje I think the experiments we've done indicate that it would be less work to go the native approach. If we go the ruby way we have to write a bunch of weird tooling to paper over it for Java devs. It's not clear that's less work than just going the Java route from day one. Additionally, the ruby wrapper doesn't move us incrementally toward Java native. They are parallel tracks.
Wouldn't it make sense to use the Java 9 module system (from Jigsaw, JSR 376) here? Also, for plugin loading the ServiceLoader facility could be useful.
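For readers unfamiliar with `java.util.ServiceLoader`, here is a minimal sketch of what plugin discovery through it could look like. The `PluginDiscovery` class and the `LogstashPlugin` interface are hypothetical names for illustration, not a proposed API. Providers would be declared in a `META-INF/services/` file named after the interface's binary name inside each plugin jar; this sketch registers none, so it finds nothing when run on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical discovery helper built on the standard ServiceLoader facility.
public final class PluginDiscovery {
    /** Hypothetical service interface that each Java plugin would implement. */
    public interface LogstashPlugin {
        String name();
    }

    private PluginDiscovery() {}

    /** Lists the names of all plugin implementations visible to the given class loader. */
    public static List<String> discover(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        // ServiceLoader lazily instantiates each provider declared under META-INF/services.
        for (LogstashPlugin plugin : ServiceLoader.load(LogstashPlugin.class, loader)) {
            names.add(plugin.name());
        }
        return names;
    }
}
```

`ServiceLoader` has been available since Java 6, so this part of the suggestion does not depend on the module system and would also work on JDK 1.8.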
@praseodym yes! That'd be awesome. However, we currently support JDK1.8 only. We'd like to support Java 10, but we're mostly waiting on JRuby support. It just recently got Java9 support. Java10 support should come soon however, in the next JRuby patch release.
I'm not very familiar with the specifics of JPMS. What specific advantages would they bring to Logstash plugins? I know that you can do complex things with module dependencies, but I think one of our goals is to minimize cross-plugin dependencies.
I think the biggest advantage would be that it saves a lot of effort of coming up with solutions that try to solve the problems that JPMS has already solved. Given that JPMS is built into the JDK it also feels like the right thing to do.
One advantage that JPMS offers is that there are no longer version conflicts when two plugins (modules) use different versions of the same dependency (e.g. like in the beats and tcp inputs recently). However, this does impose some requirements on how these dependency modules are named and versioned.
Since Logstash will have to support Java 9/10 sooner rather than later, I think that's not a blocker for considering using JPMS here.
@praseodym that's a great point, that it would be standardized.
However, it would mean that we wouldn't be able to use it until Logstash 7.0.0, since we have to support Logstash 6.x on JDK1.8.
I think that if that weren't the case this would be a relatively easy decision.
WRT plugins using the same dependency, that shouldn't be a problem with the proposed solution here since we'd use parent last classloaders.
Aside from the nice standardization, is the isolation at all improved vs. using a parent-last classloader?
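For context on the parent-last approach mentioned here, this is a minimal sketch of a child-first `URLClassLoader`. It assumes plain JDK class loading only; the class name `ParentLastClassLoader` is illustrative and this is not the loader Logstash ships.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative child-first loader: looks in the plugin's own URLs before asking the parent.
public class ParentLastClassLoader extends URLClassLoader {
    public ParentLastClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (name.startsWith("java.")) {
                    // Core JDK classes must always come from the parent chain.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: the plugin's own copy of a dependency wins.
                        loaded = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in the plugin's URLs: fall back to normal delegation.
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

With one such loader per plugin, two plugins can each bundle a different version of the same library without seeing each other's copy. A real implementation would likely also force shared API packages to come from the parent, so that the host and the plugin agree on the same interface classes.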
Discussion moved to https://github.com/elastic/logstash/issues/9521