Elasticsearch: Publish analysis-common artifact so we can use it in IT tests

Created on 25 Nov 2017 · 7 Comments · Source: elastic/elasticsearch

I'm trying to test a plugin I'm building.

I'm using Maven for this plugin.
I extended the ESIntegTestCase class, but sadly the test node starts without the analysis-common module:

[2017-11-25T12:36:47,259][INFO ][o.e.n.Node               ] [node_s0] initializing ...
[2017-11-25T12:36:47,373][INFO ][o.e.e.NodeEnvironment    ] [node_s0] using [1] data paths, mounts [[/ (/dev/disk1s1)]], net usable_space [22.1gb], net total_space [465.7gb], types [apfs]
[2017-11-25T12:36:47,374][INFO ][o.e.e.NodeEnvironment    ] [node_s0] heap size [3.5gb], compressed ordinary object pointers [true]
[2017-11-25T12:36:47,377][INFO ][o.e.n.Node               ] [node_s0] node name [node_s0], node ID [84Pux-9MTW670nXUXicXWw]
[2017-11-25T12:36:47,378][INFO ][o.e.n.Node               ] [node_s0] version[6.0.0], pid[82221], build[8f0685b/2017-11-10T18:41:22.859Z], OS[Mac OS X/10.13.1/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-11-25T12:36:47,378][INFO ][o.e.n.Node               ] [node_s0] JVM arguments [-ea, -Didea.no.launcher=true, -Des.set.netty.runtime.available.processors=false, -Didea.test.cyclic.buffer.size=1048576, -Dfile.encoding=UTF-8]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] no modules loaded
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.index.MockEngineFactoryPlugin]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.ingest.bano.IngestBanoPlugin]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.node.NodeMocksPlugin]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.search.MockSearchService$TestPlugin]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.test.ESIntegTestCase$TestSeedPlugin]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.test.discovery.TestZenDiscovery$TestPlugin]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.test.store.MockFSIndexStore$TestPlugin]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.transport.MockTcpTransportPlugin]
[2017-11-25T12:36:47,381][INFO ][o.e.p.PluginsService     ] [node_s0] loaded plugin [org.elasticsearch.transport.Netty4Plugin]
[2017-11-25T12:36:47,417][INFO ][o.e.d.DiscoveryModule    ] [node_s0] using discovery type [test-zen]
[2017-11-25T12:36:47,462][INFO ][o.e.n.Node               ] [node_s0] initialized
[2017-11-25T12:36:47,463][INFO ][o.e.n.Node               ] [node_s0] starting ...

This means I cannot register the following custom analyzer in my tests, because the asciifolding token filter is not available:

        return Settings.builder()
                .put("index.analysis.analyzer.bano_analyzer.type", "custom")
                .put("index.analysis.analyzer.bano_analyzer.tokenizer", "standard")
                .putArray("index.analysis.analyzer.bano_analyzer.filter", "lowercase", "asciifolding")
                .build();

Is it possible to publish this module so we can add it in our tests?


All 7 comments

I don't think we should do this. We publish only the modules that expose a client API, because we have to. For the rest, we want to encourage standing up a standalone Elasticsearch node instead of internal cluster integration tests.

I wish you published all modules for this exact reason. I run into issues due to the missing painless module all the time, and I am sure I will run into the same thing with the analysis-common module when I port plugins to 6.x.

This is kind of annoying, since we need the analysis-common module for our integration tests and these dependencies are not published. We ended up using the artifacts published under the group ID org.codelibs.elasticsearch.module. Thanks to https://github.com/codelibs/elasticsearch-module/ for that. I hope you publish all modules so we don't have to rely on others.

I had to copy the analysis-common module into my dev kit (https://github.com/jprante/elasticsearch-xbib-devkit) and change the group ID, just so I could publish a jar artifact to a repository. That way, tests with custom analyzers that use token filters such as keyword marker can run.

For example:

import org.elasticsearch.analysis.common.CommonAnalysisPlugin;
...
ESTestCase.TestAnalysis analysis = ESTestCase.createTestAnalysis(new Index("test", "_na_"),
                settings, new AnalysisDecompoundPlugin(Settings.EMPTY), new CommonAnalysisPlugin());

https://github.com/jprante/elasticsearch-analysis-decompound/blob/6.2/src/test/java/org/xbib/elasticsearch/index/analysis/decompound/patricia/DecompoundTokenFilterTests.java#L59

My copy in my repo looks like this:
http://xbib.org/repository/org/xbib/elasticsearch/elasticsearch-analysis-common/6.2.2.0/

The result is that I have to track the mainline analysis-common module for updates and copy all changes into my code manually. The analysis-common module has become virtually invisible to analyzer plugin developers. For me it's fun to build a personal code base, but for other plugin developers it will be no easy job.

We discussed this during Fix-it-Friday on 2018-02-23 and agreed that we are not going to publish these module artifacts for the following reasons:

  • we do not want these artifacts to be seen as general libraries that developers can rely on in their code
  • we do not want to encourage the use of internal cluster integration tests; instead, we want to encourage the use of standalone tests that run against a node where this module is installed

@jasontedor could you elaborate on how we are supposed to write tests, testing the functionality of these filters when you say 'use of standalone tests where this module would be installed'?

Let's say I want to test the creation of an index using a custom analyzer or pattern filter, adding a document and querying using said filter.

Should I always have a standalone ES cluster when running these tests? Surely that would limit the portability of my code severely. What would be the best practice here?

@cdekker I shared some ideas about integration testing in this thread: https://discuss.elastic.co/t/in-memory-testing-with-resthighlevelclient/106196/6

In case it helps.
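
To make the "standalone node" suggestion from this thread a bit more concrete, here is a minimal sketch, not an official recommendation. It builds the index-creation body that corresponds to the Settings.builder() snippet in the question and sends it over plain HTTP to a node that is already running, where analysis-common is installed as part of the normal distribution. The class name, the method names, the base URL and the index name are all hypothetical; only the JSON settings keys come from the original snippet.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for tests that talk to an external Elasticsearch node.
class StandaloneAnalyzerSetup {

    // JSON equivalent of the index.analysis.analyzer.bano_analyzer.* settings
    // from the question (custom analyzer, standard tokenizer, two filters).
    static String indexBody() {
        return "{\"settings\":{\"analysis\":{\"analyzer\":{\"bano_analyzer\":{"
                + "\"type\":\"custom\","
                + "\"tokenizer\":\"standard\","
                + "\"filter\":[\"lowercase\",\"asciifolding\"]"
                + "}}}}}";
    }

    // Sends PUT <baseUrl>/<index> with the body above and returns the HTTP status.
    // Assumption: baseUrl points at a node the test harness started beforehand.
    static int createIndex(String baseUrl, String index) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + "/" + index).openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(indexBody().getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A test could then call something like StandaloneAnalyzerSetup.createIndex("http://localhost:9200", "bano_test") during setup, with the address being whatever the build starts the node on. The trade-off raised above still applies: the test now depends on an external process rather than an in-JVM cluster.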
