Elasticsearch-dsl-py: Create custom analyzer filter for an index in py-elasticsearch-dsl

Created on 25 Apr 2018 · 4 comments · Source: elastic/elasticsearch-dsl-py

I'm working with py-elasticsearch-dsl for my master's. I'm creating an index of title documents from a corpus of Turkish titles, and I need to implement a custom lowercase analyzer for the Turkish language.

I'm trying to do it like this:

    from elasticsearch_dsl import DocType, Percolator, analysis, analyzer

    # custom token filter: lowercase with Turkish-specific rules
    turkish = analysis.token_filter('turkish_lowercase', type="lowercase", language="turkish")

    # custom analyzer that references the filter above by name
    turkish_lowercase = analyzer('turkish_lowercase',
        type="custom",
        tokenizer="standard",
        filter=["turkish_lowercase"],
    )

    class Document(DocType):
        # title = Text()
        query = Percolator(
            analyzer=turkish_lowercase,
            filter=turkish
        )    # query is a percolator

        class Meta:
            index = 'titles' # index name
            doc_type = '_doc'

        def save(self, **kwargs):
            return super(Document, self).save(**kwargs)

But I'm getting this error:

    python percolator.py
    PUT http://localhost:9200/title-index [status:400 request:0.004s]
    Traceback (most recent call last):
      File "percolator.py", line 55, in <module>
        Document.init()
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/document.py", line 161, in init
        cls._doc_type.init(index, using)
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/document.py", line 85, in init
        self.mapping.save(index or self.index, using=using or self.using)
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/mapping.py", line 116, in save
        return index.save()
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/index.py", line 219, in save
        return self.create()
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/index.py", line 203, in create
        self.connection.indices.create(index=self._name, body=self.to_dict(), **kwargs)
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
        return func(*args, params=params, **kwargs)
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/client/indices.py", line 91, in create
        params=params, body=body)
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/transport.py", line 314, in perform_request
        status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 163, in perform_request
        self._raise_error(response.status, raw_data)
      File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
        raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
    elasticsearch.exceptions.RequestError: TransportError(400, 'illegal_argument_exception', 'Custom Analyzer [turkish_lowercase] failed to find filter under name [turkish_lowercase]')

So, what is the correct way to do this?

Thank you



For custom filters, pass the filter object itself into the filter list rather than its name, i.e. filter=[turkish] in your call to analyzer(). That way the filter's definition is included in the index's analysis settings when the index is created.
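To see why only the object form works, it helps to look at the analysis settings the index is created with. The sketch below is plain Python (no elasticsearch-dsl needed); the two dictionaries are what I'd expect the generated analysis settings to look like in each case, and undefined_filters and BUILTIN_FILTERS are hypothetical helpers written only for illustration (the built-in list is deliberately incomplete). When the filter is passed by name, only the analyzer is serialized, so it points at a turkish_lowercase filter the index never defines, which matches the 400 error in the question.

```python
# Hypothetical, deliberately incomplete set of built-in filter names.
BUILTIN_FILTERS = {"lowercase", "uppercase", "asciifolding", "stop", "trim"}


def undefined_filters(analysis: dict) -> list:
    """Hypothetical helper: return (analyzer, filter) pairs where a custom
    analyzer references a filter that is neither defined in the analysis
    settings nor in BUILTIN_FILTERS."""
    defined = set(analysis.get("filter", {}))
    missing = []
    for name, definition in analysis.get("analyzer", {}).items():
        for f in definition.get("filter", []):
            if f not in defined and f not in BUILTIN_FILTERS:
                missing.append((name, f))
    return missing


# filter=["turkish_lowercase"] (a name): the analyzer is sent,
# but the custom filter definition is not.
by_name = {
    "analyzer": {
        "turkish_lowercase": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["turkish_lowercase"],
        }
    }
}

# filter=[turkish] (the token_filter object): the filter definition
# is sent along with the analyzer.
by_object = {
    "filter": {
        "turkish_lowercase": {"type": "lowercase", "language": "turkish"}
    },
    "analyzer": {
        "turkish_lowercase": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["turkish_lowercase"],
        }
    },
}

print(undefined_filters(by_name))    # [('turkish_lowercase', 'turkish_lowercase')]
print(undefined_filters(by_object))  # []
```

In both cases the analyzer refers to the filter by name; the difference is only whether the filter's own definition is present next to it.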

Hope this helps!

@HonzaKral

    elasticsearch.exceptions.RequestError: TransportError(400, 'mapper_parsing_exception', 'Mapping definition for [query] has unsupported parameters:  [filter : turkish_lowercase] [analyzer : turkish_lowercase]')

@HonzaKral it seems the percolator field doesn't support those parameters. Why?

I tested it with the example and I get this:

    elasticsearch.exceptions.RequestError: TransportError(400, 'mapper_parsing_exception', 'Mapping definition for [query] has unsupported parameters:  [analyzer : html_strip]')

Why?

You need to pass the filter object, filter=[turkish], when defining the analyzer:

    turkish = analysis.token_filter('turkish_lowercase', type="lowercase", language="turkish")

    turkish_lowercase = analyzer('turkish_lowercase',
        type="custom",
        tokenizer="standard",
        filter=[turkish],
    )

Then your original code should work.

Yes, I found my error: I was passing those parameters to the percolator field, when I should have put them on the text field. Thank you!
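For reference, here is a sketch of the index-creation body the corrected setup should roughly correspond to, assuming Elasticsearch 6.x (mappings still keyed by the _doc type, as in the Meta class of the question). The title text field carries the custom analyzer, and the query field is a bare percolator, since the percolator mapping rejects analyzer and filter as the errors in this thread show. build_index_body is a hypothetical helper, not part of elasticsearch-dsl.

```python
import json


def build_index_body(doc_type: str = "_doc") -> dict:
    """Hypothetical helper: index body with the analyzer on a text field.

    Intended to approximate what this elasticsearch-dsl setup generates:
        title = Text(analyzer=turkish_lowercase)
        query = Percolator()
    Assumption: Elasticsearch 6.x, where mappings are keyed by doc type.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "turkish_lowercase": {"type": "lowercase", "language": "turkish"}
                },
                "analyzer": {
                    "turkish_lowercase": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["turkish_lowercase"],
                    }
                },
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    # The analyzer belongs on the text field being matched...
                    "title": {"type": "text", "analyzer": "turkish_lowercase"},
                    # ...while the percolator field takes no analysis parameters.
                    "query": {"type": "percolator"},
                }
            }
        },
    }


print(json.dumps(build_index_body(), indent=2))
```

Queries stored in the query field are then matched against documents whose title is analyzed with the Turkish lowercase filter.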
