Elasticsearch-dsl-py: Analyzer is not being applied to fields?

Created on 5 Oct 2017 · 6 Comments · Source: elastic/elasticsearch-dsl-py

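# Assumed imports (django-elasticsearch-dsl); Contract is the poster's own Django model.
from django_elasticsearch_dsl import DocType, Index, fields
from elasticsearch_dsl import analyzer

contracts = Index('contracts')
my_analyzer = analyzer('simple')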

contracts.analyzer(my_analyzer)


@contracts.doc_type
class ContractDocument(DocType):
    client = fields.StringField(attr='client_name')

    class Meta:
        model = Contract

        fields = [
            'id',
            'name'
        ]

I am trying to apply the simple analyzer to my fields. But when I call termvectors after running search_index, I see that the standard analyzer was applied to the fields.

How can I apply the simple analyzer to all fields?

I have tons of fields and don't want to declare them by hand. Is the only solution to create an ES field for each model field? How can I declare an analyzer for the fields in Meta.fields, or how can I modify my class to do this?

You can disregard the Django-related parts.

I guess this kind of analyzer setting does not do the following:

PUT /contracts
{
  "mappings": {
    "contract_document":{
      "properties": {
        "name":{
          "type":"text",
          "analyzer": "simple"
        }
      }
    }
  }
}

Right?
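For what it's worth, a mapping body like the one above does not have to be typed out per field. Here is a minimal sketch in plain Python, assuming every listed field should be a `text` field with the same analyzer. `build_mapping` and its parameters are hypothetical names, not part of elasticsearch-dsl or django-elasticsearch-dsl; the doc type name is the one from the request above.

```python
def build_mapping(doc_type, field_names, analyzer_name="simple"):
    """Build an index-creation body mapping each field to text with one analyzer.

    Hypothetical helper: it only assembles a dict and talks to no server.
    """
    properties = {
        name: {"type": "text", "analyzer": analyzer_name}
        for name in field_names
    }
    return {"mappings": {doc_type: {"properties": properties}}}


# Same shape as the PUT /contracts body above.
body = build_mapping("contract_document", ["name"])
```

How the resulting dict gets sent to Elasticsearch depends on the client in use; the sketch only covers generating it from a list of field names.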

All 6 comments

You can define an analyzer for a field by passing it as a parameter to the field class:
name = Text(analyzer='simple')

Unfortunately I am not familiar with the API that you are using, but just passing the name of a built-in analyzer or an Analyzer instance should work.

Adding analyzers as parameters has no effect. Please see this example, tested with Elasticsearch 5.5 and 5.6 and elasticsearch-dsl 5.3.0:

from elasticsearch_dsl import DocType, analyzer
from elasticsearch_dsl.field import Text

class MyIndex(DocType):
    class Meta:
        index = 'myindex'
        doc_type = 'MyIndex'
    town = Text(analyzer="keyword")

index = MyIndex()
index.town = "Port Washington"
index.save()

Then when you check the field, it is still using the standard analyzer:

curl -XGET 'localhost:9200/myindex/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "field": "town",
  "text": "Port Washington"
}'

{
  "tokens" : [
    {
      "token" : "port",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "washington",
      "start_offset" : 5,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
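One way to read this output: the keyword analyzer emits the whole input as a single unchanged token, so two lowercased tokens suggest the standard analyzer is in effect. Here is a small sketch of that check in Python. `analyze_tokens` is a hypothetical helper, and the dict is just the response above written as a Python literal.

```python
def analyze_tokens(response):
    """Return the token strings from an _analyze response dict."""
    return [entry["token"] for entry in response.get("tokens", [])]


# The _analyze response shown above, as a Python dict.
response = {
    "tokens": [
        {"token": "port", "start_offset": 0, "end_offset": 4,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "washington", "start_offset": 5, "end_offset": 15,
         "type": "<ALPHANUM>", "position": 1},
    ]
}

tokens = analyze_tokens(response)

# With the keyword analyzer the input would come back as one unchanged token.
keyword_applied = tokens == ["Port Washington"]
```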

And how are you creating the index in Elasticsearch? From your example it looks like you are not explicitly pushing the mappings into Elasticsearch. Try adding:

from elasticsearch_dsl import Index
i = Index('myindex')
i.doc_type(MyIndex)
i.create()

before actually creating any documents.

Thanks!

I'm not creating it explicitly; it's done automatically by the DocType class upon save. You can check by running

>>> from pprint import pprint
>>> from elasticsearch_dsl import Index
>>> i = Index('myindex')
>>> pprint(i.get())

This outputs

{'myindex': {'aliases': {},
               'mappings': {'MyIndex': {'properties': {'town': {'fields': {'keyword': {'ignore_above': 256,
                                                                                         'type': 'keyword'}},
                                                                  'type': 'text'}}}},
               'settings': {'index': {'creation_date': '1509116977845',
                                      'number_of_replicas': '1',
                                      'number_of_shards': '5',
                                      'provided_name': 'myindex',
                                      'uuid': 'yozAN9l4SQ2wHR8JovfMmA',
                                      'version': {'created': '5060399'}}}}}

If I run the create manually, it throws an exception because the index already exists:

PUT http://127.0.0.1:9200/myindex [status:400 request:0.020s]
Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "/home/steve/...venv/.../lib/python3.6/site-packages/elasticsearch_dsl/index.py", line 179, in create
    self.connection.indices.create(index=self._name, body=self.to_dict(), **kwargs)
  File "/home/steve/..../venv/.../lib/python3.6/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/steve/.../venv/.../lib/python3.6/site-packages/elasticsearch/client/indices.py", line 107, in create
    params=params, body=body)
  File "/home/steve/.../venv/.../lib/python3.6/site-packages/elasticsearch/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/steve/.../venv/.../lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/steve/.../venv/.../lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'index_already_exists_exception', 'index [myindex/CKCg3ATSTn6sRAvI1ly2uA] already exists')

Update: of course it exists, as I had already created documents.

If you don't create the index explicitly, an empty index with default mappings (i.e. using the standard analyzer) will be created for you when the first document is saved. That is why you have to create the index explicitly before pushing any documents in. See (0) for more details.

0 - http://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#document-life-cycle
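The ordering described here can be sketched as a small guard that runs before any document is saved. This is only an illustration under assumptions: `create_if_missing` is a hypothetical helper, and the `index` argument is assumed to expose `exists()` and `create()` the way an elasticsearch-dsl `Index` is expected to (worth checking against the installed version). `FakeIndex` is a stand-in so the sketch runs without a cluster.

```python
def create_if_missing(index):
    """Create the index (with its registered mappings) unless it already exists.

    Assumption: `index` provides exists() and create(); returns True if created.
    """
    if index.exists():
        return False
    index.create()
    return True


class FakeIndex:
    """Stand-in used only to exercise the helper without Elasticsearch."""

    def __init__(self, present):
        self.present = present

    def exists(self):
        return self.present

    def create(self):
        self.present = True


fresh = FakeIndex(present=False)
created_first_time = create_if_missing(fresh)   # creates the index
created_second_time = create_if_missing(fresh)  # already there, so skipped
```

Note that an index that was already auto-created with default mappings would need to be deleted, or its documents reindexed into a new index, since the analyzer of an existing field cannot be changed in place.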

Ah yes of course, thank you :)
