contracts = Index('contracts')
my_analyzer = analyzer('simple')
contracts.analyzer(my_analyzer)

@contracts.doc_type
class ContractDocument(DocType):
    client = fields.StringField(attr='client_name')

    class Meta:
        model = Contract
        fields = [
            'id',
            'name'
        ]
I am trying to apply the simple analyzer to my fields, but when I call termvectors after running search_index, I see that the standard analyzer has been applied instead.
How can I apply the simple analyzer to all fields?
I have tons of fields and I don't want to declare them by hand. Is the only solution to create an ES field for each model field? How can I declare an analyzer for the fields in Meta.fields, or how can I modify my class to do this?
You can discard the Django related parts.
I guess this kind of analyzer setting does not do the following:
PUT /contracts
{
  "mappings": {
    "contract_document": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "simple"
        }
      }
    }
  }
}
Right?
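Since the question is about many fields, here is a minimal sketch of generating a mapping body like the one above for a list of field names instead of writing each entry by hand. The helper name `build_text_mapping` and the field list are hypothetical, and the sketch only builds the JSON body; it does not talk to Elasticsearch or to the Django integration.

```python
import json


def build_text_mapping(doc_type, field_names, analyzer="simple"):
    """Build an index-creation body that maps every listed field to
    an analyzed text field using the given analyzer name."""
    properties = {
        name: {"type": "text", "analyzer": analyzer}
        for name in field_names
    }
    return {"mappings": {doc_type: {"properties": properties}}}


# Hypothetical field list; 'id' is left out because it is not free text.
body = build_text_mapping("contract_document", ["name", "client"])
print(json.dumps(body, indent=2, sort_keys=True))
```

The printed body has the same shape as the `PUT /contracts` request above, with one `properties` entry per field name.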
You can define an analyzer for a field by passing it as a parameter to the field class:
name = Text(analyzer='simple')
Unfortunately I am not familiar with the API that you are using, but just passing the name of a built-in analyzer or an Analyzer instance should work.
Adding analyzers as parameters has no effect. Please see this example, tested with Elasticsearch 5.5 and 5.6 and elasticsearch-dsl 5.3.0:
from elasticsearch_dsl import DocType, analyzer
from elasticsearch_dsl.field import Text

class MyIndex(DocType):
    class Meta:
        index = 'myindex'
        doc_type = 'MyIndex'

    town = Text(analyzer="keyword")

index = MyIndex()
index.town = "Port Washington"
index.save()
Then when you analyze text against that field, it is still using the standard analyzer:
curl -XGET 'localhost:9200/myindex/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "field": "town",
  "text": "Port Washington"
}'
{
  "tokens" : [
    {
      "token" : "port",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "washington",
      "start_offset" : 5,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
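To make clear why this output shows the `keyword` analyzer was not applied, here is a rough pure-Python approximation of what the built-in `keyword` and `simple` analyzers would produce for the same text. The two functions are illustrative stand-ins, not the Elasticsearch implementations: `keyword` keeps the whole input as one token, while `simple` splits on non-letters and lowercases.

```python
import re


def keyword_analyze(text):
    """Approximate the keyword analyzer: the whole input is a single token."""
    return [{"token": text, "start_offset": 0,
             "end_offset": len(text), "position": 0}]


def simple_analyze(text):
    """Approximate the simple analyzer: split on non-letters, then lowercase."""
    tokens = []
    # [^\W\d_]+ matches runs of letters only (no digits, no underscore).
    for position, match in enumerate(re.finditer(r"[^\W\d_]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


text = "Port Washington"
keyword_tokens = keyword_analyze(text)
simple_tokens = simple_analyze(text)
print([t["token"] for t in keyword_tokens])  # ['Port Washington']
print([t["token"] for t in simple_tokens])   # ['port', 'washington']
```

With `keyword` in effect, the response would be expected to contain a single `Port Washington` token. Two lowercased tokens of type `<ALPHANUM>` are what the standard tokenizer emits, which suggests the field is still analyzed with the default.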
And how are you creating the index in Elasticsearch? From your example it looks like you are not explicitly pushing the mappings into Elasticsearch. Try adding:
from elasticsearch_dsl import Index
i = Index('myindex')
i.doc_type(MyIndex)
i.create()
before actually creating any documents.
Thanks!
I'm not creating it explicitly; it's done automatically by the DocType class upon save. You can check by running:
>>> from pprint import pprint
>>> from elasticsearch_dsl import Index
>>> i = Index('myindex')
>>> pprint(i.get())
This outputs
{'myindex': {'aliases': {},
             'mappings': {'MyIndex': {'properties': {'town': {'fields': {'keyword': {'ignore_above': 256,
                                                                                     'type': 'keyword'}},
                                                              'type': 'text'}}}},
             'settings': {'index': {'creation_date': '1509116977845',
                                    'number_of_replicas': '1',
                                    'number_of_shards': '5',
                                    'provided_name': 'myindex',
                                    'uuid': 'yozAN9l4SQ2wHR8JovfMmA',
                                    'version': {'created': '5060399'}}}}}
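The output above can also be checked programmatically. The sketch below uses a hypothetical helper, `field_analyzer`, to read the analyzer for a field out of a dictionary shaped like the one returned by `i.get()`. The sample data is a trimmed copy of the output above with aliases and settings omitted.

```python
def field_analyzer(index_info, index_name, doc_type, field):
    """Return the analyzer named in the mapping for `field`,
    or None when the mapping does not set one."""
    properties = index_info[index_name]["mappings"][doc_type]["properties"]
    return properties[field].get("analyzer")


# Trimmed copy of the mapping printed above.
index_info = {
    "myindex": {
        "mappings": {
            "MyIndex": {
                "properties": {
                    "town": {
                        "type": "text",
                        "fields": {
                            "keyword": {"type": "keyword", "ignore_above": 256}
                        },
                    }
                }
            }
        }
    }
}

analyzer_name = field_analyzer(index_info, "myindex", "MyIndex", "town")
print(analyzer_name)  # None
```

A result of `None` means the mapping carries no `analyzer` entry for `town`, so the field falls back to the index default, which is the standard analyzer unless configured otherwise. The `text` field with a `keyword` sub-field is the shape that dynamic mapping generates for strings, not the one declared in the `MyIndex` class.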
If I run the create manually, it throws an exception because the index already exists:
PUT http://127.0.0.1:9200/myindex [status:400 request:0.020s]
Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "/home/steve/...venv/.../lib/python3.6/site-packages/elasticsearch_dsl/index.py", line 179, in create
    self.connection.indices.create(index=self._name, body=self.to_dict(), **kwargs)
  File "/home/steve/..../venv/.../lib/python3.6/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/steve/.../venv/.../lib/python3.6/site-packages/elasticsearch/client/indices.py", line 107, in create
    params=params, body=body)
  File "/home/steve/.../venv/.../lib/python3.6/site-packages/elasticsearch/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/steve/.../venv/.../lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/steve/.../venv/.../lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'index_already_exists_exception', 'index [myindex/CKCg3ATSTn6sRAvI1ly2uA] already exists
Update: of course it exists, as I had already created documents.
If you don't create the index explicitly, an empty index with default mappings (that is, using the standard analyzer) will be created for you when the first document is saved. That is why you have to create the index explicitly before pushing any documents in. See (0) for more details.
0 - http://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#document-life-cycle
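As a minimal sketch of that ordering, the hypothetical helper `ensure_index` below creates the index with the document's mappings only if it does not exist yet. It assumes the index object offers `exists()`, `doc_type()` and `create()`, as `elasticsearch_dsl.Index` does. `StubIndex` and the empty `MyIndex` class are in-memory stand-ins so the snippet runs without a cluster.

```python
def ensure_index(index, doc_cls):
    """Create `index` with the mappings of `doc_cls` unless it already exists.

    Returns True if the index was created, False if it was already there,
    in which case the existing mappings are left untouched.
    """
    if index.exists():
        return False
    index.doc_type(doc_cls)  # register the document's mappings on the index
    index.create()           # push settings and mappings before any save()
    return True


class StubIndex:
    """In-memory stand-in for elasticsearch_dsl.Index, for illustration only."""

    def __init__(self, already_exists=False):
        self._exists = already_exists
        self.registered = []

    def exists(self):
        return self._exists

    def doc_type(self, doc_cls):
        self.registered.append(doc_cls)
        return doc_cls

    def create(self):
        self._exists = True


class MyIndex:
    """Placeholder for the DocType subclass from the thread."""


fresh = StubIndex()
print(ensure_index(fresh, MyIndex))     # True
existing = StubIndex(already_exists=True)
print(ensure_index(existing, MyIndex))  # False
```

With the real library one would pass `Index('myindex')` and the actual `DocType` subclass. A `False` result corresponds to the situation in this thread: the index was auto-created by an earlier `save()`, so it would presumably have to be deleted and recreated before the `keyword` analyzer can take effect.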
Ah yes of course, thank you :)