Elasticsearch: Remove support for types?

Created on 22 Dec 2015 · 75 Comments · Source: elastic/elasticsearch

The ability to have several types on the same index is causing problems:

  • the mappings APIs need to maintain one mapping per type, yet those mappings can't be independent and keeping them synchronized is complicated (see eg. discussions on #15539)
  • it gives the feeling that the system can easily deal with documents that have very different mappings in the same index, which is not true. This is why in 2.0 we added more restrictions on mappings across types. In addition types encourage sparsity and sparse fields cause Lucene to either be slow when there is a special impl for the sparse case (eg. doc values) or use tremendous memory and disk space in spite of the fact that few documents have a value (eg. norms, because a fixed amount of memory is used for every doc, regardless of whether they have a value for this field or not).
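To make the norms cost concrete, here is a rough back-of-the-envelope sketch. It assumes that norms take one byte per document for every field with norms enabled, whether or not the document has a value. That per-document cost, the helper names, and the document and field counts are illustrative assumptions, not measurements.

```python
# Rough estimate of norms overhead when several types share one index.
# Assumption: norms cost one byte per document per field, whether or not
# the document has a value for that field (illustrative figure only).
NORMS_BYTES_PER_DOC = 1


def norms_bytes(total_docs: int, fields_with_norms: int) -> int:
    """Every document pays for every field that has norms."""
    return total_docs * fields_with_norms * NORMS_BYTES_PER_DOC


def shared_index_bytes(docs_per_type: int, types: int, fields_per_type: int) -> int:
    """All types in one index, with disjoint fields per type."""
    return norms_bytes(docs_per_type * types, fields_per_type * types)


def index_per_type_bytes(docs_per_type: int, types: int, fields_per_type: int) -> int:
    """One index per type: documents only pay for their own fields."""
    return types * norms_bytes(docs_per_type, fields_per_type)


if __name__ == "__main__":
    print(shared_index_bytes(1_000_000, 10, 5))
    print(index_per_type_bytes(1_000_000, 10, 5))
```

Under these assumptions the shared index pays for every field on every document, so the overhead grows with the number of types even though each document only uses its own type's fields.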

Migrating existing users is certainly going to be complicated, but this would also make the system more honest to new users about the fact that we can't do index-level multi-tenancy efficiently. Also, I suspect that the restrictions that we added in 2.0 (e.g. that two fields with the same name in different types must have compatible mappings) already made lots of users migrate to a single index per data type instead of folding them into different types of the same index.

See also https://www.elastic.co/blog/index-vs-type.

:Search/Mapping, Meta

All 75 comments

Seems like removing support for types would be blocked on https://github.com/elastic/elasticsearch/issues/11432.

It'd be lovely if this accelerated parent-child relations across indexes though, that'd get rid of a lot of the aforementioned sparsity.

What about divorcing mappings from types? Make mappings an index level feature and types just kind of like part of the id? Would that make them light enough to not get in the way?

@zygfryd indeed

@nik9000 That is an option too. At least it would make clear that there is a single mapping per index and we could simplify the internals. With this option, I guess types would remain as first-class filters only (eg. we could do index sorting on _type so that filtering on them would be faster)?

types would remain as first-class filters only

I'd be fine with that.

I think divorcing mappings from types is a good idea. In my opinion, removing types is too radical.

In our use case (logs management) we have 80+ types of logs and we use a (daily, weekly or monthly) index per project/tenant to be able to handle the load.
We do have different mappings for each type of log, but the mappings are identical across all indices.

In addition types encourage sparsity and sparse fields cause Lucene to either be slow when there is a special impl for the sparse case (eg. doc values) or use tremendous memory and disk space in spite of the fact that few documents have a value (eg. norms, because a fixed amount of memory is used for every doc, regardless of whether they have a value for this field or not).

If I read you correctly, we should instead use one index per log type ?
In our use case we have different needs for each project/tenant. Some projects can log as much as 5K logs/sec while other projects only log 10 logs/sec.
So we need to be able to configure replicas/shards and index creation frequency (daily, weekly, monthly...) per project/tenant.

Could you please explain a little bit more your proposal with this use case and with types removed ?

Make mappings an index level feature

:+1:

If I read you correctly, we should instead use one index per log type ?

Yes. Types are trappy: at first sight they look like an efficient way to have multiple tenants in a single index, but in practice this usually makes things _worse_ than having multiple indices due to the fact that Lucene likes dense data better than sparse data, especially for norms and doc values.

If some tenants have lower indexing rates, they would get fewer shards and/or longer time frames (eg. weekly indices instead of daily).

I first thought divorcing types from mappings would be a good compromise, but types have another issue that they force us to fold the type into the uid, which typically either makes the _uid less efficient (slower indexing and slower gets) if we prepend the type (like today) or more space-intensive if we append the type. So I think we should think about getting rid of types entirely. For instance, maybe we could consider enforcing a single type per index in version X, with APIs still working with either index/type or just index, and then removing types entirely in version X+1?
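As a small illustration of what folding the type into the uid means, the sketch below builds both key layouts. The `type#id` layout is the one mentioned in this thread; the helper names are hypothetical, and the sketch only shows the extra bytes carried by each key, not how Lucene stores or looks them up.

```python
# Sketch of the two uid layouts discussed here.
# The "type#id" layout comes from the thread; helper names are hypothetical.

def uid_with_type(doc_type: str, doc_id: str) -> bytes:
    """Current layout: the type is prepended to the id."""
    return f"{doc_type}#{doc_id}".encode("utf-8")


def uid_without_type(doc_id: str) -> bytes:
    """Proposed layout: the uid is just the id."""
    return doc_id.encode("utf-8")


def extra_bytes_per_key(doc_type: str, doc_id: str) -> int:
    """Bytes added to every key by folding the type into the uid."""
    return len(uid_with_type(doc_type, doc_id)) - len(uid_without_type(doc_id))


if __name__ == "__main__":
    print(uid_with_type("employee", "11"))
    print(uid_without_type("11"))
    print(extra_bytes_per_key("employee", "11"))
```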

I think we should deprecate types in 5.0 and start moving towards index-level mappings, a UUID per index rather than per type, etc. If somebody really needs the type in the UUID they can still do that, I guess. Types can be built on top of ES without native support; there is nothing today that prevents you from doing this. Native support rather complicates things on all ends internally without real benefit to the outside, except for the first-level API support that someone might find useful but that is only syntactic sugar with a potentially high price to pay. I am also +1 to removing this entirely in 6.0 and guiding folks on how to do it correctly.

The main concern I have for now is the support of the parent/child feature.
Today we need to have colocated data within the same shard for parent and children in order to perform joins in memory.

Removing types will only allow parent/child between documents of the same "kind".

That is not super terrible, as at the end of the day this is what is happening behind the scenes.

So if we had:

PUT index/company/1
{
  "name": "elastic"
}
PUT index/employee/11?parent=1
{
  "first_name": "david"
}

We will basically have to rewrite this as for example:

PUT index/1
{
  "company": {
    "name": "elastic"
  }
}
PUT index/11?parent=1
{
  "employee": {
    "name": "david"
  }
}

It means that parent/child will be able to do self referencing as proposed here in #11432.
So as soon as we have support for #11432, I'm totally +1 to remove types.

BTW, maybe we should already start to educate people to use only one type per index and use data structures similar to what I proposed in my example?
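The rewrite in the example above can be sketched as a small helper that nests a typed document's fields under a key named after its old type. The function name is hypothetical, and the `index/id?parent=...` path simply mirrors the URLs proposed in this comment, not a shipped API.

```python
# Sketch: turn a typed request (index/type/id) into the typeless shape
# proposed above, with the old type name as a top-level key in the body.
# The URL shape mirrors the proposal in this comment and is an assumption.

def typeless_request(index, doc_type, doc_id, source, parent=None):
    """Return (path, body) for the typeless form of a typed index request."""
    path = f"{index}/{doc_id}"
    if parent is not None:
        path += f"?parent={parent}"
    # Nest the original fields under the old type name.
    body = {doc_type: dict(source)}
    return path, body


if __name__ == "__main__":
    print(typeless_request("index", "company", "1", {"name": "elastic"}))
    print(typeless_request("index", "employee", "11", {"first_name": "david"}, parent="1"))
```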

What about the following plan:

  • require 5.0+ indices to have at most one type (which means parent/child does not work)
  • add APIs without a type parameter, eg. POST index/ to index with an auto-generated id or PUT index/id to index with an explicit id, remove the requirement to have the type name as a top-level key in mappings, etc. (all this does not need to be done in 5.0, this could come in 5.x)
  • remove types from APIs in 6.0+
  • separately work on another way to expose joins in a reasonable way, as a replacement for parent/child (assuming we want a replacement)

If we are not ready to drop parent/child right now, one trade-off I could consent to would be to have a setting that allows indices to have multiple types so that parent/child can be used, but these indices could not be upgraded to 6.0.

For the record, we have some evidence that removing types could help indexing speed quite significantly since we would not have to fold the type name into the uid: https://github.com/elastic/elasticsearch/issues/18154#issuecomment-237851085

Thoughts?

@jpountz I think we should do this, but it seems your proposal has gone unnoticed given the lack of reaction (positive or negative). Can we get some other thoughts on this?

/cc @s1monw @clintongormley

@rjernst i'm staring at it as you type :)

We discussed this in Fix it Friday.

Where we want to get to:

We want to remove the concept of types from Elasticsearch, while still supporting parent/child.

  • Field mappings will be at the top level, instead of under type names
  • The _uid will consist only of the _id, not type#id.
  • Creating a document with ID : PUT index/_doc/id
  • Creating a document with autogenerated ID : POST index/_doc

It's very important to me that we don't leave users behind - we need to give them a smooth upgrade and transition path.

Proposed path:

In 5.0:

  • [ ] In new indices, enforce that fields with the same name in different types must have identical mappings.
  • [x] POST index should no longer create an index. #20001
  • [x] Check other REST endpoints for similar clashes. #20055

In 5.x:

  • [x] Add support for setting enabled:false to the _type field, which will exclude type from the UID and use a fixed type name for (eg) mapping requests/responses. These indices will not support parent/child. #24317
  • [x] Add support for PUT index/_doc/id, POST index/_doc, and PUT|POST index/_mapping
  • [x] Add new mechanism for specifying and supporting parent/child.

In 6.0:

  • [x] Prevent new indices from being created with more than one type.

In 6.7:

  • [x] For APIs whose request/response structure changes with types removal (create index, get mapping, etc.), add a query string parameter (eg include_type_name) which indicates whether the requests/responses should include a type name. This parameter defaults to true. Issue a deprecation warning for all requests that don't set include_type_name to false.

In 7.0:

  • [x] Remove support for old parent/child (#29224)
  • [x] For requests whose structure does not change with types removal, make sure to accept both typed + typeless versions of the API. Issue a deprecation warning for requests that specify types.
  • [x] Default include_type_name to false. Issue a deprecation warning for all requests that set include_type_name.
  • [x] Deprecate references to types in the Java high-level rest client

In 8.0:

  • [ ] type can no longer be specified in requests
  • [ ] the include_type_name parameter is removed

In 6.0, all existing types from 5.x indices will have identical mappings. We will still have indices with old parent/child implementation. If we can migrate existing parent/child settings to the new settings, then we could move the "return fields at top level" issue into 6.0.

Alternatively, we could return fields at the top level in 6.0 regardless, and still show types (for old indices with types enabled, or with old parent-child) as a separate section in GET mapping.
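The typeless document URLs listed under "Where we want to get to" (PUT index/_doc/id and POST index/_doc) can be sketched as a path rewrite. This is only an illustration of the URL shapes named in the plan; the helper name is hypothetical and it does not handle other endpoints such as _mapping or _search.

```python
# Sketch: rewrite typed document paths to the typeless "_doc" form.
# Only covers "index/type/id" and "index/type"; the helper is hypothetical.

def typeless_doc_path(typed_path: str) -> str:
    """Map 'index/type/id' to 'index/_doc/id' and 'index/type' to 'index/_doc'."""
    parts = typed_path.strip("/").split("/")
    if len(parts) == 3:
        index, _old_type, doc_id = parts
        return f"{index}/_doc/{doc_id}"
    if len(parts) == 2:
        index, _old_type = parts
        return f"{index}/_doc"
    raise ValueError(f"not a typed document path: {typed_path!r}")


if __name__ == "__main__":
    print(typeless_doc_path("index/company/1"))
    print(typeless_doc_path("index/company"))
```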

UPDATED TO REFLECT CHANGES IN https://github.com/elastic/elasticsearch/issues/15613#issuecomment-303727982

UPDATED TO REFLECT CHANGES IN https://github.com/elastic/elasticsearch/issues/15613#issuecomment-382427200

UPDATED TO REFLECT CHANGES IN https://github.com/elastic/elasticsearch/issues/15613#issuecomment-438084242

UPDATED TO REFLECT CHANGES DISCUSSED IN https://github.com/elastic/elasticsearch/issues/35190

Check other REST endpoints for similar clashes.

I think the only other endpoint we need to check is PUT/POST {index}/_mapping, which is free today. Am I missing another one?

indices.exists_type (HEAD {index}/{type}) clashes with a typeless exists (HEAD {index}/{id})

Perhaps the existing form should be deprecated in favour of HEAD {index}/_mapping/{type}

I think that's the lot

In new indices, enforce that fields with the same name in different types must have identical mappings.

Having started to work on it, this is more challenging than I initially thought. However I think this might not be needed: since we will require at most one type in 6.0+ anyway, we will not have to merge mappings across types in 7.0, so this step is not required for the type removal?

@clintongormley I like the proposed plan! _If_ we can find a way to support the new parent/child format in 5.x with enabled set to false on _type, it would mean simpler migration down the road. We could start to push for setting this for new users.

@clintongormley I believe this:

In new indices, enforce that fields with the same name in different types must have identical mappings.

Is resolved (from the 5.0 checkboxes):

PUT /i?pretty
{
  "mappings": {
    "typea": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "standard"
        }
      }
    },
    "typeb": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "Failed to parse mapping [typea]: Mapper for [title] conflicts with existing mapping in other types:\n[mapper [title] has different [analyzer], mapper [title] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [title] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "Failed to parse mapping [typea]: Mapper for [title] conflicts with existing mapping in other types:\n[mapper [title] has different [analyzer], mapper [title] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [title] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Mapper for [title] conflicts with existing mapping in other types:\n[mapper [title] has different [analyzer], mapper [title] is used by multiple types. Set update_all_types to true to update [search_analyzer] across all types., mapper [title] is used by multiple types. Set update_all_types to true to update [search_quote_analyzer] across all types.]"
    }
  },
  "status" : 400
}

So I _think_ this is no longer blocking for 5.0, is that correct? (I didn't want to check the box if there was something else missing)
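For readers who want to check their own multi-type mappings ahead of time, a strict client-side comparison could look like the sketch below. It is not how Elasticsearch performs the check; it simply flags any top-level field whose definition differs between types, and the function name is hypothetical.

```python
# Sketch: find fields whose definitions differ across types in one index.
# Strict equality on the whole definition; not Elasticsearch's own check.

def find_conflicts(mappings: dict) -> dict:
    """Return {field: {type_name: definition}} for fields that differ across types."""
    seen = {}
    for type_name, type_mapping in mappings.items():
        for field, definition in type_mapping.get("properties", {}).items():
            seen.setdefault(field, {})[type_name] = definition

    conflicts = {}
    for field, by_type in seen.items():
        definitions = list(by_type.values())
        if any(d != definitions[0] for d in definitions[1:]):
            conflicts[field] = by_type
    return conflicts


if __name__ == "__main__":
    mappings = {
        "typea": {"properties": {"title": {"type": "text", "analyzer": "standard"}}},
        "typeb": {"properties": {"title": {"type": "text", "analyzer": "english"}}},
    }
    print(find_conflicts(mappings))
```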

As an Elasticsearch user and someone who's spent the last few months working on a type mapping client implementation this sounds good to me. From what I can tell from the conversation so far: types aren't _really_ going away, they're just being shifted up to the index level, so the index determines the shape of its one document type, right?

From what I can tell from the conversation so far: types aren't really going away, they're just being shifted up to the index level, so the index determines the shape of its one document type, right?

That's pretty much correct, once we eventually add support for _type: {enabled: false} (and a user uses it) it will be one type per index. The only trickiness will be parent/child which @martijnvg has been working on in https://github.com/elastic/elasticsearch/issues/20257

So I think this is no longer blocking for 5.0, is that correct? (I didn't want to check the box if there was something else missing)

Not quite. There are certain properties today which can differ, eg:

PUT t
{
  "mappings": {
    "one": {
      "properties": {
        "text": {
          "type": "text",
          "copy_to": "foo"
        }
      }
    },
    "two": {
      "properties": {
        "text": {
          "type": "text",
          "copy_to": "bar"
        }
      }
    }
  }
}

Having started to work on it, this is more challenging than I initially thought. However I think this might not be needed: since we will require at most one type in 6.0+ anyway, we will not have to merge mappings across types in 7.0, so this step is not required for the type removal?

@jpountz makes sense to me. OK, I'll remove that task and remove the blocker label.

I like this change in general, my only concern is the fact that the create index and index document apis will be on the same rest endpoint with only a different method. I am worried this is confusing and error-prone for users.

As discussed above the create index will be only accessible with PUT index and indexing a document with an auto-generated id will be POST index. This seems weird to me as we are affecting different resources purely based on the method used (adding an index to the cluster vs adding a document to the index). We could instead have the auto-generated id endpoint as something like POST index/_auto_id. This should not clash with actual ids as we already have a convention of reserving index/_X for endpoints (e.g. index/_mapping).

Hi,

Please could it be documented in the ES documentation that the long term plan is to move away from multiple types per index, so that people starting new projects now know to avoid them, and for people who have projects that already use multiple mappings, to start going through the seven stages of loss sooner rather than later.

I only stumbled across this issue by chance when reading up on indexes vs types via google.

Obviously, apologies in advance if this has already been documented and I've just missed it.

cheers
Dan

Hi All,

I already posted a comment related to typeless parent/child #20257
Now I would like to post some sort of summary about the Pros and Cons of types. The reason is that the more I think about the prospect of removing types, the more I feel that I would really miss what it buys me: considering the data stored in ES as ONE entity (or sort of).

Pros:

  • Types allow me to use an index as some sort of database, each 'table' being a different type. It might be seen as some sort of edge case, but a lot of people are using it this way and I would like to say that this is a really nice feature! Over the years, I've been involved in several projects leveraging ES where ES became the only source of data. In all these projects ES is the database. There is no external SQL db at all and every single piece of data is stored in an index containing different types. Which brings me to the next point.
  • In terms of operations, in order to back up all the data spread across several types I simply need to snapshot ONE index. When something goes wrong I simply restore ONE snapshot and I'm back in business. To me this single feature is very important because data stored in ES are related. It's not a bunch of indexes but rather a group of related things. And types buy me this.

  • We sometimes use aliases to perform operations on several types. For instance it could be upgrading all our data and switching to the upgraded data by simply updating the alias. I know aliases can point to several indices already, but once again when I use ES, the notion of types allows me to think about all this inter-related data as one (ahem...) database.

Cons:

  • Several issues mainly (only?) related to mappings, inconsistencies and such.

Don't get me wrong, I do agree that types can be confusing until you understand their shortcomings. Once you are aware of them, it almost becomes a no-brainer.

To wrap up, we all agree that types have issues. One way of getting rid of these issues is to get rid of types. Would it be possible to 'fix' these issues and keep types? For instance, would it be possible to prefix all fields with the types they belong to at the Lucene level? To me it would solve the mapping issues. WDYT?

Sorry for the long post...
All the best,

Stéphane

@stephane-bastian

Types allow me to use an index as some sort of database, each 'table' being a different type. It might be seen as some sort of edge case, but a lot of people are using it this way and I would like to say that this is a really nice feature!

This is a common misconception. Types are not tables. The underlying data structures are the same for the entire index, not per type. This means using multiple types causes gaps (ie sparsity) in the data structure each time consecutive docs have different fields. Lucene is getting better at handling sparse data, but that will never fully solve this problem.

In terms of operations, in order to back up all the data spread across several types I simply need to snapshot ONE index. When something goes wrong I simply restore ONE snapshot and I'm back in business.

You can snapshot multiple indices in one snapshot.

We sometimes use aliases to perform operations on several types. For instance it could be upgrading all our data and switching to the upgraded data by simply updating the alias. I know aliases can point to several indices already, but once again when I use ES, the notion of types allows me to think about all this inter-related data as one (ahem...) database.

As you say, aliases can be used with multiple indices. There is nothing inherent about types that makes this better; a type is only an additional filter added to each query (ie essentially a term filter on the type).

Cons:
Several issues mainly (only?) related to mappings, inconsistencies and such.

This is a drastic oversimplification. Types have caused issues for users for a very long time. We have made improvements to fix inconsistencies (eg #8870), and have discussed what types really give users since then. The remaining issues with types are well laid out by @jpountz in the original issue description here, as well as the linked blog post.

For instance, would it be possible to prefix all fields with the types they belong to at the Lucene level?

We discussed this idea long ago (around the same time as #8870). It does not solve the code complexities of types, and removes one of the few benefits of types (shared lexicon for similar fields).

The key to all of this is that types are not actually special. Essentially they are just an additional field, with an element from the URL path translated into an additional query filter. This is simple for a user to do themselves if they really want to, without causing new users to fall into the trap of "types are like tables".
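A user-managed replacement for the type filter can be sketched as wrapping any query in a bool query with a term filter. The field name `type` and the helper name are assumptions for illustration; the document would need to carry that field itself.

```python
# Sketch: emulate the per-type filter with a user-managed field.
# The field name "type" and the helper name are assumptions.

def with_type_filter(query: dict, doc_type: str, type_field: str = "type") -> dict:
    """Wrap a query so it only matches documents tagged with doc_type."""
    return {
        "bool": {
            "must": [query],
            "filter": [{"term": {type_field: doc_type}}],
        }
    }


if __name__ == "__main__":
    body = {"query": with_type_filter({"match": {"title": "elastic"}}, "tweet")}
    print(body)
```

The resulting body can be sent to the index's search endpoint like any other query; nothing about it depends on type support in the URL.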

Historically, types were useful for separating documents with different structure within indexes. However, in recent versions this practically no longer holds - different document types cannot share the same field name if their mappings don't match.

In addition the only manageable units in Elasticsearch are documents and indexes. You can add and delete them. You cannot manage types. As such, the usual recommendation I give is to maintain an index per type - since usually when there are several types involved their sizes and SLAs are different, and thus different configurations for replication, retention etc are required anyway. If not now then in the future.

With Elasticsearch slowly moving to index-level and even cluster-level sharding (with cross-cluster search), and with index sizes only growing in real-world systems - I think the chances of types actually being useful anywhere are now close to zero, and different types of documents are better separated on an index level.

At this point, IMO types are just an unnecessary noise for the majority of systems. We can certainly live without them, type less (pun intended) and manage different types of data using separate indexes. This will also avoid various not-uncommon pitfalls (e.g. large number of fields in mapping).

The only feature affected will be parent/child, and I'm sure there could be an easy solution there (I'd probably go with "invisible" types for parents and children).

So I'm all in for this, hopefully will make it to v6!

Many years ago, I created an issue asking for inclusion of custom index-level metadata. The work-around given was to use a custom type within the same index with a single document: https://github.com/elastic/elasticsearch/issues/1649

I always found multiple types in the same index to be kludgy (as highlighted by this issue), so I never implemented the suggested solution, but used aliases (which had the added benefit of being visible in UIs such as head). It would be great to have custom index-level metadata, especially since the suggested workaround will no longer be supported. Please excuse me if such a feature has been created since the original issue.

It would be great to have custom index-level metadata,

@brusic I presume the mapping meta field will cover this, as the mappings will now be an index level thing. Is this correct?

@bleskes If the mapping meta field will be promoted to an index level setting, it would be perfect.

If the mapping meta field will be promoted to an index level setting, it would be perfect

@brusic Since _meta is part of the mappings, and not part of a document, it is already per index.

Should really mention this in the docs or somewhere else. I had to google for hours to figure out this index-vs-type dilemma and only accidentally stumbling on this ticket has finally made it all clear.

Add support for setting enabled:false to the _type field, which will exclude type from the UID and use a fixed type name for (eg) mapping requests/responses.

What's that fixed name? Or rather, what's the recommendation for a default type name for 5.4 and older if you already want to act as if types don't exist? default or _default_ maybe? Or an empty string?

Hi,
at the moment we are using indices to separate tenants.
So we have one index per tenant and multiple types in one index.

We have three issues while moving to one type per index:

Consistency of backup/snapshot:
As I understand doing a snapshot of an index makes a consistent copy of that index.
Doing a snapshot of multiple indices means that these indices are captured at different transaction states.
So restoring the set of indices leads to data mismatches (an index backed up first may not have data referenced by another type).

  • how to make a consistent snapshot across multiple indices
  • how to reduce snapshot complexity; need to snapshot 30 indices instead of one at the moment.

Shard density / number of indices and shards:
We have approx. 30 types per index. Having 300 tenants running on one cluster means at least 300x30=9000 indices on the cluster when not using types.
At the moment we have approx. 900 shards; in the future we could not go below 9,000 shards!
This would result in 10-15 times more shards running on the same cluster.

For sure it would be possible to merge all tenants into one index, but this would lead us to other big disadvantages:

  • unable to run different mapping versions per tenant
  • unable to snapshot and restore a single tenant granularly
  • unable to run different backup intervals / strategy / retention times per tenant
  • unable to move a tenant to another cluster by simply moving the index

@lefce you can continue to store all documents from a single tenant in a single index. The only difference is that there will be a single mapping, so you will need to add your own type field instead of using the _type metadata field.

All mappings that have fields of the same name already share those fields with other mappings. So all we're really doing is removing the fiction that there are multiple independent "tables" in an index.

@clintongormley this sounds like a good solution. So we "only" have to merge the mappings and use our own type field. Thank you very much for the quick reply!
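The two steps mentioned here (merge the per-type mappings, then tag each document with your own type field) can be sketched as follows. The field name `type`, its `keyword` mapping, and the helper names are assumptions; the merge refuses to continue if two types define the same field differently.

```python
# Sketch: merge per-type mappings into one and tag documents with a custom
# type field. Field name "type" and helper names are assumptions.

def merge_type_mappings(mappings: dict, type_field: str = "type") -> dict:
    """Merge {type_name: {"properties": {...}}} into a single mapping."""
    merged = {type_field: {"type": "keyword"}}
    for type_name, type_mapping in mappings.items():
        for field, definition in type_mapping.get("properties", {}).items():
            if field == type_field:
                raise ValueError(f"{type_name!r} already uses field {field!r}")
            if field in merged and merged[field] != definition:
                raise ValueError(f"conflicting definitions for field {field!r}")
            merged[field] = definition
    return {"properties": merged}


def tag_document(source: dict, doc_type: str, type_field: str = "type") -> dict:
    """Return a copy of the document carrying its former type as a field."""
    return {**source, type_field: doc_type}


if __name__ == "__main__":
    old = {
        "tweet": {"properties": {"text": {"type": "text"}}},
        "user": {"properties": {"name": {"type": "keyword"}}},
    }
    print(merge_type_mappings(old))
    print(tag_document({"name": "kimchy"}, "user"))
```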

Just had a meeting with @jpountz and @s1monw about the typeless URLs migration, which is proving to be complicated because of the need to support 5.x multi-type indices at the same time as 6.x single-type indices.

New migration plan for typeless URLs:

In 6.0

(can contain 5.x indices, with multiple types)

  • type is still a required parameter in existing URLs - the URLs don't change at all
  • 6.x indices can use whatever type name they like, but there can only be one type in an index

In 7.0

(all indices have a single type)

  • type becomes an optional parameter in URLs, and is completely ignored.
  • for GET|PUT _mapping we add a query string parameter (eg include_type_name) which indicates whether the body should include a layer for the type name. this defaults to true. 7.x indices which don't have an explicit type will use the dummy type name _doc

In 8.0

(all indices have a single type and the index doesn't include type info)

  • type can no longer be specified in URLs
  • the include_type_name parameter now defaults to false

In 9.0

  • remove the include_type_name parameter
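The effect of include_type_name on a request or response body can be sketched as adding or omitting one layer around the field definitions. The `_doc` dummy type name comes from the plan above; the helper name is hypothetical, and the sketch only models the "mappings" section, not any version-specific default.

```python
# Sketch: the "mappings" section with and without the type-name layer.
# "_doc" is the dummy type name from the plan; the helper is hypothetical.
DUMMY_TYPE = "_doc"


def mappings_section(properties: dict, include_type_name: bool,
                     type_name: str = DUMMY_TYPE) -> dict:
    """Build the mappings section, optionally wrapped in a type-name layer."""
    body = {"properties": properties}
    if include_type_name:
        return {type_name: body}
    return body


if __name__ == "__main__":
    props = {"title": {"type": "text"}}
    print(mappings_section(props, include_type_name=True))
    print(mappings_section(props, include_type_name=False))
```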

Updated the roadmap in https://github.com/elastic/elasticsearch/issues/15613#issuecomment-239435920 to reflect the above

We have a large user base trained to use "_type" meta field to search/navigate our multi-type indices. Will we be able to index documents containing "_type" field as part of their body attributes? This is to make shift to type-less indices as smooth as possible.

@haizaar you can have whatever field you want and treat it as a category field (which is what the current _type is, if you don't abuse it). Note though, that going forward we're likely to not allow user fields which start with _ to avoid collision with the internal meta fields, so it might not be a good idea to keep the _type name.

Good day, everyone.

Huge thread, and maybe I missed something while reading this.

Right now I'm using a has_parent query where I am looking for child type documents with filter criteria on parent type fields. And with the next ES releases I need to move those two types to separate indices.
That means I cannot use the has_parent query in the future and I cannot easily change the data structure to rewrite the query.
Correct me if I'm wrong: in that case I need to use a custom type inside of the index or use nested objects?

@amelnikoff
Parent-Child is moving to a new _type_-less model, using a _join field.
See:

@tvernum
If I understand correctly, it will be a relation between documents of the same type.
But I have two types.

Just realized that I need to combine the two document types into a single one and separate their properties inside of this type; then I can use the join field.
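A sketch of that combined layout, using the join field type: one mapping declares the parent/child relation, and each document keeps its own properties under a key named after its former type. The join field name `relation`, the relation names, and the helper names are assumptions for illustration.

```python
# Sketch: two former types combined into one mapping with a join field.
# The join field name "relation" and the helper names are assumptions.
JOIN_FIELD = "relation"


def join_mapping(parent: str, child: str) -> dict:
    """Mapping fragment declaring a parent -> child relation."""
    return {"properties": {JOIN_FIELD: {"type": "join", "relations": {parent: child}}}}


def parent_doc(relation: str, fields: dict) -> dict:
    """Parent document: own fields nested under the old type name."""
    return {relation: dict(fields), JOIN_FIELD: {"name": relation}}


def child_doc(relation: str, parent_id: str, fields: dict) -> dict:
    """Child document pointing at its parent.

    Children must be indexed with routing that puts them on the
    parent's shard (typically routing=<parent id>).
    """
    return {relation: dict(fields), JOIN_FIELD: {"name": relation, "parent": parent_id}}


if __name__ == "__main__":
    print(join_mapping("company", "employee"))
    print(parent_doc("company", {"name": "elastic"}))
    print(child_doc("employee", "1", {"first_name": "david"}))
```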

I have a quick question about being prepared for moving from 5.x to 6.x, and what the suggestion is for using the bulk REST API. Currently the docs suggest the _type is still necessary along with the index name when uploading. Should I now leave it blank? Or possibly use the index name to be safe? Up until now I was using the _type as a form of classification, but I can easily move that elsewhere.

Thanks

@wtarr In 6.0, indices still have types, but you can only have a single type per index. We'll only officially remove the type name in 7.0.

So you could just use type doc or whatever.
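For the bulk API question, a sketch of building the newline-delimited body with one fixed type name is shown below. The type name `doc` follows the suggestion above; the helper name and the sample index name are hypothetical.

```python
# Sketch: build a bulk request body that uses a single fixed type name.
# The helper name and sample data are hypothetical.
import json


def bulk_body(index: str, docs: dict, doc_type: str = "doc") -> str:
    """Return newline-delimited JSON: one action line, then one source line, per doc."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body("logs", {"1": {"message": "hello"}, "2": {"message": "world"}}), end="")
```

Keeping the type name in one default argument means that it is the only place to change once the type is dropped from requests.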

I was wondering today when I saw this PR #27869 if we should also think of removing the PUT mapping API?
I mean that at the end of the day we will have only one "type" per index so doing something like:

PUT index
{
  "settings": {}
}
PUT index/_mapping
{
  "doc": { }
}

Could be replaced anyway by something like:

PUT index
{
  "settings": {},
  "mappings": {
    "doc": { }
  }
}

@jpountz WDYT?

Since there is only ONE mapping per index, shouldn't the element "mappings" be renamed to "mapping" at some point?

Since there is only ONE mapping per index, shouldn't the element "mappings" be renamed to "mapping" at some point?

The plural part has to do with multiple fields, not multiple types.

I was wondering today when I saw this PR #27869 if we should also think of removing the PUT mapping API?

+1

Fwiw, an Update Indices Settings API also exists, even though each index has only ever had one set of settings: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html

@dadoonet I think the Update Mapping API would still be useful even with only one type since you can use it to add new fields to the mappings after index creation.

I think the Update Mapping API would still be useful even with only one type since you can use it to add new fields to the mappings after index creation.

@colings86 Ha! Very true. I forgot that o_O.
Wondering if this can be supported by the update settings though?

You mean updating mappings via the update settings API? This sounds inconsistent to me.

@jpountz It is also inconsistent that analysis is updated via _settings, right? I think the gist of the proposal is that we should have one API for updatable "stuff" for an index, which takes the same form as index creation (i.e. sections for settings, mappings, etc). The only reason we have POST /indexname/_settings is because POST /indexname is for indexing a document? So we need some suffix to mean "all the stuff that can be configured for an index". Maybe POST /indexname/_configuration could be a new handler, taking the entire set of configuration as described above. I think this would increase consistency (between index creation and update).

@rjernst I would agree with this. My comment was more about having the endpoint called _settings when it is actually about updating mappings.

The only reason we have POST /indexname/_settings is because POST /indexname is for indexing a document?

Actually we will be using POST {index}/_doc to index documents with auto-generated ids, so I guess we could use POST {index} to update settings/mappings. I would like it to be handled in a separate issue however, since this one already has a very large scope.

From the discussion, I understand the issue with multiple types in an index. I agree it should be addressed, but not at the price of changing concepts and APIs. I think most people have gotten used to the analogy of (db-index, table-type, record-doc) and found ES easy to get started with.

I think this can really be something that the Elastic team handles internally and probably easily. Let's take the twitter index as an example. If there are two types, tweet and user, Elasticsearch can simply create two new single-type indexes behind the scenes, twitter_tweet and twitter_user, and that will solve the technical issues with indexing.

For the APIs, it can and should still be the same as before. If a user searches:

/twitter/_search?q=*

Elasticsearch can translate it to the following, so that it can search all the indexes.
/twitter_*/_search?q=*

If a user searches the tweet "type",

/twitter/tweet/_search?q=*

ES converts it to:

/twitter_tweet/_search?q=*

For multiple types

/twitter/tweet,user/_search?q=*

Convert it to
/twitter_tweet,twitter_user/_search?q=*
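
The rewriting described here can be sketched in a few lines. This is only an illustration of the proposed `<index>_<type>` naming scheme, assuming paths that end in the standard `_search` endpoint; the function name `translate_search_path` is hypothetical and nothing like it exists in Elasticsearch.

```python
def translate_search_path(path):
    """Rewrite /index[/type1,type2]/_search into per-type index names.

    Follows the naming scheme proposed in this comment: <index>_<type>.
    """
    base, _, query = path.partition("?")
    parts = [p for p in base.split("/") if p]
    if not parts or parts[-1] != "_search":
        raise ValueError("expected a path ending in _search")
    index = parts[0]
    if len(parts) == 2:
        # No type given: search every per-type index via a wildcard.
        targets = index + "_*"
    elif len(parts) == 3:
        # One or more comma-separated types: one index per type.
        targets = ",".join(f"{index}_{t}" for t in parts[1].split(","))
    else:
        raise ValueError("unexpected path shape")
    new_path = f"/{targets}/_search"
    return new_path + ("?" + query if query else "")

for p in ("/twitter/_search?q=*",
          "/twitter/tweet/_search?q=*",
          "/twitter/tweet,user/_search?q=*"):
    print(p, "->", translate_search_path(p))
```

This covers URL rewriting only; mappings, aggregations, and scoring would need their own handling.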

For the mapping, it may be more complicated for aggregation. There will be a bit of work in translating the APIs, but it should be small - maybe I have missed something here.

Would this be a better way to solve the real problem while maintaining backward compatibility?

The current design of types has the drawback of looking like per-index multi-tenancy, which we do not support well. Your suggestion would trade this problem for another one: too many Lucene indices in a cluster. Having more indices implies managing more segments, yet segments have overhead and having too many of them causes problems. We see such problems with users who have thousands (or more) of shards per index today already.

Plus the change you suggest would also have backward-compatibility implications on things like scoring given that index statistics would be different, unless the user opts in for DFS_QUERY_THEN_FETCH.
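
To make the scoring point concrete, here is a small illustration using the inverse document frequency term of BM25, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n the number of documents containing the term. The document counts are invented; the point is only that the same term gets a different weight once the statistics come from a smaller per-type index.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF term of BM25: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Invented numbers: one index holding tweets and users together...
combined = bm25_idf(doc_count=1_000_000, doc_freq=1_000)
# ...versus a users-only index where the term occurs in 900 of 10,000 docs.
users_only = bm25_idf(doc_count=10_000, doc_freq=900)

print(f"combined index idf:   {combined:.3f}")
print(f"users-only index idf: {users_only:.3f}")
```

A term that looks rare across the combined index can be common within one type, so relevance ordering may shift after such a split.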

Finally, one thing that I don't like about your proposal: think about someone who starts using Elasticsearch in maybe two years without knowledge of the history of the project. Seeing that putting documents in multiple indices or in multiple types has the same effect, they will be confused about why we have both indices and types.

There will be a bit of work in translating the APIs, but it should be small - maybe I have missed something here.

It might look small, but it leaves complication in how the engine works that isn't needed for most people.

If you need to keep using it, I'd suggest handling that complication in a wrapping API layer, as a separate library in your preferred language.

I think this can really be something that Elastic team handles internally and probably easily.

You may not be aware, but that's a really toxic behaviour: http://dilbert.com/strip/1994-10-17

'easy'.

Thanks for the feedback. As for the implementation, those of you working on it have done an awesome job and definitely have a better say on whether it's easy or not.

For the issue "too many Lucene indices in a cluster," I don't think it's a new issue introduced by the proposed API compatibility. If ES removes types from indices, we will most likely have more indices anyway. I find that having users mix types in one index, as suggested in one of the articles, is not quite realistic unless it's a very small schema.

Really, too many indices is an implementation issue and I don't think the implementation change should break the public APIs.

Urgh this is so annoying, just because some people used to make it a relational database. It was so nice and easy to understand having multiple types under 1 index. Now what, I have to have multiple indexes?!

@antonydandrea You can make a new field yourself and call it "type", or something that might fit even better, and filter by that. Or, if you store very distinct things, it makes sense to use multiple indices anyway. There are very good reasons for the removal of types the way they exist right now, as you can read in this thread.
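
A minimal sketch of that approach, assuming the custom field is literally named `type` and is mapped as a keyword so that a `term` filter matches it exactly. The helper name `search_by_kind` and the `message` field are made up for illustration; `bool`, `must`, `filter`, `term`, and `match` are regular query DSL clauses.

```python
import json

def search_by_kind(kind, query):
    """Wrap `query` in a bool query that filters on a custom type-like field."""
    return {
        "query": {
            "bool": {
                "must": query,
                # Filter context: restricts the hits without affecting scores.
                "filter": {"term": {"type": kind}},
            }
        }
    }

body = search_by_kind("tweet", {"match": {"message": "elasticsearch"}})
print(json.dumps(body, indent=2))
```

The printed JSON could be used as the body of a search request against the single index that holds all the documents.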

I am sorry, but there are also very good reasons NOT to get rid of it too. It was so easy having different documents with different mappings under a single manageable index.

@antonydandrea What do you find easier with one index with multiple types vs. either multiple indices or managing types on top of Elasticsearch as described by @Felk ?

cc @elastic/es-search-aggs

Updated the plan to deprecate usage of include_type_name=true in 7.x (which is the default) and fail in 8.x, as discussed with @rjernst and @clintongormley.

We just discussed how we want to update REST tests with this change. The issue is that index creation, index, update, put mapping and some other APIs are going to complain with version 7.x if the include_type_name parameter is not set. Yet we don't want to skip all tests that use these APIs if the version is < 7.0, as this would disable most tests when testing clusters that run multiple versions. As a consequence, we agreed on the following plan:

  • Use _doc as a type name all the time.
  • Modify the test runner to ignore warnings about include_type_name not being set. This will allow multi-version (6.x and 7.0) cluster tests to keep running like today. We did something similar when moving from 5 shards to 1 shard by default (#30539).
  • Duplicate REST tests for APIs that now support the include_type_name parameter in order to test the behavior both when include_type_name is set and when it is not set. This should only be done on tests that are dedicated to testing a specific API. For instance, if index creation is used as a way to set up a test, it should not be duplicated, while if index creation itself is tested, it should be duplicated.

I updated the plan to add a new item to the 7.0 tasks list: "remove references to types from the high-level rest client API".

After seeing #33953 @jtibshirani raised the question of whether we want to do something so that users don't have to pass include_type_name=false to almost every request on 7.x.

It's true that for someone who would quickly resolve deprecation warnings and use the new typeless APIs, having to keep passing include_type_name=false to almost every request looks a bit ugly. Worse, new users would have to pass include_type_name=false as well only to stop doing it when they upgrade to 8.0. This isn't something that can't be coped with but maybe we could think about how we can make the experience better?

I don't think we should default include_type_name to true, this is going to be too surprising and will cause issues with mixed-version clusters. I also think it is important to give users a whole major version to upgrade: this is a major breaking change to our most used APIs (index creation, indexing of documents, search to name a few).

I have been considering adding a node setting, eg. rest.action.include_type_name, to change the default of include_type_name for all APIs, which would essentially make APIs behave like in 8.0. 7.x docs would still pass include_type_name=true to all affected APIs since we shouldn't expect users to have opted in for that behavior, but we could add notes that they could skip setting this on almost every request thanks to this new node setting.

Opinions / other suggestions? /cc @clintongormley @rjernst

I have been considering adding a node setting, eg. rest.action.include_type_name, to change the default of include_type_name for all APIs, which would essentially make APIs behave like in 8.0.

I like this idea

@jpountz to clarify, when you refer to include_type_name: true in the last two paragraphs, do you mean false?

Would it be possible to default the setting to false for new clusters, so that new users in 7.0 never come across the extra parameter? For customers upgrading from a previous version, the setting would instead be true by default. I’m just trying to brainstorm a way to keep the out-of-the-box experience nice for new users, and don’t have a perfect understanding of the cluster upgrade process/ details of running mixed version clusters.

An update: we met offline to talk through a revised plan for 7.0, which is now documented in #35190. The core of the plan is set, but there are still some open questions to sort out.

@jpountz Can we close this issue now that types have been deprecated?

We're tracking the remaining work for the types removal in another meta issue now so I am closing this one. Long live the types...

@jpountz @jimczi I am confused on this issue, would you please help to explain it for me?

If I have one index with 2 types, like:

  • type1: f1: integer, f2: integer, f3: text
  • type2: f4: integer, f5: text

then ES (before version 5.x) will have 5 fields for every document in the index, which leads to a sparse index. I'm thinking about a solution that applies a transformation when creating the mapping, like:

  • type1: integer_field_1, integer_field_2, text_field_1
  • type2: integer_field_1, text_field_1

so the index will only have 3 fields:

  • integer_field_1: type1.f1, type2.f4
  • integer_field_2: type1.f2
  • text_field_1: type1.f3, type2.f5

When querying the index, just apply the same transformation to convert the field in the query to the real field stored in the index. Can this solve the problem of sparse indices? That way we could keep multiple types to facilitate data modeling.
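
The transformation described in this comment could be sketched as follows. This is only one way to read the idea, with a hypothetical helper `build_slot_mapping`; the slot naming `<datatype>_field_<n>` follows the example in the comment, and the same lookup would be applied to field names at index time and at query time.

```python
from collections import defaultdict

def build_slot_mapping(types):
    """Assign each (type, field) to a shared slot named <datatype>_field_<n>.

    `types` maps a type name to a dict of field name -> datatype, in
    declaration order.
    """
    mapping = {}
    for type_name, fields in types.items():
        counters = defaultdict(int)  # per-type counter for each datatype
        for field, datatype in fields.items():
            counters[datatype] += 1
            mapping[(type_name, field)] = f"{datatype}_field_{counters[datatype]}"
    return mapping

types = {
    "type1": {"f1": "integer", "f2": "integer", "f3": "text"},
    "type2": {"f4": "integer", "f5": "text"},
}
mapping = build_slot_mapping(types)
physical_fields = sorted(set(mapping.values()))
print(physical_fields)
```

With the two types above, five logical fields collapse onto three physical ones, at the cost of keeping the lookup table in the application.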

@penfree This would work, but wouldn't it be even easier to have 2 indices?

@penfree This would work, but wouldn't it be even easier to have 2 indices?

Multiple doc types make it much easier to design a data model.

An example:
Our data consists of patient cases and contains multiple doc types, like

  • patient info
    • clinic visit info
      • diagnosis
      • lab report
      • exam report

We need to query on diagnosis or lab report to find clinic visits or patients, which is difficult to do across multiple indices. Not to mention that we create an index per day and place them in an alias.
