Elasticsearch: HTTP _bulk allows "_id": "" to set empty _id

Created on 14 Apr 2017  路  7Comments  路  Source: elastic/elasticsearch

Tested on master, 5.3, 5.2, 5.0, and 2.3:

PUT /_bulk
{ "index": { "_index": "test", "_type": "type", "_id": "" } }
{ "doc": "1" }
{ "index": { "_index": "test", "_type": "type", "_id": "" } }
{ "doc": "2" }
{ "index": { "_index": "test", "_type": "type" } }
{ "doc": "3" }

This creates two documents:

{
  "took": 439,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "test",
        "_type": "type",
        "_id": "",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "created": true,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "test",
        "_type": "type",
        "_id": "",
        "_version": 2,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1,
        "created": false,
        "status": 200
      }
    },
    {
      "index": {
        "_index": "test",
        "_type": "type",
        "_id": "AVttiPbjiL4QtkRlx6WB",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "created": true,
        "status": 201
      }
    }
  ]
}

This should create 3 documents, all with flake IDs (like the third document here). It's a bug because you can't actually GET a document with an empty _id.

GET /test/type/

spits out:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No endpoint or operation is available at [type]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "No endpoint or operation is available at [type]"
  },
  "status": 400
}
:DistributeCRUD >bug v6.0.0-alpha1

Most helpful comment

The reason I think that it should be rejected is because allowing _id: "" is a form of leniency. An application bug could easily unintentionally introduce _id: "" on a request and if I were an application developer accidentally introducing such a bug I would want Elasticsearch to reject it. We already have a way to specify auto-generated IDs: do not send an explicit ID field _id on the request, I don't think that we need a second way.

All 7 comments

It's always been like this (at least back to the 1.x series), it's not just a since 5.x thing. I do agree it's problematic though, for example you can not get such a document via the get API.

This should create 3 documents, all with flake IDs (like the third document here).

Maybe, or perhaps it should simply be rejected as an invalid request?

@jasontedor

Yeah, I've been slowly testing more and more builds.

Maybe, or perhaps it should simply be rejected as an invalid request?

That works for me too, specifically because individual indexing requests with a blank _id are rejected:

PUT /test/type/
{ "doc": 1 }

The reason I think that it should be rejected is because allowing _id: "" is a form of leniency. An application bug could easily unintentionally introduce _id: "" on a request and if I were an application developer accidentally introducing such a bug I would want Elasticsearch to reject it. We already have a way to specify auto-generated IDs: do not send an explicit ID field _id on the request, I don't think that we need a second way.

Agree that we should reject empty ID's. By the way, the issue is only with empty ID's, because if it's a space, you can still get the document by ID:

PUT /_bulk
{"index":{"_index":"test","_type":"type","_id":" "}}
{"doc":"1"}

GET test/type/%20
{
  "_index": "test",
  "_type": "type",
  "_id": " ",
  "_version": 1,
  "found": true,
  "_source": {
    "doc": "1"
  }
}

TBH, not sure if it should be rejected, or treated like it needs to auto-generate the ID. That's another option, right?

Yes, I do not think that we should reject a space as the _id as those can still be obtained exactly as you say.

TBH, not sure if it should be rejected, or treated like it needs to auto-generate the ID. That's another option, right?

I do not think so, that would be a breaking change.

The issue of _id being empty however is already broken, there's nothing to break but rather to fix.

I opened #24118.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

clintongormley picture clintongormley  路  3Comments

abtpst picture abtpst  路  3Comments

jpountz picture jpountz  路  3Comments

ppf2 picture ppf2  路  3Comments

matthughes picture matthughes  路  3Comments