Tested on master, 5.3, 5.2, 5.0, and 2.3:
PUT /_bulk
{ "index": { "_index": "test", "_type": "type", "_id": "" } }
{ "doc": "1" }
{ "index": { "_index": "test", "_type": "type", "_id": "" } }
{ "doc": "2" }
{ "index": { "_index": "test", "_type": "type" } }
{ "doc": "3" }
This creates two documents:
{
"took": 439,
"errors": false,
"items": [
{
"index": {
"_index": "test",
"_type": "type",
"_id": "",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"created": true,
"status": 201
}
},
{
"index": {
"_index": "test",
"_type": "type",
"_id": "",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"created": false,
"status": 200
}
},
{
"index": {
"_index": "test",
"_type": "type",
"_id": "AVttiPbjiL4QtkRlx6WB",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"created": true,
"status": 201
}
}
]
}
This should create 3 documents, all with flake IDs (like the third document here). It's a bug because you can't actually GET a document with an empty _id.
GET /test/type/
spits out:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "No endpoint or operation is available at [type]"
}
],
"type": "illegal_argument_exception",
"reason": "No endpoint or operation is available at [type]"
},
"status": 400
}
It's always been like this (at least back to the 1.x series), it's not just a since 5.x thing. I do agree it's problematic though, for example you can not get such a document via the get API.
This should create 3 documents, all with flake IDs (like the third document here).
Maybe, or perhaps it should simply be rejected as an invalid request?
@jasontedor
Yeah, I've been slowly testing more and more builds.
Maybe, or perhaps it should simply be rejected as an invalid request?
That works for me too, specifically because individual indexing requests with a blank _id are rejected:
PUT /test/type/
{ "doc": 1 }
The reason I think that it should be rejected is because allowing _id: "" is a form of leniency. An application bug could easily unintentionally introduce _id: "" on a request and if I were an application developer accidentally introducing such a bug I would want Elasticsearch to reject it. We already have a way to specify auto-generated IDs: do not send an explicit ID field _id on the request, I don't think that we need a second way.
Agree that we should reject empty ID's. By the way, the issue is only with empty ID's, because if it's a space, you can still get the document by ID:
PUT /_bulk
{"index":{"_index":"test","_type":"type","_id":" "}}
{"doc":"1"}
GET test/type/%20
{
"_index": "test",
"_type": "type",
"_id": " ",
"_version": 1,
"found": true,
"_source": {
"doc": "1"
}
}
TBH, not sure if it should be rejected, or treated like it needs to auto-generate the ID. That's another option, right?
Yes, I do not think that we should reject a space as the _id as those can still be obtained exactly as you say.
TBH, not sure if it should be rejected, or treated like it needs to auto-generate the ID. That's another option, right?
I do not think so, that would be a breaking change.
The issue of _id being empty however is already broken, there's nothing to break but rather to fix.
I opened #24118.
Most helpful comment
The reason I think that it should be rejected is because allowing
_id: ""is a form of leniency. An application bug could easily unintentionally introduce_id: ""on a request and if I were an application developer accidentally introducing such a bug I would want Elasticsearch to reject it. We already have a way to specify auto-generated IDs: do not send an explicit ID field_idon the request, I don't think that we need a second way.