When a snapshot fails, the snapshot `_status` endpoint returns a 500 error. The only way I have found to fetch the actual "FAILED" state is to list the repository with `_all`. Returning a 500 from `_status` for a failed snapshot seems wrong to me.
Elasticsearch version: 5.2.2
Plugins installed: x-pack
PUT /_snapshot/my_backup
{
"type": "fs",
"settings": {
"compress": true,
"location": "repo_test"
}
}
PUT test1
PUT /_snapshot/my_backup/snapshot_1
{
"indices": "test1",
"ignore_unavailable": true,
"include_global_state": false
}
GET _snapshot/my_backup/snapshot_1/_status
Response:
{
"snapshots": [
{
"snapshot": "snapshot_1",
"repository": "my_backup",
"uuid": "8KxZ0zSlQFyh77dqvxc3Mw",
"state": "SUCCESS",
}]}
Make a "bad" index by creating it while shard allocation is disabled:
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.enable": "none"
}
}
PUT test2
PUT /_snapshot/my_backup/snapshot_2
{
"indices": "test1,test2",
"ignore_unavailable": true,
"include_global_state": false
}
GET _snapshot/my_backup/snapshot_2/_status
Response:
{
"error": {
"root_cause": [
{
"type": "index_shard_restore_failed_exception",
"reason": "failed to read shard snapshot file",
"index_uuid": "_f7dq3AMSEejQMZF4sbqYA",
"shard": "0",
"index": "test1"
}
],
"type": "index_shard_restore_failed_exception",
"reason": "failed to read shard snapshot file",
"index_uuid": "_f7dq3AMSEejQMZF4sbqYA",
"shard": "0",
"index": "test1",
"caused_by": {
"type": "no_such_file_exception",
"reason": "/Users/jared/tmp/repo_test/indices/5H7x7fA-QsK7xqs6MdO0Bw/0/snap-2XWQ_Sd4QMCdSo1wU4VkoA.dat"
}
},
"status": 500
}
GET /_snapshot/my_backup/_all?filter_path=*.snapshot,*.state
Response:
{
"snapshots": [
{
"snapshot": "snapshot_1",
"state": "SUCCESS"
},
{
"snapshot": "snapshot_2",
"state": "FAILED"
}
]
}
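Until `_status` handles this case, a client can fall back to the repository listing. Here is a minimal Python sketch of that workaround, assuming both response bodies have already been fetched and parsed into dicts. The name `resolve_snapshot_state` is hypothetical and not part of any Elasticsearch client.

```python
from typing import Optional


def resolve_snapshot_state(status_body: dict, listing_body: dict, name: str) -> Optional[str]:
    """Return the state of snapshot `name`, preferring the _status body.

    Falls back to the repository listing (GET _snapshot/<repo>/_all)
    when the _status body is an error response.
    """
    # An error response carries a top-level "error" key instead of "snapshots".
    source = listing_body if "error" in status_body else status_body
    for snap in source.get("snapshots", []):
        if snap.get("snapshot") == name:
            return snap.get("state")
    # The snapshot was not found in whichever body was consulted.
    return None
```

With the two bodies shown above for snapshot_2, this returns "FAILED" from the listing instead of surfacing the 500.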
This sounds like a legit request to me, @imotov what do you think?
I agree, the _status endpoint for a failed snapshot should return information about the failure in a standard response, not a 500.
thanks @abeyad ! I will mark adoptme then.
@abeyad that feels like a bug and not enhancement. What do you think?
@imotov agreed, I'll change the label
++ thanks for taking it @abeyad
@jpcarey the steps you outlined above do not reproduce for me on 5.2.2. Instead, for
curl -XGET "localhost:9200/_snapshot/fs_repo/snap1"
I get:
{
"snapshots" : [
{
"snapshot" : "snap1",
"uuid" : "iTxr6rgSQMqjGOEOtk1C3g",
"version_id" : 5020299,
"version" : "5.2.2",
"indices" : [
"idx2"
],
"state" : "FAILED",
"reason" : "Indices don't have primary shards [idx2]",
"start_time" : "2017-03-30T17:25:56.191Z",
"start_time_in_millis" : 1490894756191,
"end_time" : "2017-03-30T17:25:56.199Z",
"end_time_in_millis" : 1490894756199,
"duration_in_millis" : 8,
"failures" : [
{
"index" : "idx2",
"index_uuid" : "idx2",
"shard_id" : 3,
"reason" : "primary shard is not allocated",
"status" : "INTERNAL_SERVER_ERROR"
},
{
"index" : "idx2",
"index_uuid" : "idx2",
"shard_id" : 2,
"reason" : "primary shard is not allocated",
"status" : "INTERNAL_SERVER_ERROR"
},
{
"index" : "idx2",
"index_uuid" : "idx2",
"shard_id" : 4,
"reason" : "primary shard is not allocated",
"status" : "INTERNAL_SERVER_ERROR"
},
{
"index" : "idx2",
"index_uuid" : "idx2",
"shard_id" : 0,
"reason" : "primary shard is not allocated",
"status" : "INTERNAL_SERVER_ERROR"
},
{
"index" : "idx2",
"index_uuid" : "idx2",
"shard_id" : 1,
"reason" : "primary shard is not allocated",
"status" : "INTERNAL_SERVER_ERROR"
}
],
"shards" : {
"total" : 5,
"failed" : 5,
"successful" : 0
}
}
]
}
For getting the status:
curl -XGET "localhost:9200/_snapshot/fs_repo/snap1/_status"
I get:
{
"snapshots" : [
{
"snapshot" : "snap1",
"repository" : "fs_repo",
"uuid" : "iTxr6rgSQMqjGOEOtk1C3g",
"state" : "FAILED",
"shards_stats" : {
"initializing" : 0,
"started" : 0,
"finalizing" : 0,
"done" : 0,
"failed" : 5,
"total" : 5
},
"stats" : {
"number_of_files" : 0,
"processed_files" : 0,
"total_size_in_bytes" : 0,
"processed_size_in_bytes" : 0,
"start_time_in_millis" : 0,
"time_in_millis" : 0
},
"indices" : {
"idx2" : {
"shards_stats" : {
"initializing" : 0,
"started" : 0,
"finalizing" : 0,
"done" : 0,
"failed" : 5,
"total" : 5
},
"stats" : {
"number_of_files" : 0,
"processed_files" : 0,
"total_size_in_bytes" : 0,
"processed_size_in_bytes" : 0,
"start_time_in_millis" : 0,
"time_in_millis" : 0
},
"shards" : {
"0" : {
"stage" : "FAILURE",
"stats" : {
"number_of_files" : 0,
"processed_files" : 0,
"total_size_in_bytes" : 0,
"processed_size_in_bytes" : 0,
"start_time_in_millis" : 0,
"time_in_millis" : 0
},
"reason" : "primary shard is not allocated"
},
"1" : {
"stage" : "FAILURE",
"stats" : {
"number_of_files" : 0,
"processed_files" : 0,
"total_size_in_bytes" : 0,
"processed_size_in_bytes" : 0,
"start_time_in_millis" : 0,
"time_in_millis" : 0
},
"reason" : "primary shard is not allocated"
},
"2" : {
"stage" : "FAILURE",
"stats" : {
"number_of_files" : 0,
"processed_files" : 0,
"total_size_in_bytes" : 0,
"processed_size_in_bytes" : 0,
"start_time_in_millis" : 0,
"time_in_millis" : 0
},
"reason" : "primary shard is not allocated"
},
"3" : {
"stage" : "FAILURE",
"stats" : {
"number_of_files" : 0,
"processed_files" : 0,
"total_size_in_bytes" : 0,
"processed_size_in_bytes" : 0,
"start_time_in_millis" : 0,
"time_in_millis" : 0
},
"reason" : "primary shard is not allocated"
},
"4" : {
"stage" : "FAILURE",
"stats" : {
"number_of_files" : 0,
"processed_files" : 0,
"total_size_in_bytes" : 0,
"processed_size_in_bytes" : 0,
"start_time_in_millis" : 0,
"time_in_millis" : 0
},
"reason" : "primary shard is not allocated"
}
}
}
}
}
]
}
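For reference, here is a small Python sketch that pulls the failed shards and their reasons out of a `_status` body shaped like the one above. `failed_shards` is a hypothetical helper name, and the body is assumed to be already parsed into a dict.

```python
from typing import List, Tuple


def failed_shards(status_body: dict) -> List[Tuple[str, str, str, str]]:
    """List (snapshot, index, shard, reason) for every shard in stage FAILURE."""
    failures = []
    for snap in status_body.get("snapshots", []):
        for index_name, index in snap.get("indices", {}).items():
            for shard_id, shard in index.get("shards", {}).items():
                if shard.get("stage") == "FAILURE":
                    failures.append(
                        (snap.get("snapshot"), index_name, shard_id, shard.get("reason", ""))
                    )
    return failures
```

An error body (one with "error" and "status" but no "snapshots") yields an empty list, so callers still need to check for the 500 separately.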
@abeyad I re-ran the steps I provided (without x-pack), and still get the error with 5.2.2 (fresh untar). The error complains about index test1, which is odd. I went back and made sure to add documents to the index, in case the issue was related to an empty index. Same results.
macOS Sierra 10.12.3 (16D32)
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
curl 'localhost:9200/_snapshot/my_backup/snapshot_2/_status?pretty'
{
"error" : {
"root_cause" : [
{
"type" : "index_shard_restore_failed_exception",
"reason" : "failed to read shard snapshot file",
"index_uuid" : "RnhkQinqT4yYodBnq4fARQ",
"shard" : "0",
"index" : "test1"
}
],
"type" : "index_shard_restore_failed_exception",
"reason" : "failed to read shard snapshot file",
"index_uuid" : "RnhkQinqT4yYodBnq4fARQ",
"shard" : "0",
"index" : "test1",
"caused_by" : {
"type" : "no_such_file_exception",
"reason" : "/Users/jared/tmp/repo_test/indices/uRZ1_CzRQ-eL3LyKwSvHcA/0/snap-ndxheQU0QgixJnHsLBmXJg.dat"
}
},
"status" : 500
}
@jpcarey I reproduced the problem. If the snapshot contains only "bad" indices, getting its status works fine. If the snapshot contains a mix of good and bad indices, I get the same error you got.