Elasticsearch: failed snapshot _status returns 500

Created on 23 Mar 2017 · 9 Comments · Source: elastic/elasticsearch

When a snapshot fails, calling _status on that snapshot returns a 500 error. It seems the only way to fetch the actual "FAILED" state is to list the repository with _all. To me, returning a 500 exception from the snapshot _status call seems wrong.

Elasticsearch version: 5.2.2

Plugins installed: x-pack

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "compress": true,
    "location": "repo_test"
  }
}

PUT test1

PUT /_snapshot/my_backup/snapshot_1
{
  "indices": "test1",
  "ignore_unavailable": true,
  "include_global_state": false
}

GET _snapshot/my_backup/snapshot_1/_status
Response:
{
  "snapshots": [
    {
      "snapshot": "snapshot_1",
      "repository": "my_backup",
      "uuid": "8KxZ0zSlQFyh77dqvxc3Mw",
      "state": "SUCCESS",

}]}

Make a "bad" index by disabling shard allocation before creating it:
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}

PUT test2

PUT /_snapshot/my_backup/snapshot_2
{
  "indices": "test1,test2",
  "ignore_unavailable": true,
  "include_global_state": false
}

GET _snapshot/my_backup/snapshot_2/_status
Response:
{
  "error": {
    "root_cause": [
      {
        "type": "index_shard_restore_failed_exception",
        "reason": "failed to read shard snapshot file",
        "index_uuid": "_f7dq3AMSEejQMZF4sbqYA",
        "shard": "0",
        "index": "test1"
      }
    ],
    "type": "index_shard_restore_failed_exception",
    "reason": "failed to read shard snapshot file",
    "index_uuid": "_f7dq3AMSEejQMZF4sbqYA",
    "shard": "0",
    "index": "test1",
    "caused_by": {
      "type": "no_such_file_exception",
      "reason": "/Users/jared/tmp/repo_test/indices/5H7x7fA-QsK7xqs6MdO0Bw/0/snap-2XWQ_Sd4QMCdSo1wU4VkoA.dat"
    }
  },
  "status": 500
}

GET /_snapshot/my_backup/_all?filter_path=*.snapshot,*.state
Response:
{
  "snapshots": [
    {
      "snapshot": "snapshot_1",
      "state": "SUCCESS"
    },
    {
      "snapshot": "snapshot_2",
      "state": "FAILED"
    }
  ]
}
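
The workaround above (falling back to the repository listing when _status returns an error) can be sketched in Python. This is a minimal sketch, not part of the original report: the function names `get_json`, `state_from_listing`, and `snapshot_state` are hypothetical, and it assumes the cluster is reachable over plain HTTP without authentication.

```python
import json
import urllib.error
import urllib.request


def state_from_listing(listing, snapshot):
    """Return the state of `snapshot` from a GET _snapshot/<repo>/_all body, or None."""
    for entry in listing.get("snapshots", []):
        if entry.get("snapshot") == snapshot:
            return entry.get("state")
    return None


def get_json(url):
    """GET `url` and return (HTTP status, parsed JSON body), also for error responses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Elasticsearch sends a JSON error body with the 500, so parse it too.
        return err.code, json.load(err)


def snapshot_state(base_url, repo, snapshot, fetch=get_json):
    """Try _status first; on a non-200 response, fall back to the repository listing."""
    code, body = fetch(f"{base_url}/_snapshot/{repo}/{snapshot}/_status")
    if code == 200:
        return body["snapshots"][0]["state"]
    _, listing = fetch(f"{base_url}/_snapshot/{repo}/_all")
    return state_from_listing(listing, snapshot)
```

With the repository from this report, a call such as `snapshot_state("http://localhost:9200", "my_backup", "snapshot_2")` would be expected to take the fallback path, since _status answers with a 500 there. The `fetch` parameter exists only so the HTTP call can be swapped out in tests.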

:Distributed/Snapshot/Restore >bug

jpcarey

All 9 comments

This sounds like a legit request to me, @imotov what do you think?

javanna on 24 Mar 2017

I agree, the _status endpoint for a failed snapshot should return information about the failure in a standard response, not a 500.

abeyad on 24 Mar 2017

👍1

thanks @abeyad ! I will mark adoptme then.

javanna on 24 Mar 2017

@abeyad that feels like a bug and not an enhancement. What do you think?

imotov on 24 Mar 2017

@imotov agreed, I'll change the label

abeyad on 24 Mar 2017

++ thanks for taking it @abeyad

javanna on 24 Mar 2017

👍1

@jpcarey the steps you outlined above do not reproduce for me on 5.2.2. Instead, for

curl -XGET "localhost:9200/_snapshot/fs_repo/snap1"

I get:

{
  "snapshots" : [
    {
      "snapshot" : "snap1",
      "uuid" : "iTxr6rgSQMqjGOEOtk1C3g",
      "version_id" : 5020299,
      "version" : "5.2.2",
      "indices" : [
        "idx2"
      ],
      "state" : "FAILED",
      "reason" : "Indices don't have primary shards [idx2]",
      "start_time" : "2017-03-30T17:25:56.191Z",
      "start_time_in_millis" : 1490894756191,
      "end_time" : "2017-03-30T17:25:56.199Z",
      "end_time_in_millis" : 1490894756199,
      "duration_in_millis" : 8,
      "failures" : [
        {
          "index" : "idx2",
          "index_uuid" : "idx2",
          "shard_id" : 3,
          "reason" : "primary shard is not allocated",
          "status" : "INTERNAL_SERVER_ERROR"
        },
        {
          "index" : "idx2",
          "index_uuid" : "idx2",
          "shard_id" : 2,
          "reason" : "primary shard is not allocated",
          "status" : "INTERNAL_SERVER_ERROR"
        },
        {
          "index" : "idx2",
          "index_uuid" : "idx2",
          "shard_id" : 4,
          "reason" : "primary shard is not allocated",
          "status" : "INTERNAL_SERVER_ERROR"
        },
        {
          "index" : "idx2",
          "index_uuid" : "idx2",
          "shard_id" : 0,
          "reason" : "primary shard is not allocated",
          "status" : "INTERNAL_SERVER_ERROR"
        },
        {
          "index" : "idx2",
          "index_uuid" : "idx2",
          "shard_id" : 1,
          "reason" : "primary shard is not allocated",
          "status" : "INTERNAL_SERVER_ERROR"
        }
      ],
      "shards" : {
        "total" : 5,
        "failed" : 5,
        "successful" : 0
      }
    }
  ]
}

For getting the status:

curl -XGET "localhost:9200/_snapshot/fs_repo/snap1/_status"

I get:

{
  "snapshots" : [
    {
      "snapshot" : "snap1",
      "repository" : "fs_repo",
      "uuid" : "iTxr6rgSQMqjGOEOtk1C3g",
      "state" : "FAILED",
      "shards_stats" : {
        "initializing" : 0,
        "started" : 0,
        "finalizing" : 0,
        "done" : 0,
        "failed" : 5,
        "total" : 5
      },
      "stats" : {
        "number_of_files" : 0,
        "processed_files" : 0,
        "total_size_in_bytes" : 0,
        "processed_size_in_bytes" : 0,
        "start_time_in_millis" : 0,
        "time_in_millis" : 0
      },
      "indices" : {
        "idx2" : {
          "shards_stats" : {
            "initializing" : 0,
            "started" : 0,
            "finalizing" : 0,
            "done" : 0,
            "failed" : 5,
            "total" : 5
          },
          "stats" : {
            "number_of_files" : 0,
            "processed_files" : 0,
            "total_size_in_bytes" : 0,
            "processed_size_in_bytes" : 0,
            "start_time_in_millis" : 0,
            "time_in_millis" : 0
          },
          "shards" : {
            "0" : {
              "stage" : "FAILURE",
              "stats" : {
                "number_of_files" : 0,
                "processed_files" : 0,
                "total_size_in_bytes" : 0,
                "processed_size_in_bytes" : 0,
                "start_time_in_millis" : 0,
                "time_in_millis" : 0
              },
              "reason" : "primary shard is not allocated"
            },
            "1" : {
              "stage" : "FAILURE",
              "stats" : {
                "number_of_files" : 0,
                "processed_files" : 0,
                "total_size_in_bytes" : 0,
                "processed_size_in_bytes" : 0,
                "start_time_in_millis" : 0,
                "time_in_millis" : 0
              },
              "reason" : "primary shard is not allocated"
            },
            "2" : {
              "stage" : "FAILURE",
              "stats" : {
                "number_of_files" : 0,
                "processed_files" : 0,
                "total_size_in_bytes" : 0,
                "processed_size_in_bytes" : 0,
                "start_time_in_millis" : 0,
                "time_in_millis" : 0
              },
              "reason" : "primary shard is not allocated"
            },
            "3" : {
              "stage" : "FAILURE",
              "stats" : {
                "number_of_files" : 0,
                "processed_files" : 0,
                "total_size_in_bytes" : 0,
                "processed_size_in_bytes" : 0,
                "start_time_in_millis" : 0,
                "time_in_millis" : 0
              },
              "reason" : "primary shard is not allocated"
            },
            "4" : {
              "stage" : "FAILURE",
              "stats" : {
                "number_of_files" : 0,
                "processed_files" : 0,
                "total_size_in_bytes" : 0,
                "processed_size_in_bytes" : 0,
                "start_time_in_millis" : 0,
                "time_in_millis" : 0
              },
              "reason" : "primary shard is not allocated"
            }
          }
        }
      }
    }
  ]
}

abeyad on 30 Mar 2017

@abeyad I re-ran the steps I provided (without x-pack) and still get the error with 5.2.2 (fresh untar). The error complains about index test1, which is odd. I went back and made sure to add documents to the index, in case the issue was caused by an empty index, but got the same results.

macOS Sierra 10.12.3 (16D32)
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

curl 'localhost:9200/_snapshot/my_backup/snapshot_2/_status?pretty'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_shard_restore_failed_exception",
        "reason" : "failed to read shard snapshot file",
        "index_uuid" : "RnhkQinqT4yYodBnq4fARQ",
        "shard" : "0",
        "index" : "test1"
      }
    ],
    "type" : "index_shard_restore_failed_exception",
    "reason" : "failed to read shard snapshot file",
    "index_uuid" : "RnhkQinqT4yYodBnq4fARQ",
    "shard" : "0",
    "index" : "test1",
    "caused_by" : {
      "type" : "no_such_file_exception",
      "reason" : "/Users/jared/tmp/repo_test/indices/uRZ1_CzRQ-eL3LyKwSvHcA/0/snap-ndxheQU0QgixJnHsLBmXJg.dat"
    }
  },
  "status" : 500
}

jpcarey on 30 Mar 2017

@jpcarey I reproduced the problem. If the snapshot contains only "bad" indices, getting its status works fine. If the snapshot contains a mix of good and bad indices, I get the same error you got.

abeyad on 30 Mar 2017
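
The distinction between "only bad" and "mixed" snapshots could be checked from the snapshot info response before calling _status. The following is a rough sketch under assumptions: it assumes the snapshot info for a mixed snapshot lists every index under "indices" and every failed shard under "failures", in the same format as the snap1 response earlier in the thread. The helper names `classify_indices` and `is_mixed` are hypothetical.

```python
def classify_indices(snapshot_info):
    """Split a snapshot's indices into (good, bad) lists using its per-shard failures."""
    # An index counts as "bad" if at least one of its shards appears in "failures".
    failed = {failure["index"] for failure in snapshot_info.get("failures", [])}
    indices = snapshot_info.get("indices", [])
    good = [name for name in indices if name not in failed]
    bad = [name for name in indices if name in failed]
    return good, bad


def is_mixed(snapshot_info):
    """True when the snapshot holds both good and bad indices (the failing case above)."""
    good, bad = classify_indices(snapshot_info)
    return bool(good) and bool(bad)
```

Here `snapshot_info` is one entry of the "snapshots" array, already parsed from JSON. A `True` result from `is_mixed` only suggests that the _status call may hit the 500 described in this issue; it does not confirm it.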
