Elasticsearch version (bin/elasticsearch --version): 7.7
Plugins installed: []
JVM version (java -version): -
OS version (uname -a if on a Unix-like system): -
Description of the problem including expected versus actual behavior:
When hitting the flood stage watermark we set all indices on that node to read_only_allow_delete as documented here.
Previously (?) when this happened, the messages returned to each request would usually say FORBIDDEN/12/index read-only / allow delete (api) but it seems that something recently changed and it now says TOO_MANY_REQUESTS/12/index read-only / allow delete (api).
I think the TOO_MANY_REQUESTS is misleading, as it suggests that this could be caused by throttling.
I also found several threads on Discuss where users found this confusing. Here are two examples:
https://discuss.elastic.co/t/elastic-7-7-0-too-many-requests-with-one-request/233417/3
https://discuss.elastic.co/t/too-many-requests-12-index-read-only-allow-delete-api/236718/3
@DaveCTurner since you happened to reply to both of them I'm tagging you here.
I agree the message could be improved substantially. I don't think it's any less clear with TOO_MANY_REQUESTS than with FORBIDDEN, though. Let me clarify the reason for TOO_MANY_REQUESTS (vs. FORBIDDEN). Previously we returned a 403 status code in this situation, which translates to FORBIDDEN, and clients would not retry. That's bad because a full disk is transient: once an administrator frees up space, we resolve the situation by removing the block, so a client could have retried. A retryable condition like this should be indicated by a 429, so we made this change, and 429 maps to TOO_MANY_REQUESTS.
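To illustrate why the status code matters, here is a minimal client-side sketch, assuming a generic HTTP client rather than any particular Elasticsearch client library. `send_with_retry` and the `send` callable are hypothetical names; the only behaviour taken from the discussion above is that 429 signals a transient condition worth retrying, while 403 does not.

```python
import time
from typing import Callable, Tuple

# Assumption for this sketch: only 429 is treated as retryable. A 403
# (FORBIDDEN) is treated as permanent and returned immediately.
RETRYABLE_STATUSES = {429}


def send_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call the hypothetical `send` (returning (status, body)) until it
    yields a non-retryable status or the attempts are exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body  # success, or a permanent error such as 403
        if attempt < max_attempts - 1:
            # Exponential backoff gives an administrator time to free disk
            # space so that the block can be lifted.
            sleep(base_delay * (2 ** attempt))
    return status, body  # still blocked after the final attempt
```

With the old 403 response, a client written this way would give up on the first attempt; with 429 it keeps retrying until the block is released or its retry budget runs out.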
Ah, thanks for the background @jasontedor. I hadn't considered that this reflects the HTTP status code.
Fully agree that FORBIDDEN wasn't any better.
Would we be able to add something about disk space/watermarks to the message, or is it derived from the HTTP status code, the index setting, etc.?
"reason": "index [test] blocked by: [TOO_MANY_REQUESTS/12/index read-only / allow delete (api)];"
Yes, currently it's produced from the block placed on the index, which only carries a status code and a message that indicates the type of block (note that a user can manually apply the block, so its presence doesn't necessarily mean the disk is full). It will take some effort to make the situation clearer.
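As a rough model of why the reason string says nothing about disk space, the sketch below assumes a block carries only an id, a status, and a description, and renders them as `STATUS/ID/description` to reproduce the reason string quoted earlier. `IndexBlock` and `blocked_reason` are hypothetical names for illustration, not Elasticsearch's actual classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexBlock:
    """Hypothetical model: a block knows only its id, status and description."""
    block_id: int
    status: str
    description: str

    def __str__(self) -> str:
        # Rendered as STATUS/ID/description, as in the reason string above.
        return f"{self.status}/{self.block_id}/{self.description}"


def blocked_reason(index: str, block: IndexBlock) -> str:
    """Build the user-facing reason from the block alone; nothing in it
    records *why* the block was applied (disk watermark or manual)."""
    return f"index [{index}] blocked by: [{block}];"


DISK_BLOCK = IndexBlock(12, "TOO_MANY_REQUESTS", "index read-only / allow delete (api)")
```

Under this model the only field that can say anything about disk watermarks is `description`, which is why rewording the block's description is the cheapest way to make the message clearer.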
> a user can manually apply the block, so its presence doesn't mean disk full necessarily
In practice I don't think users really do apply this block manually. If they did, it would today be automatically removed a short while later. We discussed this when contemplating the auto-release behaviour and decided that this block should be considered as under the control of the disk-based shard allocator; if users want a read-only index then they can apply other blocks, and if they want to delete such an index then they can manually remove the block first.
This was technically a breaking change but I haven't seen any real-world impact in the 9 months since 7.4.0 was released.
Given this, I think rewording the description of this block is the right thing to do.
Pinging @elastic/es-distributed (:Distributed/Allocation)
Nice, thanks for the quick fix @DaveCTurner !