Elasticsearch: [ML] Model state docs are orphaned in .ml-state after job is deleted

Created on 12 May 2018 · 4 Comments · Source: elastic/elasticsearch

First reported on the discuss forum: https://discuss.elastic.co/t/ml-state-is-too-big/131561

I looked at my own setup and also noticed a few model states belonging to jobs that no longer exist in my system.

There is currently no job in my system called test_kpi, although I'm sure there was one at some point and it was deleted.

I'm not sure which version the user on Discuss is running, but I'm currently using v6.2.0.

:ml >bug

All 4 comments

Pinging @elastic/ml-core

@richcollier Can you see any model_snapshot documents in the results index with job_id set to test_kpi?

@dimitris-athanasiou - there are no documents in .ml-anomalies for job_id:test_kpi

We found the cause of this: a bug introduced in version 6.1. When we persist the model state, we write the state documents to the .ml-state index and a model_snapshot document to the results index. Later, to delete the state documents, we need the model snapshot document. Because of the bug, during background periodic persistence the state documents were persisted but the model snapshot document was only put in a buffer. If the job was deleted from the UI before the buffer was flushed, the snapshot document was never indexed, so the state docs were left behind after the job was deleted.
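
To make the sequence above easier to follow, here is a toy Python model of the race. It is only a sketch, not the Elasticsearch implementation: the class `ToyStore`, its method names, and the document ID formats are all hypothetical and chosen purely for illustration.

```python
class ToyStore:
    """Toy stand-in for the two indices; NOT the real Elasticsearch code."""

    def __init__(self):
        self.state_index = {}    # plays the role of .ml-state
        self.results_index = {}  # plays the role of the results index
        self.buffer = []         # snapshot docs waiting to be indexed

    def persist_in_background(self, job_id, snapshot_id, n_state_docs):
        # State documents are written straight away.
        for i in range(1, n_state_docs + 1):
            # Hypothetical ID format, assumed for this sketch only.
            self.state_index[f"{job_id}_model_state_{snapshot_id}#{i}"] = job_id
        # The model snapshot document is only buffered (the buggy behaviour).
        self.buffer.append(
            {"job_id": job_id, "snapshot_id": snapshot_id, "doc_count": n_state_docs}
        )

    def flush(self):
        # Index every buffered snapshot document, then empty the buffer.
        for snap in self.buffer:
            key = f"{snap['job_id']}_model_snapshot_{snap['snapshot_id']}"
            self.results_index[key] = snap
        self.buffer.clear()

    def delete_job(self, job_id):
        # Deletion finds state docs only through indexed snapshot docs.
        snaps = [(k, s) for k, s in self.results_index.items() if s["job_id"] == job_id]
        for key, snap in snaps:
            for i in range(1, snap["doc_count"] + 1):
                state_id = f"{job_id}_model_state_{snap['snapshot_id']}#{i}"
                self.state_index.pop(state_id, None)
            del self.results_index[key]
        # Buffered snapshot docs of a deleted job are dropped and never indexed.
        self.buffer = [s for s in self.buffer if s["job_id"] != job_id]
```

Calling `persist_in_background` and then `delete_job` without a `flush` in between leaves the state entries in `state_index`, which mirrors the orphaned documents described in this issue. Calling `flush` first lets `delete_job` remove them.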

The above bug is resolved in 6.3.0 (and 6.2.5 if that version is ever released). However, in order to ensure those documents are deleted and to prevent such cases in the future, I will work on enhancing the daily maintenance service to look for left-behind state docs and clean them up.
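
Until such a cleanup exists, one way to spot left-behind state docs is to compare the document IDs in .ml-state against the list of current job IDs. The Python sketch below is a rough illustration only. It assumes state document IDs look like `<job_id>_model_state_<numeric snapshot id>#<n>`, which should be checked against the IDs actually present in your index, and the function name `find_orphaned_state_ids` is hypothetical. It takes plain lists as input and does not talk to Elasticsearch.

```python
import re

# ASSUMPTION: state doc IDs follow "<job_id>_model_state_<digits>#<digits>".
# Verify this against the real IDs in your .ml-state index before relying on it.
STATE_ID = re.compile(r"^(?P<job_id>.+)_model_state_\d+#\d+$")


def find_orphaned_state_ids(state_doc_ids, existing_job_ids):
    """Return state doc IDs whose job ID is not among the existing jobs.

    IDs that do not match the assumed pattern are skipped rather than
    reported, so unknown document types are never flagged by mistake.
    """
    existing = set(existing_job_ids)
    orphans = []
    for doc_id in state_doc_ids:
        match = STATE_ID.match(doc_id)
        if match and match.group("job_id") not in existing:
            orphans.append(doc_id)
    return orphans
```

The returned IDs are candidates for review, not a list that is safe to delete blindly. Anything that does not match the assumed pattern is deliberately left alone.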
