This not-so-complicated Grok pattern, which I entered into the Grok Debugger in Kibana, crashed my whole cluster by pushing the CPU load on my proxy nodes to 100%:
Belegart->\[%{WORD:param2},(?<param5>(\s*%{NOTSPACE})*)\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}
It was matched against this message:
Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018
I had to restart the proxy nodes! Any idea what's going on here?
I was able to repeat this, and every time the CPU usage stayed near 90% on the proxy nodes for hours until I restarted them.
Elasticsearch version: 6.1.3 (now updated to 6.2.1; the problem still persists)
We were able to reproduce this on different clusters.
@gellweiler Thanks for reporting this. Threads can indeed get stuck evaluating the grok expression.
This reproduces for me with the following ingest simulate API call on master:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "Bonsuche mit folgender Anfrage: Belegart->\\[%{WORD:param2},(?<param5>(\\s*%{NOTSPACE})*)\\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018"
      }
    }
  ]
}
This API call never returns, and the hot threads API shows the following stack trace:
org.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:271)
org.joni.Matcher.matchCheck(Matcher.java:304)
org.joni.Matcher.searchInterruptible(Matcher.java:457)
org.joni.Matcher.search(Matcher.java:318)
org.elasticsearch.grok.Grok.captures(Grok.java:173)
org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:65)
org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58)
org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
org.elasticsearch.action.ingest.SimulateExecutionService$1.doRun(SimulateExecutionService.java:70)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
and this one:
org.joni.Matcher.matchCheck(Matcher.java:304)
org.joni.Matcher.searchInterruptible(Matcher.java:457)
org.joni.Matcher.search(Matcher.java:318)
org.elasticsearch.grok.Grok.captures(Grok.java:173)
org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:65)
org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58)
org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
org.elasticsearch.action.ingest.SimulateExecutionService$1.doRun(SimulateExecutionService.java:70)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
@talevy What would be the best way to keep joni from evaluating these poisonous expressions?
So, I am still investigating what is causing the regex processor to go insane here, but I recommend trying this pattern for now:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "Bonsuche mit folgender Anfrage: Belegart->\\[%{WORD:param2}(?<param5>(,\\s*%{WORD})*)\\] Zustand->ABGESCHLOSSEN Kassennummer->%{WORD:param9} Bonnummer->%{WORD:param10} Datum->%{DATESTAMP_OTHER:param11}"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018"
      }
    }
  ]
}
which results in what I believe is the same expected output and executes much faster:
{
  "_index": "_index",
  "_type": "_type",
  "_id": "_id",
  "_source": {
    "param5": ", VERKAUF, NACHERFASSUNG",
    "param9": "2",
    "param10": "6362",
    "param11": "Mon Jan 08 00:00:00 UTC 2018",
    "message": "Bonsuche mit folgender Anfrage: Belegart->[EINGESCHRAENKTER_VERKAUF, VERKAUF, NACHERFASSUNG] Zustand->ABGESCHLOSSEN Kassennummer->2 Bonnummer->6362 Datum->Mon Jan 08 00:00:00 UTC 2018",
    "param2": "EINGESCHRAENKTER_VERKAUF"
  }
}
Since this is not a generic way to keep the regex engine from entering potentially exponential execution time, I am investigating safe ways to interrupt the regex so it cannot consume too much CPU for too long.
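To illustrate the blow-up outside of joni (this is plain java.util.regex, not the Elasticsearch code path, so treat it only as a sketch of the same effect): %{NOTSPACE} expands to roughly \S+, so (\s*%{NOTSPACE})* becomes (\s*\S+)*. Because \s* may match nothing, a single run of non-space characters can be split across iterations of the group in exponentially many ways, and when whatever follows the group fails to match, a backtracking engine tries them all:

import java.util.regex.Pattern;

public class BacktrackingDemo {
    public static void main(String[] args) {
        // The trailing "X" can never match, so the engine backtracks through
        // every possible way of splitting the non-space run across iterations
        // of the group. Expect the time to roughly double with each added char.
        Pattern ambiguous = Pattern.compile("(\\s*\\S+)*X");
        for (int n = 5; n <= 22; n++) {
            StringBuilder input = new StringBuilder();
            for (int i = 0; i < n; i++) {
                input.append('a');
            }
            long start = System.nanoTime();
            ambiguous.matcher(input).matches(); // always false, but increasingly slow
            System.out.printf("length %2d -> %,d µs%n", n, (System.nanoTime() - start) / 1_000);
        }
    }
}

Roughly speaking, the suggested pattern above avoids this because each iteration of (,\s*%{WORD})* must start with a literal comma, so there is essentially only one way to consume each token.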
Thanks for investigating this issue. The grok pattern was auto-generated by a tool I wrote. I've changed the tool's output to produce GREEDYDATA instead of Token*, which also seems to be safe. Still, I would like to know the root cause of this issue so I can avoid generating dangerous expressions in the future.
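For reference (just a sketch of what such a pattern could look like, not the tool's exact output), capturing the list with GREEDYDATA would mean something like Belegart->\[%{WORD:param2},%{GREEDYDATA:param5}\] for the bracketed part. GREEDYDATA is a plain .*, so there is no nested quantifier and backtracking stays linear even when the rest of the pattern fails to match.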
Any update on this? I seem to have the same issue with a filebeat module.
My CPU went to 100%, and hot_threads shows org.joni.Matcher using all of it. I will try to find out which grok pattern caused this, but it would be nice to have more information about which patterns should not be used within ingest pipelines.
We have been discussing how to best fix this bug with what joni currently offers. Joni's match loop already checks, every 30000 iterations, whether the thread's interrupt flag has been set and, if so, terminates the match. I initially thought we could not use that, because it would require forking a thread for each thread that runs ingest with a grok processor, which is not feasible in ES. However, it was brought to my attention that we could instead add a single background thread that checks whether threads using the grok processor have been running for too long and, if so, sets their interrupt flag. Each time the grok processor is used, the thread would register itself. This approach sounds good to me and I will try to implement it.
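A rough sketch of what that could look like (class and method names here are made up for illustration, not the actual Elasticsearch or joni API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: a single scheduled background thread tracks when
// each thread entered grok matching and interrupts any thread that has been
// matching longer than the allowed interval. Joni's match loop polls the
// interrupt flag every 30000 iterations and aborts the match when it is set.
public final class GrokWatchdog {

    private final long maxExecutionMillis;
    private final Map<Thread, Long> registry = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "grok-watchdog");
            t.setDaemon(true);
            return t;
        });

    public GrokWatchdog(long checkIntervalMillis, long maxExecutionMillis) {
        this.maxExecutionMillis = maxExecutionMillis;
        scheduler.scheduleAtFixedRate(this::interruptLongRunners,
            checkIntervalMillis, checkIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** The grok processor calls this right before it starts matching. */
    public void register() {
        registry.put(Thread.currentThread(), System.currentTimeMillis());
    }

    /** The grok processor calls this in a finally block once matching is done. */
    public void unregister() {
        registry.remove(Thread.currentThread());
    }

    private void interruptLongRunners() {
        long now = System.currentTimeMillis();
        for (Map.Entry<Thread, Long> entry : registry.entrySet()) {
            if (now - entry.getValue() > maxExecutionMillis) {
                // joni notices the interrupt flag inside its match loop and bails out
                entry.getKey().interrupt();
            }
        }
    }
}

The grok processor would then wrap each match in register()/unregister() and turn the resulting interrupt into a processor failure, instead of letting the thread spin at 100% CPU indefinitely.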