Alertmanager: Silence UI is unresponsive with 1000s of silences

Created on 8 Sep 2017 · 18 Comments · Source: prometheus/alertmanager

For some reason, we ended up with 2000 silences, and that makes the silence UI quite unresponsive when typing in the filter input.

It's unclear to me what it is doing exactly, as I don't think there is any kind of auto-completion here.

The Chrome profiler seems to say that the time is spent in "Update layer tree".

I suspect that it's slow mostly because all the silences are rendered in the page. Paginating the silences (and the alerts) would mostly solve that.

Labels: component/ui, kind/enhancement

Interestingly enough, I discussed the same issue with @w0rm recently. The same is true if you find yourself in a situation with 1000s of alerts: the UI is so unresponsive that it's very hard to enter a filter string to narrow down the number of items displayed.

We need some kind of paging or, in the simplest case, a message like "3456 alerts in total, displayed alerts are incomplete, please enter a filter to reduce the number of returned results".

Out of curiosity, can you post the OS, system specs, and browser you were using when you experienced this? Checking your system's resource usage would also be interesting, to see whether it's a RAM or a CPU issue.

Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz with 32GB of memory.
Using the latest build of Google Chrome on Linux.
When typing, one core is at 100% usage.

@w0rm confirmed that typing into the filter bar with ~2000 silences definitely causes a delay of anywhere between 500 ms and 1500 ms per character entered. The message being handled is UpdateMatcherText.

Scrolling and clicking on filter labels are fast; only the chain of events kicked off by updating the matcher text in the filter bar is slow.

I've been going through the rendering code trying to find the problem. It shouldn't be re-rendering any of the silences, but there's a lot of code that sorts the silences by status and state. I was hoping this code would be a no-op, since the data going in hasn't changed.

The silences are in a stable order in memory, so I tried removing work from the rendering code (sorting silences into active/pending/expired, for example) and using Html.Lazy, but performance isn't improving.

Any ideas?

If you can't think of anything, the next step would be to try filtering on the backend. I'm guessing the filtering on the front end is what's maxing out the CPU.

@stuartnelson3 I think it's rendering that is slow. We need to add pagination.

[Screenshot from 2017-09-10 at 15:57:10]

And because we're updating the filter UI, it forces a re-render of the entire page?

@stuartnelson3 yeah, each update of the model causes the whole view to be re-evaluated. lazy only compares its arguments with ===, so it won't help if they are computed in the view: a freshly computed value is a new object and is never === the previous one.
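Here is a toy TypeScript model of the single-argument case that shows why computed arguments defeat lazy. It is a sketch of the idea, not Elm's actual implementation. The hypothetical lazy1 wrapper remembers its last argument and reuses the previous result only when the new argument is === the old one. Filtering inside the view builds a new array on every call, so the check always fails, even when the contents are identical. The Silence shape here is an illustrative stand-in.

```typescript
// A toy model of Elm's Html.Lazy.lazy (not its real implementation):
// the wrapped render function is re-run only when the new argument
// is not === to the previous one.
function lazy1<A, R>(render: (a: A) => R) {
  let last: { arg: A; result: R } | null = null;
  let renders = 0;
  return {
    run(arg: A): R {
      if (last !== null && last.arg === arg) {
        return last.result; // same reference: reuse the previous result
      }
      const result = render(arg);
      last = { arg, result };
      renders++;
      return result;
    },
    renderCount: () => renders,
  };
}

// Hypothetical stand-in for the Elm Silence record.
interface Silence {
  id: string;
  state: "active" | "pending" | "expired";
}

const silences: Silence[] = [
  { id: "a", state: "active" },
  { id: "b", state: "expired" },
];

// Stand-in for the real view: "rendering" just builds a string.
const silencesView = lazy1((sils: Silence[]) => `${sils.length} silences`);

// A stored value passed twice: the second call is skipped.
silencesView.run(silences);
silencesView.run(silences);

// A value computed in the view passed twice: filter() allocates a new
// array on every call, so === fails and both calls re-render.
silencesView.run(silences.filter((s) => s.state === "active"));
silencesView.run(silences.filter((s) => s.state === "active"));

console.log(silencesView.renderCount()); // 3
```

The practical consequence is that the value handed to lazy must come straight from the model (or be computed inside the lazily rendered function), not be derived on each pass through view.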

Ok, so it isn't possible to re-render just the div containing the filter bar; the whole page needs to be re-rendered? I thought that was the point of the whole virtual DOM thing, where only the changed portions are re-rendered.

@stuartnelson3 you can put lazy anywhere in the render tree. If a part of the model stays the same, the === check returns true and that subtree isn't re-rendered.

That's why I'm confused by the time spent rendering. If I put lazy here: https://github.com/prometheus/alertmanager/blob/master/ui/app/src/Views/SilenceList/Views.elm#L27

...my understanding is that updating the filter bar shouldn't cause this list of silences to be re-rendered, because its input isn't changing. But the app is still slow when updating the filter bar while a tab with 2000 silences is displayed.

If I'm on a different tab, updating the filter bar performs normally. This leads me to believe that the tab is being re-rendered on each update, but it shouldn't be.

Am I misunderstanding something?

You should extract the following code into a renderSilences tab silences function:

case silences of
    Success sils ->
        silencesView (filterSilencesByState tab sils)

    Failure msg ->
        error msg

    _ ->
        loading

and then call it with Html.Lazy.lazy2 renderSilences tab silences.
Because both tab and silences stay the same while the filter is changing, the lazy thunk won't be re-evaluated by the virtual DOM algorithm. Does that make sense?
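Here is a toy TypeScript model of lazy2 that illustrates the reasoning. It is not Elm's real implementation. A keystroke in the filter bar produces a new model record, but only the filter text field changes. The tab and silences fields still point to the same values, so both === checks pass and renderSilences is not re-run. The Model shape, the lazy2 helper and the updateFilter function are hypothetical stand-ins for the Elm code.

```typescript
// Toy two-argument lazy: re-render only if either argument changed (by ===).
function lazy2<A, B, R>(render: (a: A, b: B) => R) {
  let last: { a: A; b: B; result: R } | null = null;
  let renders = 0;
  return {
    run(a: A, b: B): R {
      if (last !== null && last.a === a && last.b === b) {
        return last.result; // both arguments unchanged: skip rendering
      }
      const result = render(a, b);
      last = { a, b, result };
      renders++;
      return result;
    },
    renderCount: () => renders,
  };
}

type Tab = "active" | "pending" | "expired";

// Hypothetical model: silence IDs stand in for the real records.
interface Model {
  filterText: string;
  tab: Tab;
  silences: string[];
}

const renderSilences = lazy2(
  (tab: Tab, silences: string[]) => `${tab}: ${silences.length} silences`,
);

// Each keystroke produces a new model, but the spread copies the
// *reference* to the existing silences array, so it stays ===.
function updateFilter(model: Model, text: string): Model {
  return { ...model, filterText: text };
}

let model: Model = { filterText: "", tab: "active", silences: ["a", "b", "c"] };
renderSilences.run(model.tab, model.silences);

for (const ch of "alertname".split("")) {
  model = updateFilter(model, model.filterText + ch);
  renderSilences.run(model.tab, model.silences); // cache hit every time
}

console.log(renderSilences.renderCount()); // 1
```

If the thunk really is skipped like this and typing is still slow, the remaining cost has to come from elsewhere in the page.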

That does make sense, I didn't realize I had to handle the entire case statement with lazy2.

Unfortunately, the result is the same. I agree that we should have pagination (after we start sending silences from the API in a stable order), but I still think lazy rendering should be working here. Maybe you'll have a minute this week to see what I'm doing wrong.

diff --git a/ui/app/src/Views/SilenceList/Views.elm b/ui/app/src/Views/SilenceList/Views.elm
index 962e96a..e8a389d 100644
--- a/ui/app/src/Views/SilenceList/Views.elm
+++ b/ui/app/src/Views/SilenceList/Views.elm
@@ -2,6 +2,7 @@ module Views.SilenceList.Views exposing (..)

 import Html exposing (..)
 import Html.Attributes exposing (..)
+import Html.Lazy exposing (..)
 import Views.SilenceList.Types exposing (SilenceListMsg(..), Model)
 import Views.SilenceList.SilenceView
 import Silences.Types exposing (Silence, State(..), stateToString)
@@ -22,16 +23,21 @@ view { filterBar, tab, silences } =
             ]
         , ul [ class "nav nav-tabs mb-4" ]
             (List.map (tabView tab) (groupSilencesByState (withDefault [] silences)))
-        , case silences of
-            Success sils ->
-                silencesView (filterSilencesByState tab sils)
+        , lazy2 renderSilences tab silences
+        ]

-            Failure msg ->
-                error msg

-            _ ->
-                loading
-        ]
+renderSilences : State -> ApiData (List Silence) -> Html Msg
+renderSilences tab silences =
+    case silences of
+        Success sils ->
+            silencesView (filterSilencesByState tab sils)
+
+        Failure msg ->
+            error msg
+
+        _ ->
+            loading


 tabView : State -> ( State, List a ) -> Html Msg

Using lazy2 did create a thunk that isn't evaluated, so something else must be causing the issue. It looks like something is re-evaluating the layout. I will have a look later this week.

IMHO, limiting the number of silences displayed when there is no filter (and then adding pagination) would probably be a better long-term solution.

Agreed. Right now we don't have a stable order when returning silences, so pagination won't be useful until that is fixed. We also need to figure out how to handle pagination within the three separate tabs, since they are sorted into these buckets after being received from the API.
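Here is one possible shape for per-tab pagination, sketched in TypeScript. It assumes the API already returns silences in a stable order, which is the prerequisite this comment points out. The idea is to bucket the response by state once, preserving that order, and keep an independent page number for each tab. The State type and the groupByState, pageOf and pageByTab names are hypothetical, not existing Alertmanager code.

```typescript
// Hypothetical types mirroring the three silence tabs.
type State = "active" | "pending" | "expired";

interface Silence {
  id: string;
  state: State;
}

// Split the API response into the three tab buckets in one pass.
// push() appends, so each bucket keeps the API's order: if the API
// order is stable, every bucket's order is stable too.
function groupByState(silences: Silence[]): Record<State, Silence[]> {
  const buckets: Record<State, Silence[]> = { active: [], pending: [], expired: [] };
  for (const s of silences) {
    buckets[s.state].push(s);
  }
  return buckets;
}

// One 0-based page of a bucket, plus the total page count for a pager.
// Out-of-range page numbers are clamped to the nearest valid page.
function pageOf<T>(items: T[], page: number, pageSize: number): { items: T[]; pages: number } {
  const pages = Math.max(1, Math.ceil(items.length / pageSize));
  const p = Math.min(Math.max(page, 0), pages - 1);
  return { items: items.slice(p * pageSize, (p + 1) * pageSize), pages };
}

// Each tab keeps its own current page.
const pageByTab: Record<State, number> = { active: 0, pending: 0, expired: 0 };

// Usage: 250 silences, every fifth one expired.
const sample: Silence[] = [];
for (let i = 0; i < 250; i++) {
  sample.push({ id: `s${i}`, state: i % 5 === 0 ? "expired" : "active" });
}
const buckets = groupByState(sample);
pageByTab.active = 2;
const activePage = pageOf(buckets.active, pageByTab.active, 75);
console.log(activePage.pages, activePage.items.length); // 3 50
```

Because paging happens after bucketing, each tab's page count reflects only that tab's silences. The trade-off is that the client still downloads every silence. Server-side paging per state would avoid that, but it would need API support.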

Could a first implementation just sort by date and display only 100 results by default (with a message saying that there are more, and possibly a link to display all)?
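A minimal TypeScript sketch of that first implementation follows. The comment only says "date", so sorting by startsAt is an assumption, newest first. The silence ID breaks ties so equal timestamps still give a deterministic order. The list is cut at 100 unless the user chose to display everything, and the function reports how many silences were hidden so the view can print the "there are more" message and a "show all" link. The Silence shape, DEFAULT_LIMIT, byStartDesc and visibleSilences are illustrative names.

```typescript
// Assumed shape: only the fields this sketch needs.
interface Silence {
  id: string;
  startsAt: string; // ISO 8601 timestamp
}

// Hypothetical constant: the 100 results suggested in the comment.
const DEFAULT_LIMIT = 100;

// Newest first; equal start times fall back to the ID so the order is
// deterministic across reloads.
function byStartDesc(a: Silence, b: Silence): number {
  const diff = Date.parse(b.startsAt) - Date.parse(a.startsAt);
  if (diff !== 0) {
    return diff;
  }
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// The silences to render, plus how many were left out (for a
// "N more silences, show all" message).
function visibleSilences(
  silences: Silence[],
  showAll: boolean,
  limit: number = DEFAULT_LIMIT,
): { shown: Silence[]; hidden: number } {
  // slice() copies first, because sort() mutates the array in place.
  const sorted = silences.slice().sort(byStartDesc);
  const shown = showAll ? sorted : sorted.slice(0, limit);
  return { shown, hidden: sorted.length - shown.length };
}

// Usage: 2000 silences starting one minute apart.
const all: Silence[] = [];
const base = Date.parse("2017-09-08T00:00:00Z");
for (let i = 0; i < 2000; i++) {
  all.push({ id: `s${i}`, startsAt: new Date(base + i * 60000).toISOString() });
}
const { shown, hidden } = visibleSilences(all, false);
console.log(shown.length, hidden, shown[0].id); // 100 1900 s1999
```

Capping the rendered list bounds the work per keystroke regardless of how many silences exist. The ID tiebreak also provides the stable ordering that later pagination would rely on.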

@stuartnelson3 I'm on vacation now, so I have plenty of time; I will have a look soon.
