Kibana: Total hits when using expressions

Created on 5 Jul 2019 · 13 comments · Source: elastic/kibana

Previously, when you had access to the raw Elasticsearch response, you could extract the hits.total value from it. That told you how many documents matched the query, including the ones not returned in the result. This number cannot be calculated from any other part of the response.

When using esaggs or esdocs, you just get a table of documents, which doesn't include that information (and it cannot be calculated from other information in the table).
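For illustration, here is a minimal TypeScript sketch (not Kibana code) of reading the total from a raw response. It assumes only the documented response shapes: hits.total is a plain number in Elasticsearch 6.x and an object with value and relation from 7.0 on, where relation "gte" means the count is only a lower bound. The interface and function names are hypothetical.

```typescript
// The part of an Elasticsearch search response this sketch cares about.
// 6.x: hits.total is a number. 7.0+: hits.total is { value, relation }.
interface SearchResponse {
  hits: {
    total: number | { value: number; relation: "eq" | "gte" };
    hits: unknown[];
  };
}

interface TotalHits {
  value: number;
  exact: boolean; // false when Elasticsearch only reports a lower bound ("gte")
}

// Normalizes both response formats into one shape.
function getTotalHits(response: SearchResponse): TotalHits {
  const total = response.hits.total;
  if (typeof total === "number") {
    return { value: total, exact: true };
  }
  return { value: total.value, exact: total.relation === "eq" };
}

// Example: 10 documents returned, but 1234 matched the query.
const response: SearchResponse = {
  hits: { total: { value: 1234, relation: "eq" }, hits: new Array(10).fill({}) },
};
const total = getTotalHits(response);
console.log(total.value, response.hits.hits.length); // 1234 10
```

A table built only from the 10 returned documents has no way to reproduce the 1234.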

There are scenarios where a consumer might want that information. This issue was pointed out by Fabien, who maintains the Enhanced Table plugin for Kibana and used that information. I can also think of other potential use cases inside Kibana: in Discover, we discussed using esdocs to load documents, but we also need to show hits.total at the top.

I see a couple of options for how this could work with (or without) expressions:

Option A: We don't include that information in the datatable type in the expression, so a consumer who wants it needs to make a second request to fetch the count (e.g. via another escount expression or some other way). This has two drawbacks: a) it is less performant, since we need to make a completely separate request that makes Elasticsearch redo a lot of the same calculations (this could really hurt large clusters), and b) it adds complexity when building an application that needs that information.

Option B: Attach that information to a row or column of the (kibana_)datatable. This feels just wrong to me, since that information is "meta information" about the table: it isn't related to individual rows (so it shouldn't be a column), nor is it just "another item/document" in your output (so it shouldn't be a row). If we added it as one anyway, despite the odd semantics, a consumer would also need to somehow find that row/column again (how? via some magic IDs?) to read the value, and to make sure it doesn't, e.g., accidentally show up in Discover as a document.
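A hypothetical sketch of Option B that makes the "magic ID" problem concrete. The Datatable shape below is a simplified stand-in loosely modeled on the expression datatable (columns plus rows keyed by column ID), and the __totalHits column ID is an invented convention.

```typescript
// Simplified, hypothetical datatable: columns plus rows keyed by column id.
interface Column { id: string; name: string }
type Row = Record<string, unknown>;
interface Datatable { type: "datatable"; columns: Column[]; rows: Row[] }

// Invented "magic" column id that every producer and consumer must agree on.
const TOTAL_HITS_COLUMN = "__totalHits";

// Producer side: the table-level value gets copied onto every row.
function withTotalColumn(table: Datatable, totalHits: number): Datatable {
  return {
    ...table,
    columns: [...table.columns, { id: TOTAL_HITS_COLUMN, name: "Total hits" }],
    rows: table.rows.map((row) => ({ ...row, [TOTAL_HITS_COLUMN]: totalHits })),
  };
}

// Consumer side: find the magic column, read the value, then strip it so it
// doesn't render as if it were a real field. With zero rows the value is lost.
function readAndStripTotal(table: Datatable): { total: number | null; table: Datatable } {
  const first = table.rows[0];
  const total =
    first !== undefined && TOTAL_HITS_COLUMN in first ? (first[TOTAL_HITS_COLUMN] as number) : null;
  const rows = table.rows.map((row) => {
    const copy = { ...row };
    delete copy[TOTAL_HITS_COLUMN];
    return copy;
  });
  const columns = table.columns.filter((column) => column.id !== TOTAL_HITS_COLUMN);
  return { total, table: { ...table, columns, rows } };
}

const base: Datatable = {
  type: "datatable",
  columns: [{ id: "host", name: "host" }],
  rows: [{ host: "a" }, { host: "b" }],
};
const { total, table } = readAndStripTotal(withTotalColumn(base, 1234));
console.log(total, table.columns.length); // 1234 1
```

Every consumer has to repeat the stripping step, and a table with no rows has nowhere to carry the value at all.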

Option C: Give the datatable some "metadata" (as we already discussed for columns, for formatters or field information) that can be used to store the hits.total value. The advantage is that it would not break anything existing while still giving access to the value. The drawback is that we would be adding yet another piece of arbitrary metadata to a core datatype in expressions, which could make handling that datatype more complex.
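A hypothetical sketch of Option C as first described: an open-ended metadata bag on the table. The field name meta and the helper functions are assumptions for illustration, not an existing Kibana API.

```typescript
interface Column { id: string; name: string }
type Row = Record<string, unknown>;

// Option C as originally proposed: an open-ended metadata bag on the table.
interface Datatable {
  type: "datatable";
  columns: Column[];
  rows: Row[];
  meta?: Record<string, unknown>; // hypothetical field name
}

// A data function can attach the total without touching rows or columns,
// so existing consumers that ignore `meta` keep working unchanged.
function attachTotal(table: Datatable, totalHits: number): Datatable {
  return { ...table, meta: { ...table.meta, totalHits } };
}

// The drawback: a downstream function (here, a filter) silently carries along
// keys it may know nothing about. Whether totalHits is still meaningful after
// filtering is a judgment this function cannot make for unknown keys.
function filterRows(table: Datatable, keep: (row: Row) => boolean): Datatable {
  return { ...table, rows: table.rows.filter(keep) };
}

const source: Datatable = {
  type: "datatable",
  columns: [{ id: "status", name: "status" }],
  rows: [{ status: 200 }, { status: 404 }, { status: 200 }],
};
const filtered = filterRows(attachTotal(source, 3), (row) => row.status === 200);
console.log(filtered.rows.length, filtered.meta?.totalHits); // 2 3
```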

Option D: Require anyone who needs access to hits.total not to use expressions, but instead to use Courier (or the new Data Access Layer) directly, which would expose that information. This has the advantage that everything can be done in one request, as today, without any performance overhead. The drawbacks are, of course, that we would suddenly require some visualizations (like Enhanced Table, and I think our new Data table will need the same information) to stop using expressions, which won't even work in Lens later on, because Lens is built entirely around rendering via expressions. We also couldn't use this approach in places like Discover.

To be honest, I currently don't like any of the options above, but I can't think of any other way to make it work right now, so I am happy to hear alternative suggestions. If I had to pick one of the above, I would lean slightly toward Option C, though I don't really like it either, given its drawbacks.

cc @lukeelmers @ppisljar @fbaligand

Labels: ExpressionLanguage, AppServices, discuss


timroes


Pinging @elastic/kibana-app-arch

elasticmachine on 5 Jul 2019

Thanks @timroes for opening an issue on this!

fbaligand on 5 Jul 2019

Well, after carefully reading your four options, I vote for Option C.
IMHO, all the other options have strong drawbacks compared to Option C.

fbaligand on 6 Jul 2019


Just a note: this also applies to other things in the response, like _shards, took, etc.

lukasolson on 9 Jul 2019

For other things, I think being able to define a custom request handler that customizes the Courier response is enough.
The Courier request handler gives you both the datatable and the raw Elasticsearch response,
so you're able to do whatever you want.

That said, it currently doesn't seem possible to call the Courier request handler directly from a plugin's custom request handler.
So it would be great to make that possible.

fbaligand on 9 Jul 2019

As I said in the meeting today, I'm in favor of both A and C.

It is likely a foregone conclusion that the collection of metadata can only grow as we make metadata more easily available. We also run the risk of collecting and returning metadata in which the requestor has no interest.

I would propose strictly limiting metadata to a small collection of statistics (perhaps even calling the collection stats) that are returned with the datatable. I would then propose that we make any other metadata available through a subsequent API call, using a UUID provided for the request.

This allows us to return clearly useful and common request metadata on the initial call and temporarily store additional metadata for collection/analysis at any time. I could see this proving useful in a number of other scenarios, as well.

We'd have a number of pros:

- We'd have clear definitions of "top-tier" and "secondary" metadata.
- We'd prevent bloat of top-tier metadata and of requests from consumers.
- We wouldn't have to return metadata the requestor isn't asking for.
- We could use the temporarily stored metadata for other purposes beyond that initial call.

To be clear: I'm not proposing that the UUID request concept be complete before we start returning hits or even took with our datatables. I'm proposing that we have the plan, so we can start strictly prioritizing some metadata now and leave the rest for a later API.

I'm a fan of strict contracts and solid, future-looking planning. I think this is a strong compromise.
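A minimal, hypothetical sketch of the two-tier idea: a small, fixed set of statistics travels with the table, while everything else is stored temporarily under a request ID and only returned when asked for. The names stats, requestId and SecondaryMetadataStore, the TTL-based expiry, and the counter-based ID (standing in for a real UUID) are all assumptions, not an agreed design.

```typescript
// Top tier: a small, strictly defined set of statistics on the table itself.
interface Stats { totalCount: number | null }
interface Datatable { rows: Record<string, unknown>[]; stats: Stats; requestId: string }

// Hypothetical raw result from a data source.
interface RawResult { total: number; took: number; shards: unknown; rows: Record<string, unknown>[] }

// Second tier: everything else, kept temporarily and fetched on demand.
class SecondaryMetadataStore {
  private readonly entries = new Map<string, { value: Record<string, unknown>; expiresAt: number }>();

  constructor(private readonly ttlMs: number) {}

  put(requestId: string, value: Record<string, unknown>, now = Date.now()): void {
    this.entries.set(requestId, { value, expiresAt: now + this.ttlMs });
  }

  // Returns undefined for unknown or expired entries; expired ones are dropped.
  get(requestId: string, now = Date.now()): Record<string, unknown> | undefined {
    const entry = this.entries.get(requestId);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(requestId);
      return undefined;
    }
    return entry.value;
  }
}

// Stand-in for a real UUID generator (hypothetical).
let nextId = 0;
const newRequestId = (): string => `req-${++nextId}`;

function buildTable(raw: RawResult, store: SecondaryMetadataStore): Datatable {
  const requestId = newRequestId();
  store.put(requestId, { took: raw.took, shards: raw.shards });
  return { rows: raw.rows, stats: { totalCount: raw.total }, requestId };
}

const store = new SecondaryMetadataStore(60000);
const table = buildTable({ total: 1234, took: 17, shards: { total: 5 }, rows: [] }, store);
console.log(table.stats.totalCount, store.get(table.requestId)?.took); // 1234 17
```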

clintandrewhall on 11 Jul 2019

We synced about that offline and came up with the following plan:

- There may be legitimate reasons for Option A, where you want to make a separate request to retrieve different data, but we should not push it for this use case, where the data is already available from the first request and the alternative is a potentially expensive second request.
- As a first step, we will go with a slightly modified Option C. Instead of adding a generic metadata field, which we fear could easily become cluttered with arbitrary values that not every expression function working on the datatable would know about or handle properly, we will create a dedicated field in the datatable for this use case. @clintandrewhall suggested naming it statistics, so the datatable would have a statistics.totalCount property that data functions should fill. We think most data-providing functions will be able to fill it in; if not, we can leave it as null.
- Every function that works on a datatable can then, at its own discretion, keep that value, modify it, or throw it away, whichever makes the most sense for it.
- We discussed an approach where, if we need more metadata in the future, we could assign a UUID to the datatable and offer a separate API for fetching additional metadata for that UUID. This would let us optimize whether an additional request is needed behind that API. Since we don't currently have a specific use case for that API, we will not build it yet.

We will build that statistics.totalCount into esaggs (first) within the upcoming month, most likely during our Discover refactoring.
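A minimal sketch of the agreed shape, assuming a simplified datatable. Only statistics.totalCount and the null-when-unknown rule come from the plan above; the table shape and the functions fromSearchHits, head and filterRows are hypothetical, chosen to show one function that keeps the value and one that throws it away.

```typescript
interface Column { id: string; name: string }
type Row = Record<string, unknown>;

// Dedicated statistics field instead of a generic metadata bag.
interface Datatable {
  type: "datatable";
  columns: Column[];
  rows: Row[];
  statistics: { totalCount: number | null }; // null when the data source can't tell
}

// A data-providing function fills totalCount from the search response.
function fromSearchHits(docs: Row[], totalHits: number | null): Datatable {
  const ids: string[] = [];
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      if (ids.indexOf(key) === -1) ids.push(key);
    }
  }
  const columns = ids.map((id) => ({ id, name: id }));
  return { type: "datatable", columns, rows: docs, statistics: { totalCount: totalHits } };
}

// Keeps the value: showing fewer rows doesn't change how many documents matched.
function head(table: Datatable, n: number): Datatable {
  return { ...table, rows: table.rows.slice(0, n) };
}

// Throws the value away: after a client-side filter the original count no
// longer describes the table, and the function cannot recompute it.
function filterRows(table: Datatable, keep: (row: Row) => boolean): Datatable {
  return { ...table, rows: table.rows.filter(keep), statistics: { totalCount: null } };
}

const table = fromSearchHits([{ host: "a" }, { host: "b" }, { host: "c" }], 1234);
console.log(head(table, 2).statistics.totalCount); // 1234
console.log(filterRows(table, (row) => row.host !== "b").statistics.totalCount); // null
```

Because the field has a fixed, documented meaning, each function can make this keep-or-drop decision deliberately, which is harder with an open-ended metadata bag.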

timroes on 11 Jul 2019


@timroes
That sounds great!
Maybe it's too early to ask, but which Kibana version are you targeting for this enhancement? 7.3? 7.4?

fbaligand on 11 Jul 2019

@fbaligand Unfortunately, we don't yet have a proper estimate of when this will land, but I can guarantee it won't be 7.3 :D (that is already frozen for us). So I'm afraid you'll need to maintain whatever workaround you're currently using for a couple of versions.

timroes on 11 Jul 2019

Okay :)

fbaligand on 11 Jul 2019

@alexh97
Well, what is the reason this issue was closed?

fbaligand on 15 Aug 2019

This was closed by accident due to a bad integration with Zube. Apologies.