Earlier, when you had access to the raw Elasticsearch response, you could read the hits.total value from it and see how many documents matched the query, including ones that are not returned in the result. That number cannot be calculated from any other part of the response.
When using esaggs or esdocs you just get a table of documents, which doesn't have that information available (and it cannot be calculated from other information in the table).
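To make the gap concrete, here is a minimal sketch of reading that value from a raw search response. The RawSearchResponse type and the getTotalHits helper are made up for illustration and are not Kibana APIs. The sketch handles both shapes of hits.total: a plain number (Elasticsearch 6.x) and an object with value and relation (7.x).

```typescript
// Only the part of a raw Elasticsearch search response that matters here.
// In 6.x `hits.total` is a number; in 7.x it is an object.
interface RawSearchResponse {
  hits: {
    total: number | { value: number; relation: 'eq' | 'gte' };
    hits: unknown[];
  };
}

// Hypothetical helper: number of documents that matched the query.
function getTotalHits(response: RawSearchResponse): number {
  const total = response.hits.total;
  return typeof total === 'number' ? total : total.value;
}

const response: RawSearchResponse = {
  hits: { total: { value: 1234, relation: 'eq' }, hits: [{}, {}] },
};

const totalHits = getTotalHits(response); // documents that matched
const returnedHits = response.hits.hits.length; // documents actually returned
```

A table built only from response.hits.hits keeps returnedHits but has no way to recover totalHits.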
There are scenarios where a consumer might want to have that information. This issue was pointed out by Fabien, who maintains the Enhanced Table plugin for Kibana and uses that information. I can also think of other potential use cases inside Kibana: in Discover we discussed using esdocs to load documents, but we also need to show the hits.total on top.
I see a couple of options for how this could work with (or without) expressions:
Option A: We don't put that information into the datatable type in the expression, so a consumer who wants it needs to do a second request to fetch the count (e.g. via another escount expression or in any other way). This has the drawbacks of a) being less performant, since we need to do a completely separate request that repeats a lot of the same calculations in ES (this could really hurt large clusters) and b) adding complexity when building an application that needs that information.
Option B: Attach that information to a row or column of the (kibana_)datatable. This feels just wrong to me, since that information is "meta information" about the table: it's not related to individual rows (so it shouldn't be a column), nor is it just "another item/document" in your output (so it shouldn't be a row). If we added it as one anyway, despite the weird semantics, a consumer would also need to somehow find that row/column again (how? are there some magic ids?) to get that information, and to make sure it doesn't, for example, accidentally show up in Discover as a document.
Option C: Give the datatable some "metadata" (as we already discussed for columns, for formatters or field information) which can be used to store that hits.total value. The advantage here is that it would not break anything existing, but would still give access. The drawback is that we're adding yet another arbitrary piece of meta information to a core datatype in the expressions, thus making handling of that datatype potentially more complex.
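A rough sketch of what Option C could look like. The Datatable type below is a simplified stand-in rather than the real definition, and the meta.totalHits field is hypothetical. It is only meant to show that existing consumers keep working while interested consumers can read the value.

```typescript
// Simplified stand-in for the (kibana_)datatable type; not the real definition.
interface DatatableColumn {
  id: string;
  name: string;
}

interface Datatable {
  type: 'kibana_datatable';
  columns: DatatableColumn[];
  rows: Array<Record<string, unknown>>;
  // Hypothetical optional metadata slot (Option C).
  meta?: { totalHits?: number };
}

// An existing consumer that only looks at rows keeps working unchanged.
function countReturnedRows(table: Datatable): number {
  return table.rows.length;
}

// A consumer that wants the total reads it from the metadata and gets
// undefined when the producing function did not provide it.
function readTotalHits(table: Datatable): number | undefined {
  return table.meta ? table.meta.totalHits : undefined;
}

const withMeta: Datatable = {
  type: 'kibana_datatable',
  columns: [{ id: 'col-0', name: 'host' }],
  rows: [{ 'col-0': 'a' }, { 'col-0': 'b' }],
  meta: { totalHits: 500 },
};

const withoutMeta: Datatable = {
  type: 'kibana_datatable',
  columns: [{ id: 'col-0', name: 'host' }],
  rows: [{ 'col-0': 'a' }],
};
```

Because the field is optional, nothing that produces or consumes tables today would have to change, which is the advantage mentioned above. The cost is that every function passing a table along has to decide what to do with meta.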
Option D: Require anyone who needs access to hits.total not to use expressions, but instead use Courier (or the new Data Access Layer) directly, which would expose that information. This has the advantage that it can all be done in one request, as today, without any performance overhead. The drawback is of course that we would suddenly require some visualizations (like Enhanced Table, and I think our new Data table will require the same information) to not use expressions anymore, which won't even work in Lens later on, because it's completely built around rendering via expressions. Also, we then couldn't use expressions in some places like Discover.
To be honest, I currently don't like any of the options above, but I can't imagine any other way to get it working right now, so I am happy to hear any alternative suggestions. If I had to pick one of the above, I lean slightly toward Option C, though I don't really like it either, given its drawbacks.
cc @lukeelmers @ppisljar @fbaligand
Pinging @elastic/kibana-app-arch
Thanks @timroes for opening an issue on that!
Well, after carefully reading your 4 options, I vote for option C.
IMHO, all other options have strong drawbacks, compared to option C.
Just a note, this also applies to other things in the response, like _shards, took, etc.
For other things, I think that being able to define a custom request handler that customizes the «courier» response is enough.
From the «courier» request handler, you get both the datatable and the raw Elasticsearch response.
So you're able to do whatever you want.
That said, it currently doesn't seem possible to call the «courier» request handler directly from a plugin's custom request handler.
So it would be great to make it possible.
As I said in the meeting today, I'm in favor of both A and C.
It is likely a foregone conclusion that the collection of metadata can only grow as we make metadata more easily available. We also run the risk of collecting and returning metadata in which the requestor has no interest.
I would propose strictly limiting metadata to a small collection of statistics, (perhaps even calling the collection stats) that are returned with the datatable. I would then propose we make any other metadata available by a subsequent API call using a provided uuid of the request.
This allows us to return clearly useful and common request metadata on the initial call and temporarily store additional metadata for collection/analysis at any time. I could see this proving useful in a number of other scenarios, as well.
We'd have a number of pros:
To be clear: I'm not proposing the uuid request concept be complete before we start returning hits or even took with our datatables. I'm proposing we have the plan so we can start strictly prioritizing some metadata for now and others for a later API.
I'm a fan of strict contracts and solid, future-looking planning. I think this is a strong compromise.
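To illustrate the shape of that proposal, here is a minimal sketch. Every name in it (RequestStats, buildResult, fetchRequestMetadata, the in-memory Map) is hypothetical, the raw response type is reduced to a few fields, and the counter-based id is only a stand-in for a real UUID.

```typescript
// Reduced view of a raw search response, with a numeric hits.total for brevity.
interface RawResponse {
  took: number;
  timed_out: boolean;
  _shards: { total: number; successful: number; failed: number };
  hits: { total: number; hits: Array<{ _source: Record<string, unknown> }> };
}

// The small, strictly limited collection returned with every table.
interface RequestStats {
  totalCount: number | null;
  took: number | null;
}

interface TableResult {
  rows: Array<Record<string, unknown>>;
  stats: RequestStats;
  requestId: string;
}

// Everything else is stored temporarily and looked up by request id.
const extraMetadata = new Map<string, Record<string, unknown>>();
let requestCounter = 0;

function buildResult(raw: RawResponse): TableResult {
  requestCounter += 1;
  const requestId = `request-${requestCounter}`; // stand-in for a real UUID
  extraMetadata.set(requestId, { _shards: raw._shards, timed_out: raw.timed_out });
  return {
    rows: raw.hits.hits.map((hit) => hit._source),
    stats: { totalCount: raw.hits.total, took: raw.took },
    requestId,
  };
}

// The "subsequent API call": returns undefined for unknown ids.
function fetchRequestMetadata(requestId: string): Record<string, unknown> | undefined {
  return extraMetadata.get(requestId);
}

const result = buildResult({
  took: 12,
  timed_out: false,
  _shards: { total: 5, successful: 5, failed: 0 },
  hits: { total: 42, hits: [{ _source: { host: 'a' } }] },
});
const extra = fetchRequestMetadata(result.requestId);
```

The table itself only carries stats and the id, so the set of fields every datatable function has to understand stays fixed even if the stored metadata grows.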
We synced about that offline and came up with the following plan:
- Instead of a generic metadata field, which we fear might very easily get cluttered with arbitrary values in the future that not every expression function working on the datatable would know about or treat properly, we create a dedicated field in the datatable for this use case. @clintandrewhall suggested naming it statistics, so we would have in the datatable a statistics.totalCount property which data functions should fill. We think that most data-providing functions will be able to fill this in; if not, we can leave it as null.
- Any function that receives a datatable can now, at its own discretion, keep that value, modify it, or throw it away, whatever makes the most sense for it.
- If we need additional metadata in the future, we can attach a UUID to the datatable and offer a separate API that will allow you to fetch additional metadata for that UUID. That will allow us to optimize whether or not we need to do an additional request behind that API. Since we currently don't have a specific use case for that API, we will not build it yet.

We will build that statistics.totalCount into esaggs (first) within the upcoming month, most likely during our Discover refactoring.
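As a sketch of how that plan might look in code: the Datatable type below is simplified, and the three functions (fromSearchResult, head, filterRows) are hypothetical examples, not existing expression functions. Which functions keep or drop the value is an assumption made here for illustration.

```typescript
// Simplified datatable with the dedicated statistics field from the plan.
interface Datatable {
  columns: Array<{ id: string; name: string }>;
  rows: Array<Record<string, unknown>>;
  statistics: { totalCount: number | null };
}

// A data-providing function fills totalCount, or leaves it null if unknown.
function fromSearchResult(
  rows: Array<Record<string, unknown>>,
  totalCount: number | null
): Datatable {
  const ids = rows.length > 0 ? Object.keys(rows[0]) : [];
  return {
    columns: ids.map((id) => ({ id, name: id })),
    rows,
    statistics: { totalCount },
  };
}

// "Keep": truncating the table does not change how many documents matched.
function head(table: Datatable, count: number): Datatable {
  return { ...table, rows: table.rows.slice(0, count) };
}

// "Throw away": after client-side filtering the original total no longer
// describes the table, and the new total is unknown, so it is reset to null.
function filterRows(
  table: Datatable,
  predicate: (row: Record<string, unknown>) => boolean
): Datatable {
  return {
    ...table,
    rows: table.rows.filter(predicate),
    statistics: { totalCount: null },
  };
}

const source = fromSearchResult([{ host: 'a' }, { host: 'b' }, { host: 'c' }], 250);
const firstTwo = head(source, 2);
const onlyA = filterRows(source, (row) => row.host === 'a');
```

Consumers such as a table visualization would then read statistics.totalCount and treat null as "not available" instead of issuing a second count request.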
@timroes
That sounds great!
Maybe it is too early to ask it, but which Kibana version do you plan for this enhancement? 7.3? 7.4?
@fbaligand Unfortunately we don't yet have a proper estimate of when this will land, but I can guarantee it's not 7.3 :D (since that is already frozen for us). So I fear you'll need to maintain whatever workaround you're currently using for a couple of versions.
Okay :)
@alexh97
Well, what is the explanation behind «issue closed»?
This was closed by accident due to a bad integration with Zube. Apologies.
No problem!
Thanks for the explanation and the reopen :)