Kibana version:
5.1
Elasticsearch version:
5.1
Server OS version:
Ubuntu 16.04
Browser version:
Chrome Version 56.0.2924.76 (64-bit)
Browser OS version:
Ubuntu 16.10
Original install method (e.g. download page, yum, from source, etc.):
Deb
Description of the problem including expected versus actual behavior:
Periodically the node stats in Kibana all drop to zero, which is plainly false. It's still possible to click into the individual nodes themselves and see updates. Restarting Kibana fixes this.
Steps to reproduce:
Errors in browser console (if relevant):
None
Provide logs and/or server output (if relevant):
Nothing obvious

Before moving this to xpack, @tsullivan @pickypg have either of you seen this before?
Does it drop to zero and not return again? Do the chart pages show inverted spikes?
If so, it could be that the monitoring agents installed on the nodes are skipping bulk uploads of their monitoring data due to lots of other bulk activity.
Is there a separate monitoring cluster in this architecture, or is the cluster doing self-monitoring?
++ to @tsullivan
The issue here looks like it _may_ be the result of the Elasticsearch agent skipping some activity _or_ it being delayed, so the timing is wrong (note that it shows proper max/min for the range, but the _current_ values are all 0).
My _guess_ is that at least one document is missing from the latest bucket, which is skewing these calculations, as @tsullivan is describing. _If_ that's true, 5.3 (unreleased) should make this better due to changes to how the data is scheduled and sent. Alternatively, you can probably change this setting in config/elasticsearch.yml:
# defaults to 10s
xpack.monitoring.collection.interval: 9s
This will force it to publish more frequently, which tends to counteract the drift.
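To illustrate why a slightly shorter collection interval helps, here is a minimal Python sketch. It assumes a perfectly periodic collector (a simplification; the real agent's timing and skipped bulk uploads are not modeled) and fixed 10s buckets. The helper names `samples` and `empty_buckets` and the 10.5s "drifted" interval are hypothetical. If consecutive samples are at most 9s apart, every 10s bucket must contain at least one of them; at 10.5s, roughly one bucket in 21 ends up empty.

```python
from typing import List


def samples(interval: float, until: float) -> List[float]:
    """Hypothetical collector that fires every `interval` seconds from t=0 until `until`."""
    times, t = [], 0.0
    while t < until:
        times.append(t)
        t += interval
    return times


def empty_buckets(sample_times: List[float], start: float, end: float, bucket: float) -> List[int]:
    """Indices of fixed-width buckets in [start, end) that received no sample."""
    n = int((end - start) // bucket)
    counts = [0] * n
    for t in sample_times:
        if start <= t < start + n * bucket:
            counts[int((t - start) // bucket)] += 1
    return [i for i, c in enumerate(counts) if c == 0]


# 10s buckets over a 10-minute window, comparing a 9s interval with a drifted 10.5s one
for interval in (9.0, 10.5):
    gaps = empty_buckets(samples(interval, 600), 0, 600, 10)
    print(f"interval={interval}s -> {len(gaps)} empty 10s buckets")
```

Over these 10 minutes, the 9s collector leaves no empty bucket, while the 10.5s collector leaves two (buckets 20 and 41). If one of those empty buckets happens to be the most recent one, the "current" value has nothing to show.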
@tsullivan
a) Yeah, it drops to zero and doesn't return. Chart pages seem fine; example attached.

b) This is a separate machine but in the same cluster, i.e., we have 3 dedicated master nodes, 4 data-only nodes, and 1 monitoring node.
@pickypg OK cool, we'll try that setting and I'll report back.
What happens if you look at the "last 1 hour" and "last 15 minutes" views? (The time picker in the top right is set to "Today".)
As a side note, the charts work by aggregating the data into time buckets whose size depends on the selected time range. If the bucket size is around the minimum (e.g., 10s), but the data does not consistently come in at that rate (e.g., let's say it actually comes in every 11s), then eventually you will run into gaps in those buckets. When you observe a very large time frame, the bucket sizes increase (e.g., 1m) and such holes disappear as a result.
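A rough Python sketch of that side note. It assumes a hypothetical auto-interval rule (about 100 buckets per range, never narrower than 10s, rounded up to a multiple of 10s); Kibana's actual interval selection may differ. `MIN_BUCKET_S`, `TARGET_BUCKETS`, `bucket_size`, and `count_empty` are illustrative names only. Samples arrive every 11s, as in the example above.

```python
import math

MIN_BUCKET_S = 10      # assumed minimum bucket width, matching the 10s collection default
TARGET_BUCKETS = 100   # hypothetical number of buckets a chart aims to draw


def bucket_size(range_s: int) -> int:
    """Hypothetical auto-interval: about range/TARGET_BUCKETS, at least MIN_BUCKET_S,
    rounded up to a whole multiple of MIN_BUCKET_S."""
    raw = range_s / TARGET_BUCKETS
    return max(MIN_BUCKET_S, math.ceil(raw / MIN_BUCKET_S) * MIN_BUCKET_S)


def count_empty(range_s: int, interval_s: float) -> int:
    """Count buckets with no sample when one sample arrives every interval_s over the range."""
    size = bucket_size(range_s)
    n = range_s // size
    filled = set()
    t = 0.0
    while t < n * size:
        filled.add(int(t // size))
        t += interval_s
    return n - len(filled)


for label, range_s in (("last 15 minutes", 15 * 60), ("last hour", 3600), ("last 24 hours", 86400)):
    print(label, bucket_size(range_s), count_empty(range_s, 11.0))
```

Under these assumptions, the 15-minute view uses 10s buckets and shows 8 holes, while the hour view (40s buckets) and the day view (870s buckets) show none, because each bucket is now wide enough to always catch an 11s sample.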
++ @pickypg. I didn't notice it in your first screenshot, but it seems clearer now: the issue is very likely that you have "Today" selected as your time range. The stats on the node listing page take the last bucket in the time frame for their metrics, and all your last buckets are null because the end of the day hasn't happened yet.
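A minimal Python sketch of that "last bucket" behavior. It assumes 10-minute buckets, a fixed stand-in metric value (42.0), and that a missing last bucket is rendered as 0. `bucketed_values` and `current_value` are hypothetical names, not Monitoring UI code. "Today" runs from midnight to the next midnight, so its final bucket lies in the future and has no data. A "today so far" range ends at the current time.

```python
from datetime import datetime, timedelta
from typing import List, Optional

BUCKET = timedelta(minutes=10)  # hypothetical bucket width for a one-day range


def bucketed_values(start: datetime, end: datetime, now: datetime) -> List[Optional[float]]:
    """One value per bucket in [start, end); buckets that begin after `now` have no data (None)."""
    values: List[Optional[float]] = []
    t = start
    while t < end:
        values.append(42.0 if t <= now else None)  # 42.0 stands in for a real metric
        t += BUCKET
    return values


def current_value(values: List[Optional[float]]) -> float:
    """Node-listing style 'current' value: the last bucket, with missing data shown as 0."""
    last = values[-1]
    return 0.0 if last is None else last


now = datetime(2017, 2, 1, 14, 35)
midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)

today = bucketed_values(midnight, midnight + timedelta(days=1), now)
today_so_far = bucketed_values(midnight, now, now)
print(current_value(today), current_value(today_so_far))  # prints: 0.0 42.0
```

Both ranges contain the same real data. Only the position of the last bucket differs, which is why the min/max values look right while every "current" value drops to 0.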
Good catch - switching to "last hour" or "day so far" yields numbers. Not a bug. There may be a UX issue though.
Thinking back, the workflow was:
1) Keep this dashboard open in a tab, refreshing/checking it periodically
2) See some relatively high CPU activity on one of the nodes, and click into the detail page to see if there's anything worth investigating
3) Want to see more range on the node, so click "today" (which works in the node detail view)
4) Click back to the nodes dashboard and be surprised that everything has reset to 0 (not noticing that the time picker in the top right of the dashboard had been set to "today")
Thoughts?
Your workflow makes sense to me. I think this is a pretty nasty UX issue, and we actually held back shipping a feature because of it. We have already discussed problems with moving the time range into the "future". I had never witnessed this on the node list with "today" (as opposed to someone manually entering a future date), but it has the same impact.
We're looking at a fix, but while I have you, I wonder how you feel about it as an end user:
I think #1 is my preference, but that's what you mean by "today so far" (which I didn't notice until I looked hard for it). Can you detect from context that "today" doesn't make sense and just offer "today so far" as an option instead?
> Can you detect from context that "today" doesn't make sense and just offer "today so far" as an option instead?
Not yet. But I think we're a _relatively_ simple enhancement away from doing it. The major caveat then comes from usage:
- midnight to now.
- midnight to now, but they need to know that they are _displaying_ until midnight tomorrow.

The good news is that both want the same timeframe for calculating, so it may just be a matter of feeding the "end" time in two differently digestible ways.
Writing this down as a brain dump: it's also important to remember that some time picker options extend partially into the future, like "Today", while others are wholly in the future, like "Tomorrow" (if it existed) or an absolute date range set in the future.
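One way to express the "two differently digestible" end times, as a Python sketch under stated assumptions. The displayed range stays as the picker chose it, while stats are calculated over a range whose end is clamped to now. A wholly-future range has nothing to calculate, so the caller has to handle it separately. `TimeRange`, `classify`, and `calculation_range` are hypothetical helpers, not Kibana APIs.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class TimeRange(NamedTuple):
    start: datetime
    end: datetime


def classify(r: TimeRange, now: datetime) -> str:
    """Classify a picker range relative to `now`."""
    if r.start > now:
        return "wholly future"
    if r.end > now:
        return "partially future"
    return "past"


def calculation_range(r: TimeRange, now: datetime) -> Optional[TimeRange]:
    """Range used for computing stats: the end is clamped to `now`.
    The displayed range (r itself) is unchanged, e.g. still 'until midnight'."""
    if r.start > now:
        return None  # nothing has happened yet; the caller decides what to show
    return TimeRange(r.start, min(r.end, now))


now = datetime(2017, 2, 1, 14, 35)
midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
today = TimeRange(midnight, midnight + timedelta(days=1))
tomorrow = TimeRange(today.end, today.end + timedelta(days=1))

print(classify(today, now), calculation_range(today, now).end)  # partially future 2017-02-01 14:35:00
print(classify(tomorrow, now))                                   # wholly future
```

Keeping both ranges lets the UI label the view "until midnight" while the node listing reads its current values from the last bucket that can actually contain data.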
I also faced a similar issue, and time sync helped me solve it.
This is fixed in both the listings and charts in 5.6+ (unreleased).