Envoy: Simplify /stats output for empty histograms

Created on 25 Apr 2018  路  18Comments  路  Source: envoyproxy/envoy

Original author @ramaraochavali reviewers @mattklein123 @jmarantz @mrice32

https://github.com/envoyproxy/envoy/pull/3130 causes output like:

server.version: 11899839
server.watchdog_mega_miss: 0
server.watchdog_miss: 0
stats.overflow: 0
cluster.service1.upstream_cx_connect_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
cluster.service1.upstream_cx_length_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
cluster.service2.upstream_cx_connect_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
cluster.service2.upstream_cx_length_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
http.admin.downstream_cx_length_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
http.admin.downstream_rq_time: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
http.ingress_http.downstream_cx_length_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
http.ingress_http.downstream_rq_time: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
listener.0.0.0.0_80.downstream_cx_length_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)
listener.admin.downstream_cx_length_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)

The commit before that is fine https://github.com/envoyproxy/envoy/commit/fefb0d39ed54c6a43b652d7dd665c747537704ea

enhancement help wanted

Most helpful comment

@ramaraochavali IMO I would probably have the default of both non-JSON and JSON be to output all stats for consistency, and then document the usedonly query param which can be used for both? IMO it's better to be consistent here but I don't feel that strongly about it. cc @mrice32 @jmarantz

All 18 comments

This is by design. It's new histogram stats.

My apologies for the noise!

It's not broken per se, but @ramaraochavali I wonder if we should render empty histograms differently for aesthetic reasons. WDYT?

Do you have any ideas on how empty histograms should be displayed? Should we skip displaying them similar to how we skip sending to stats sink if they are not used?

That would work for me, or you could just put in a single string "no histogram data" in place of all the NaN entries.

If some of the entrires are NaN, should you omit those quantiles from the short display? I'm OK either way, but I actually don't understand, as a user, the distinction between NaN and 0 in this context. Can you explain?

NaN indicates the histogram was not updated at all . 0 means histogram was updated and calculated quantile value is actually 0. If a value inserted in to histogram, you will not get Nan for any of the quantiles.

SG. Then I recommend special-casing the all-entries-NaN case and just indicate the histogram is empty with a short string. Only issue is whether that might break scripts that want to to scrape the histogram data. So another option is omit those histograms completely, which would also be OK with me.

I'd recommend re-opening this issue but I don't have the bits to do that.

@mattklein123 with this change only histograms would be displayed in /stats. We also made similar change to json output for histograms. Do you think counters and gauges should also be printed (in admin as well json) only if they are used?

Do you think counters and gauges should also be printed (in admin as well json) only if they are used?

Personally, I find printing all stats in /stats to be a very useful debugging tool and I would prefer that they not be removed. I could see an argument for removing them from the JSON output which is generally more machine consumption. I could also see an argument for having a query param arg for excluding unused stats? Thoughts? cc @mrice32 @jmarantz

I agree: there is information in the existence of an empty histogram, so eliding the entry entirely obscures that information.

Can we just print "empty histogram" or similar, in our stats output, instead of a bunch of NaNs when we are outputting one of those?

also: a query-param to exclude unused stats would be fine with me.

@mattklein123 this bug is closed now; should a new one be opened, or this one re-opened?

My main point is around JSON - I think JSON should work similar to sinks which exclude used counters and gauges. So I think we should do the same for JSON. If every one agrees, I will make that change as a separate bug.
On the /stats we should just continue to print all of them. If you think that additional query param will simplify, I can make that change as well.

I think printing gauges and counters in the JSON would be useful, and +1 to the idea of a query param to eliminate unused stats. I also agree that "empty histogram" or similar would be better than all the NaNs (often seeing a bunch of NaNs in an output makes me think that something is broken :) ).

Reopening. @ramaraochavali can you track further changes as part of this issue? What everyone is proposing here SGTM. I mainly just want all stats to be output when I hit the non-JSON version of /stats. That's all I really care about. :)

Ok. I will make the changes. So here is the summary of changes

  • /stats interface would list all stats irrespective of whether they have been used or not. If a histogram is not used it will display "No recorded values"
  • JSON output will only output used counters and gauges.
  • /stats?usedonly=true will print only used stats.

@ramaraochavali IMO I would probably have the default of both non-JSON and JSON be to output all stats for consistency, and then document the usedonly query param which can be used for both? IMO it's better to be consistent here but I don't feel that strongly about it. cc @mrice32 @jmarantz

@mattklein123 ok. I will make change accordingly.

Was this page helpful?
0 / 5 - 0 ratings