TensorBoard: Design a public Python API for the hparams plugin

Created on 11 Mar 2019 · 15 Comments · Source: tensorflow/tensorboard

To use the hparams dashboard, users currently have to manually construct
hparams-specific protocol buffers and send them to file
writers (see the tutorial notebook for an example; it takes a few
dozen lines of Python code). The protobuf bindings are not particularly
idiomatic Python, and are less than pleasant to use. We should
investigate possible simplifications to this API.

For example, we could streamline the construction of the ListValues
for the discrete domains by allowing the user to pass Python lists, and
we can also infer the data types from the types of the elements of the
domain* (which also lets us require that the list is homogeneously
typed):

import os

def create_experiment_summary():
  api = hparams_summary  # proposed module (does not exist yet); aliased for brevity
  hparams = [
      api.hparam("num_units", domain=api.discrete([16, 32])),
      api.hparam("dropout_rate", domain=api.interval(min=0.1, max=0.2)),
      api.hparam("optimizer", domain=api.discrete(["adam", "sgd"])),
  ]
  metrics = [
      api.metric("accuracy"),
      api.metric("xent", display_name="cross-entropy"),
  ]
  experiment = api.experiment(hparams=hparams, metrics=metrics)
  experiment.write(logdir=os.path.join("logs", "hparam_tuning"))

This is just a sketch, but it’s already three times shorter than the
current demo without (imho) any loss of utility.

* It’s fine to prohibit empty domains here. If a hyperparameter has
empty domain, then the whole hyperparameter space is empty, so there can
be no runs; thus, allowing empty domains is not actually useful.
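
To make the type-inference idea concrete, here is a minimal sketch of what a `discrete` helper could do. The `Discrete` class and the `discrete` function are hypothetical illustrations, not part of any existing TensorBoard API; the sketch assumes only `bool`, `int`, `float`, and `str` values are allowed, and it rejects empty or mixed-type lists as described above.

```python
from dataclasses import dataclass
from typing import Sequence, Union

Value = Union[bool, int, float, str]


@dataclass(frozen=True)
class Discrete:
    """Hypothetical stand-in for the proposed api.discrete(...) domain."""
    values: tuple
    dtype: type


def discrete(values: Sequence[Value]) -> Discrete:
    """Build a discrete domain, inferring its data type from the elements."""
    values = tuple(values)
    if not values:
        # An empty domain makes the whole hyperparameter space empty.
        raise ValueError("discrete domain must not be empty")
    # Use exact types so that bool and int are not silently mixed.
    dtypes = {type(v) for v in values}
    if len(dtypes) != 1:
        names = ", ".join(sorted(t.__name__ for t in dtypes))
        raise TypeError("discrete domain must be homogeneously typed; got: " + names)
    (dtype,) = dtypes
    if dtype not in (bool, int, float, str):
        raise TypeError("unsupported element type: " + dtype.__name__)
    return Discrete(values=values, dtype=dtype)
```

With this shape, `discrete([16, 32])` would carry `int` as its inferred type and `discrete(["adam", "sgd"])` would carry `str`, without the caller ever naming a data type.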

hparams feature

All 15 comments

Thanks for this awesome plugin; it is a really useful addition to TensorBoard.

After using it in my project, I have some comments:

  1. Would it be possible to support int types? For numbers there is currently only api_pb2.DATA_TYPE_FLOAT64; I think it would make sense to add api_pb2.DATA_TYPE_INT64.
    It would just be a little better for the display.

  2. The idea above to have intervals (api.interval(min=0.1, max=0.2)) is a great one. I currently have to manually add every value used. It would also enable random search and not just grid search. The user interface would have to change a bit to let users select ranges when filtering on the left tab.

  3. Adding new hyperparameters is not fully supported. If I add a new parameter new and re-create the event file for the HParams tab, the previous runs that don't contain new will all disappear if I choose to display new.
    Ideally I would be able to add values to my old summaries, but I couldn't find a way to do that.
    For instance, if I add a batch_size parameter, I would need to add the default value 32 to all previous runs.
    In general it would be great to be able to manipulate old summaries easily.

Just some small comments, amazing work overall 🥇

Hi @omoindrot—thanks for writing in, and glad to hear that you like it!

> Would it be possible to support int types?

I was wondering the same thing. The parallel coordinates view and the
scatter plot matrix view each display integers without any decimal
points, but it’s true that the table view does show superfluous decimal
points. Patching the table view to be consistent with the other views
would certainly help; it might still be reasonable to also add explicit
support for int data.

> The idea above to have intervals […] is a great one

This should already be supported:

https://github.com/tensorflow/tensorboard/blob/25dc3e8ceb78ff8867e4adbea06a590089dacc9d/tensorboard/plugins/hparams/api.proto#L90-L96

…though our tutorials don’t use it and it’s only mentioned in a proto
definition, so it makes sense that people don’t know about it. One
benefit of having a proper Python API here (api.interval) is that we
can more visibly document things like this. :-)
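
As a rough illustration of why an interval domain is useful for random search, the sketch below models an interval in plain Python. The `Interval` class and its `contains` and `sample` methods are hypothetical and only mirror the `api.interval(min=..., max=...)` idea from the issue description; the field names `min_value` and `max_value` are an assumption made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical closed interval domain [min_value, max_value]."""
    min_value: float
    max_value: float

    def __post_init__(self):
        if self.min_value > self.max_value:
            raise ValueError("min_value must not exceed max_value")

    def contains(self, value: float) -> bool:
        """Return True if value lies inside the closed interval."""
        return self.min_value <= value <= self.max_value

    def sample(self, rng: random.Random) -> float:
        """Draw one value uniformly at random from the interval."""
        return rng.uniform(self.min_value, self.max_value)


# Random search: draw a few dropout rates instead of listing each one by hand.
dropout = Interval(0.1, 0.2)
rng = random.Random(0)  # seeded only to make the sketch reproducible
samples = [dropout.sample(rng) for _ in range(5)]
```

Every sampled value falls inside the declared domain, so a dashboard could offer range filters without knowing the individual values in advance.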

> Adding new hyperparameters is not fully supported. […]

Great point; I’ve opened #2014 to track this.

@wchargin Can we add an int64 dtype to DataType (present in api.proto), as requested by @omoindrot?

A few people have expressed confusion about the term “session” as used
by the current hparams API. In TensorFlow 1.x, tf.Session is a core
piece of technical infrastructure for evaluating graphs. In the hparams
API, a “session” is a single run of the model (training plus validation)
with one set of hyperparameter values, but in TensorFlow 1.x this may
correspond to many sess.run() calls, and in TensorFlow 2.x there are
no sessions at all. The two notions of “session” are roughly unrelated.

I propose omitting “session” from new API symbols where feasible, and
using “trial” instead. This is consistent with Vizier’s usage—from §1.2
of the [Vizier paper],

> A Trial is a list of parameter values, x, that will lead to a
> single evaluation of f(x).

—and, I think, also suggests the correct meaning.

“Session groups” would most literally become “trial groups”, though this
name doesn’t make it obvious that these are specifically groups of
trials with the same hyperparameters (nor did “session groups”). Rather
than just using this literal replacement, we should try to convey the
actual meaning: e.g., instead of asking for a “session group name”, ask
for a “hparams key”.

Even better, though, I think that we can avoid asking for session group
names at all in the common case. Rather than letting the session group
name default to the session name, we should let it default to something
like sha256(str(hparams)). This satisfies the intended behavior of
session groups partitioning the trial space by hyperparameter values,
without requiring any additional user input.
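
A minimal sketch of such a default follows. The function names `default_group_name` and `group_name` are hypothetical. One deliberate deviation from the literal `sha256(str(hparams))`: the sketch hashes a key-sorted JSON encoding instead of `str(hparams)`, because the string form of a dict depends on insertion order. It assumes hyperparameter names are strings and values are JSON-serializable.

```python
import hashlib
import json


def default_group_name(hparams: dict) -> str:
    """Derive a stable group name from hyperparameter values alone."""
    # Sort keys so that equal hyperparameters hash identically regardless
    # of the order in which they were added to the dict.
    canonical = json.dumps(hparams, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_name(hparams: dict, explicit_name: str = None) -> str:
    """Use the caller's name if one is given, else fall back to the hash."""
    if explicit_name is not None:
        return explicit_name
    return default_group_name(hparams)


a = default_group_name({"learning_rate": 0.2, "optimizer": "adam"})
b = default_group_name({"optimizer": "adam", "learning_rate": 0.2})
c = default_group_name({"learning_rate": 0.3, "optimizer": "adam"})
```

Here `a` and `b` are equal, so two runs with the same hyperparameters land in the same group without any user input, while `c` differs because one value changed.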

Some thoughts: even having a separate "trial/trial group" concept seems a little unwieldy to me. What if we just called them "runs" to use the terminology from the rest of TensorBoard?

The mapping is not exact, but I think in the common cases the concepts do align, and it seems better to me to share terminology in the common cases than introduce new terms just to be slightly more exact in the less common cases. In the long run, it would make sense for "runs" to be more conceptually defined anyway - the "subdirectory with event files in it" definition doesn't apply to database-first summaries, for example.

Off the top of my head, the cases where trials don't exactly match up with runs are:

  1. Runs might not record any session_start_pb summary and hence won't show up in the hparams dashboard - this seems fine to me, they're just not "hparams-enabled runs"

  2. Repeated executions of a given group (i.e. a "trial group") where each execution is a run, rather than the whole collection - these are more of a special case (the actual dashboard doesn't aggregate across these yet anyway). To me it makes more sense for the top-level primitive to be singular - i.e. to just be "trial" or "run", no "group" - because otherwise the likely majority of users who have one trial per trial group are having to deal with extra conceptual overhead of the less common case. Instead, it seems better to me to say that a repeatedly executed hparam combination is a special type of run, e.g. a "repeated run" or a "meta-run" or a "synthetic run" (with its metric values defined according to the specified aggregation rules). And that definition could be generalized to other dashboards to address this longstanding feature request: https://github.com/tensorflow/tensorboard/issues/376

  3. Metrics defined against "sub-runs" of the run that contains the session pb - this also seems like a special case to me; it's basically a workaround for the fact that a single logical run often uses multiple subdirectories to facilitate comparison (e.g. train and eval subdir). And I think in a sense this is just a specific instance of case 1 above - the sub-runs won't define their own session PB summaries, so they just aren't hparams-enabled runs.

If I may throw in my opinion, I have been trying out the API and I totally agree with @wchargin about

> “Session groups” would most literally become “trial groups”, though this
> name doesn’t make it obvious that these are specifically groups of
> trials with the same hyperparameters (nor did “session groups”). Rather
> than just using this literal replacement, we should try to convey the
> actual meaning: e.g., instead of asking for a “session group name”, ask
> for a “hparams key”.

The "sessions" term is very confusing. "Trial groups" might not be perfect, but it is more intuitive.
What you call a "hparams key", we call a Trial ID, for example, which is also a hash of the unique hyperparameter combination. I see that you want to create these keys/IDs (whatever you call them) for the user and hide them from him/her. However, in the current public API, I am missing the ability to set this myself. We already have a Trial ID to identify and track our trials across services in our system, and I would like to be able to reuse our own ID. I can imagine other users coming across this in the future, too. Do you think this could be included? It used to be possible in the KerasCallback (hp.KerasCallback(logdir, hparams, group_name=group_id)). Adding it to summary_v2.hparams and hparams_pb would already help, so that I can write my own callback.

On another note, I will be happy to start contributing to this plugin in the near future, since I am planning to make use of it once 1.14 gets released :)

Hey @moritzmeister: thanks a ton for trying out the new APIs. This
feedback is invaluable.

We can certainly add a group_name/group_id parameter to the Keras
callback and to summary_v2.hparams{,_pb}. I omitted it initially
because it’s easier to add it in later than to remove it, and I wasn’t
sure whether it was actually useful. But linking the IDs to those used
in other systems is totally reasonable and seems like sufficient
justification to me. Will add.

> On another note, I will be happy to start contributing to this plugin
> in the near future, since I am planning to make use of it once 1.14
> gets released :)

Excellent! Looking forward to it. :-)

@moritzmeister: Long delay, I know, but: I’ve just added a trial_id
kwarg to KerasCallback, hparams, and hparams_pb, as requested. The
new forms are in latest nightly; see #2440/#2442.

I’m going to close this issue, since its original purpose has been
completed and I don’t have any more planned changes in the works. Please
feel free to open new issues for any further requests or feedback!

@wchargin Could we also include the run name (folder name) in the summary table? That way you could easily link the data in the Scalars tab with HParams.

@joshlk: There’s not a one-to-one correspondence; a single trial can
contain multiple runs, with the same hyperparameters but different
random seeds. This is why “trial ID” is a separate concept from “run
name” in the first place.

Do note that you can check the “Show Metrics” box in the table view to
view scalar charts for a trial:

Screenshot of the table view with the “Show Metrics” box checked

@wchargin Do you have an example of multiple runs per trial? How would that work? Do you take the average of the metrics in the display?

@joshlk: Yes, the default is to take the average of each metric
independently. The backend supports options to aggregate by a
metric: e.g., “for each session group, show me the metrics corresponding
to the run with the highest accuracy” or “…the run with the lowest
xent”. But the frontend doesn’t currently expose any UI to enable
this, so from a user’s perspective the behavior that you describe is the
only supported one.

> Do you have an example of multiple runs per trial?

The hparams demo script emits two runs per trial. For a trivial
example, you can run the following:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorboard.plugins.hparams import api as hp
import tensorflow.compat.v2 as tf


__import__("tensorflow").compat.v1.enable_eager_execution()

with tf.summary.create_file_writer("a").as_default():
  hp.hparams({"learning_rate": 0.2})
  tf.summary.scalar("loss", 0.1, step=0)
  tf.summary.scalar("accuracy", 0.8, step=0)

with tf.summary.create_file_writer("b").as_default():
  hp.hparams({"learning_rate": 0.2})
  tf.summary.scalar("loss", 0.3, step=0)
  tf.summary.scalar("accuracy", 0.9, step=0)

Then, launch tensorboard --logdir ., and you’ll see a single trial
with accuracy reported as 0.85 and loss reported as 0.2. If you check
the “Show Metrics” box, you can see charts with data from _both_ runs:

Screenshot of hparams dashboard with a single trial with aggregated metrics from multiple runs
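
The aggregation behavior described above can be sketched in plain Python. This is not the plugin's backend code; the functions `aggregate_average` and `aggregate_by` are hypothetical, and each run is assumed to be a simple dict mapping metric names to final values. The numbers come from the two-run example.

```python
from statistics import mean


def aggregate_average(runs):
    """Average each metric independently across all runs of a trial."""
    metric_names = sorted(set().union(*(run.keys() for run in runs)))
    return {
        name: mean(run[name] for run in runs if name in run)
        for name in metric_names
    }


def aggregate_by(runs, metric, pick=max):
    """Report all metrics from the single run that is best on `metric`.

    Pass pick=max for "highest accuracy" or pick=min for "lowest xent".
    """
    return dict(pick(runs, key=lambda run: run[metric]))


# Final metric values of runs "a" and "b" from the example script.
runs = [
    {"loss": 0.1, "accuracy": 0.8},
    {"loss": 0.3, "accuracy": 0.9},
]
averaged = aggregate_average(runs)          # accuracy ~0.85, loss ~0.2
best_accuracy = aggregate_by(runs, "accuracy")       # metrics of run "b"
lowest_loss = aggregate_by(runs, "loss", pick=min)   # metrics of run "a"
```

Averaging mixes values from both runs, whereas the by-metric modes always report numbers that a single run actually produced.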

@wchargin Thanks, that's a really useful feature! 👍

I've been using the HParams feature for a couple of days now, and it would still be really great to link the Trial ID to the runs, as otherwise the HParams tab is totally disconnected from Scalars.

The HParams tab doesn't have the same overview of all runs that the Scalars tab does. When I look at the results, I start from the Scalars tab and identify the runs that are doing well, but then it's very difficult for me to find those runs in the HParams tab. Maybe it could show the run IDs when you select "Show Metrics". Or you could filter by run ID on the right side.

@joshlk: Yep; we generally agree. The original vision was that the
hparams dashboard should do something like this. We didn’t get around to
implementing it in the initial version, partly because we’re not
entirely sure where we want to go with it. For instance, one ambitious
approach would be to redefine what a “run” is entirely; currently a run
is basically a directory on disk, but conceptually a run is deeper than
that. In any case, I’ve filed a feature request for tracking: #2465.

Also filed #2464 to track aggregation support.

@wchargin Great - thanks for the work!
