Tensorboard: Tensorboard not displaying all the HParams events

Created on 17 Nov 2019 · 7 Comments · Source: tensorflow/tensorboard

This is my logs directory structure :

There are two Jupyter notebooks running in parallel with exactly the same code, except for the prefix of the _run-*_ directory. Both of them dump _hparam_tuning_ metrics into the same directory. Both train the same model with the same hyper-parameters and metrics, but on different data. I need to view all of these runs in the same table in TensorBoard.

EDIT NOTE : Training data in the code gets generated & processed ONLY once per notebook, for all the runs in that notebook. I cannot read & process the data multiple times for different runs. Also, the two notebooks run on different GPUs and I want them to run in parallel, which is why I cannot combine them into one notebook, where the runs would be sequential.

Tensorboard reads only 9 of the 18 hparam runs that I have in my logs directory :

However, I am able to see the scalars that I use to monitor the loss, which are in the same log directory. Moreover, the metrics for hyper-parameter tuning are also visible for all the runs under the _"Scalar"_ tab, but not under the HPARAMS tab.

EDIT NOTE : I have kept the _identifier_ as another hyper-parameter to view hparam logs generated by both the Jupyter notebooks. It's just for filtering purposes, since there is no feature to filter them via trialId.
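
A possible explanation for seeing 9 of 18 runs, offered as a hedged sketch rather than a diagnosis: as far as I can tell, `hp.hparams` in TensorBoard's hparams API takes an optional `trial_id` and, when none is given, derives the group name from a hash of the hyperparameter values, and the HPARAMS table shows one row per group. If that holds here, two runs with identical hyperparameters would share a row. The stdlib-only code below only imitates that idea; `group_key`, `count_rows`, and the `units`/`dropout` hyperparameters are made up for illustration and are not TensorBoard's code.

```python
import hashlib
import json


def group_key(hparams, trial_id=None):
    """Illustrative grouping key: an explicit trial id, else a hash of the hparams."""
    if trial_id is not None:
        return str(trial_id)
    payload = json.dumps(hparams, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def count_rows(runs):
    """Count distinct groups among (hparams, trial_id) pairs."""
    return len({group_key(hparams, trial_id) for hparams, trial_id in runs})


# Hypothetical setup: 9 hyperparameter combinations, each run by two notebooks.
combos = [{"units": u, "dropout": d} for u in (16, 32, 64) for d in (0.1, 0.2, 0.3)]
notebooks = ("nb1", "nb2")

# Same hparams in both notebooks: the 18 runs collapse into 9 groups.
same_hparams = [(c, None) for c in combos for _ in notebooks]
# Adding an identifier hparam makes every run's hparams unique.
with_identifier = [({**c, "identifier": nb}, None) for c in combos for nb in notebooks]

print(count_rows(same_hparams))     # 9
print(count_rows(with_identifier))  # 18
```

Under the same assumption, passing a distinct `trial_id` per run would separate the rows without adding an extra hyperparameter column.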

Is there any way I can merge the results of different _session_ runs in one table?

backend frontend hparams awaiting tensorflower support

yashmanuda

All 7 comments

Can you try passing the new --reload_multifile=true flag to TensorBoard and see if that addresses the issue? You'll need version 2.0.0 or greater.

Without this flag, it's usually a bad idea to write data concurrently to multiple event files (which it looks like you're doing since both jupyter notebooks are writing to the top-level log directory, and there are two events.out.tfevents.* files shown there). TensorBoard historically only reads new data from the last event file in each directory, which means it won't see new data added to the earlier file. Passing --reload_multifile=true should eliminate that possible pitfall.
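
To check whether a log directory is in this situation, a small stdlib-only sketch like the following can list the directories that hold more than one event file. The helper name `dirs_with_multiple_event_files` and the demo file names are made up for illustration; the only thing taken from the thread is the `events.out.tfevents.*` naming.

```python
import tempfile
from collections import Counter
from pathlib import Path


def dirs_with_multiple_event_files(logdir):
    """Map each directory under logdir to its event-file count, keeping only counts > 1."""
    counts = Counter(
        str(path.parent)
        for path in Path(logdir).rglob("events.out.tfevents.*")
        if path.is_file()
    )
    return {directory: n for directory, n in counts.items() if n > 1}


# Demo on a throwaway directory tree (all file and directory names here are made up).
with tempfile.TemporaryDirectory() as logdir:
    root = Path(logdir)
    (root / "run-a-0").mkdir()
    # Two writers sharing the top-level directory.
    (root / "events.out.tfevents.1.host-a").touch()
    (root / "events.out.tfevents.2.host-b").touch()
    # A run directory with a single event file is not reported.
    (root / "run-a-0" / "events.out.tfevents.3.host-a").touch()
    shared = dirs_with_multiple_event_files(root)
    print(sorted(shared.values()))  # [2]
```

Any directory this reports is one where data appended to an older event file might be missed unless --reload_multifile=true is set.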

nfelt on 19 Nov 2019

--reload_multifile=true is not working. It's picking only one event file. Is there anything else that needs to be done?

yashmanuda on 20 Nov 2019

Can you provide a bit more detail on what you mean by "picking only one event file"? Also, can you provide the exact command you're using to launch TensorBoard and if possible the rest of the information in our diagnosis script, as requested in our issue template? https://github.com/tensorflow/tensorboard/blob/master/.github/ISSUE_TEMPLATE/bug_report.md

If you're able to provide a copy of the actual event files as well (e.g. as a .zip), that would also be helpful.

nfelt on 21 Nov 2019

I wasted 4 hours trying to get this to work, to no avail.
I'm using TensorBoard 2.3.0 and the SummaryWriter that is included in PyTorch 1.3.1.

This is roughly how I do the logging (in my case all the logging happens after training rather than during it, which should not make a difference):

rm -rf /tmp/tb_logs/ # let's remove the old logs first

python code:

from torch.utils.tensorboard import SummaryWriter

# trial_results and trial_hparams are dicts keyed by trial identifier (extract shown below)
for identifier, result in trial_results.items():
    # one log directory per trial
    writer = SummaryWriter(log_dir=f"/tmp/tb_logs/{identifier}")
    for metric_name, val in result.items():  # not sure if this is necessary
        writer.add_scalar(metric_name, val)
    # hparams of this trial, plus the metrics to list next to them in the HPARAMS tab
    writer.add_hparams(dict(trial_hparams[identifier]),
                       {f"hparams/{metric_name}": val for metric_name, val in result.items()})
    writer.close()

I can see that the results are stored:

$ find . -iname "events.out*" | wc -l
216

Then I'm visualizing with:
tensorboard --logdir /tmp/tb_logs --reload_multifile=true

But I'm only seeing 5 results in the hparams overview. The scalar overview shows all results, not just the 5 in hparams.

Here's a small extract of the data I'm using:

trial_results = {
  '0001_0008': {'AverageRecall_vehicle_Valid': 0.540647},
  '0001_0026': {'AverageRecall_vehicle_Valid': 0.535381}
}
trial_hparams = {
  '0001_0008': {'confLossFactr': 0.5, 'confIncByWeakDet': 0, 'confIncByStrongDet': 0, 'confPropIncDetThresh': 0.65}, 
  '0001_0026': {'confLossFactr': 0.7, 'confIncByWeakDet': 0, 'confIncByStrongDet': 0, 'confPropIncDetThresh': 0.325}
}

dominikdienlin on 15 Oct 2020

👍2

Ok so this took me a while.
I have a mix of old and new logs with different numbers of hparams. The additional hparams added later did not show up, even after restarting TensorBoard multiple times.
I had to delete all the logs and restart TensorBoard to see all the hparams.
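
If it helps anyone, here is a quick stdlib-only sketch for spotting such a mix before logging: it compares the hparam names across trials and reports which trials lack names that others have. The helper `missing_hparams` and the example trials are hypothetical, and the sketch does not change what TensorBoard displays.

```python
def missing_hparams(trial_hparams):
    """Return {trial: sorted hparam names that other trials log but this one lacks}."""
    all_keys = set()
    for hparams in trial_hparams.values():
        all_keys.update(hparams)
    report = {}
    for trial, hparams in trial_hparams.items():
        missing = sorted(all_keys - set(hparams))
        if missing:
            report[trial] = missing
    return report


# Hypothetical mix of an older trial and a newer one that logs an extra hparam.
trials = {
    "old_run": {"lr": 0.01, "batch_size": 32},
    "new_run": {"lr": 0.01, "batch_size": 32, "dropout": 0.1},
}
print(missing_hparams(trials))  # {'old_run': ['dropout']}
```

An empty result means every trial logs the same set of hparam names.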

lkhphuc on 16 Oct 2020

👍1

As you can see in my post above, I've deleted the log folder and I'm still only seeing a fraction of the hyperparameters.

dominikandreas on 16 Oct 2020

Hey, I have the same issue.
In my case, I'm writing sequentially into the same folder. That is, each write happens in a separate run method, one after the training and one after the retraining of my model.

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

def run1(run_dir, hparams, …):
    with tf.summary.create_file_writer(run_dir).as_default() as writer:
        hp.hparams(hparams)  # record the hyperparameter values for this run
        loss, accuracy = train_test_model(hparams, …)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)
        tf.summary.scalar(METRIC_LOSS, loss, step=1)
        writer.close()
…
def run2(run_dir, hparams, …):
    # writes into the same run_dir as run1, after fine-tuning
    with tf.summary.create_file_writer(run_dir).as_default() as writer:
        hp.hparams(hparams)
        loss_fine_tuning, accuracy_fine_tuning = train_test_model_fune_tuning(hparams, …)
        tf.summary.scalar(METRIC_ACCURACY_FINE_TUNING, accuracy_fine_tuning, step=1)
        tf.summary.scalar(METRIC_LOSS_FINE_TUNING, loss_fine_tuning, step=1)
        writer.close()

The result of my folder structure (Screenshot jupyter notebook)