Tensorboard: Embeddings tab: "parsing metadata" hangs

Created on 20 Jul 2017 · 9 Comments · Source: tensorflow/tensorboard

I am trying to visualize embeddings using tensorboard but the embedding tab seems to hang on "parsing metadata".
screenshot from 2017-07-20 13-43-23

The code I am using to generate the embeddings and visualization can be found here.

I am running TensorFlow 1.2.1 for Python 2.7 with GPU support.

All 9 comments

I checked the code, the metadata TSV file, and projector_config.pbtxt against the TensorBoard embedding visualization tutorial; everything seems to be correct, and TensorBoard is not printing any messages in the terminal.

@dsmilkov Can you please take a look?

The metadata_path property within projector_config.pbtxt must be relative to the log directory (logdir). The logdir in this case is named processed, and projector_config.pbtxt looks like this:

embeddings {
  tensor_name: "embedding:0"
  metadata_path: "processed/vocab_1000.tsv"
}

As a result, the projector plugin attempts to load processed/processed/vocab_1000.tsv, which does not exist. If you set metadata_path to "vocab_1000.tsv", you should be able to load the labels successfully, as I did. Screenshot:

screen shot 2017-07-21 at 12 01 30 am
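
To make the doubled directory concrete, here is a minimal sketch of the path resolution described above. It assumes the plugin simply joins logdir and metadata_path; resolve_metadata_path is a hypothetical name used for illustration, not a function from TensorBoard.

```python
import posixpath


def resolve_metadata_path(logdir, metadata_path):
    """Model (assumption) of how metadata_path is resolved against logdir."""
    # A relative metadata_path is appended to logdir, so repeating the
    # logdir name inside metadata_path duplicates that directory.
    return posixpath.join(logdir, metadata_path)


# metadata_path as written in the failing projector_config.pbtxt
wrong = resolve_metadata_path("processed", "processed/vocab_1000.tsv")
# metadata_path given relative to logdir
right = resolve_metadata_path("processed", "vocab_1000.tsv")

print(wrong)  # processed/processed/vocab_1000.tsv
print(right)  # processed/vocab_1000.tsv
```

Only the second form points at the file that actually exists inside the logdir.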

TensorBoard (the embedding projector frontend) misleadingly halts at the "Parsing metadata" step because it tries to parse, as metadata, the string "/home/usr/agent007/Desktop/tf-stanford-tutorials/examples/processed/processed/vocab_1000.tsv" not found, or is not a file. That string is obviously not valid metadata content; it is just an error message retrieved from the server.

All in all,

  • You can solve this problem by excluding the logdir from metadata_path: in your code, remove processed/ from embedding.metadata_path = 'processed/vocab_1000.tsv'.
  • The projector frontend needs to gracefully handle the case in which the TSV file containing metadata cannot be found, instead of misleadingly failing during the parsing stage. I'll create a PR soon that adds a clear error message to the embedding projector.
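
One way to avoid the mistake in the first point is to derive the value from the two paths rather than typing it by hand. The sketch below uses only the Python standard library; metadata_path_for_config is a hypothetical helper name, not part of TensorFlow or TensorBoard.

```python
import os


def metadata_path_for_config(logdir, metadata_file):
    """Return metadata_file expressed relative to logdir (hypothetical helper)."""
    rel = os.path.relpath(os.path.abspath(metadata_file), os.path.abspath(logdir))
    # A result that climbs out of logdir means the file is not inside it.
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(
            "metadata file %r is not inside logdir %r" % (metadata_file, logdir)
        )
    return rel


# Paths from this issue: the logdir is "processed" and the TSV lives inside it.
print(metadata_path_for_config("processed", "processed/vocab_1000.tsv"))
# prints: vocab_1000.tsv
```

The returned string is what would be assigned to embedding.metadata_path, so the logdir prefix never ends up in projector_config.pbtxt.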

Thanks for looking into this @chihuahua. Quick question, how were you able to retrieve the error message from the server? I saw no output in the terminal when running tensorboard and I could not find any kind of dump file in the log directory:

screenshot from 2017-07-21 10-10-09
screenshot from 2017-07-21 10-11-49

Also, the Tensorboard Embedding Visualization Tutorial should be updated to reflect the need to set the metadata_path property relative to the log directory.

@jkarimi91 The version of TensorBoard that you're using doesn't have functioning logging, but we fixed this a while ago in #148; @chihuahua is running from GitHub master, where logging works fine.

Per @wchargin, 400/500 responses like that should be printed by werkzeug following #148.

With some prior knowledge of the code, I set a breakpoint here and found that data equalled that error message.

Indeed, the Tensorboard Embedding Visualization Tutorial should be updated.

If it is fine with everyone, I am marking this issue closed because #259 got submitted. I have also initiated an internal change to the Tensorboard Embedding Visualization Tutorial that removes LOG_DIR from metadata_path.

@chihuahua The documentation still includes the LOGDIR: https://www.tensorflow.org/programmers_guide/embedding

@chihuahua This Word2Vec tutorial also includes the logdir. Could you please remove it?

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/word2vec/word2vec_basic.py
