Tensorboard: TF 2.0 API for using the embedding projector

Created on 26 Jul 2019  ·  15 Comments  ·  Source: tensorflow/tensorboard

I am preparing embeddings for the projector with TensorFlow 2.

TensorFlow 1 code would look something like this:

embeddings = tf.compat.v1.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
# Write summaries for tensorboard
with tf.compat.v1.Session() as sess:
    saver = tf.compat.v1.train.Saver([embeddings])
    sess.run(embeddings.initializer)
    saver.save(sess, CHECKPOINT_FILE)
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embeddings.name
    embedding.metadata_path = TENSORBOARD_METADATA_FILE

projector.visualize_embeddings(tf.compat.v1.summary.FileWriter(TENSORBOARD_DIR), config)

When using eager mode in TensorFlow 2, this should (?) look something like this:

embeddings = tf.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(CHECKPOINT_FILE)

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = embeddings.name
embedding.metadata_path = TENSORBOARD_METADATA_FILE

writer = tf.summary.create_file_writer(TENSORBOARD_DIR)
projector.visualize_embeddings(writer, config)

However, there are two issues:

  • The writer created with tf.summary.create_file_writer does not have the get_logdir() method required by projector.visualize_embeddings. A simple workaround is to patch visualize_embeddings to take the logdir as a parameter.
  • The checkpoint format has changed. When reading the checkpoint with load_checkpoint (which seems to be how TensorBoard loads the file), the variable names change, e.g. embeddings becomes something like embeddings/.ATTRIBUTES/VARIABLE_VALUE. (The map returned by get_variable_to_shape_map() also contains additional variables, but they are empty anyway.)

The second issue was solved with the following quick-and-dirty workaround (logdir is now a parameter of the patched visualize_embeddings()):

embeddings = tf.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(CHECKPOINT_FILE)

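reader = tf.train.load_checkpoint(TENSORBOARD_DIR)
shape_map = reader.get_variable_to_shape_map()  # renamed to avoid shadowing the built-in map()
key_to_use = ""
# Keep the last key that contains the variable name
for key in shape_map:
    if "embeddings" in key:
        key_to_use = key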

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = key_to_use
embedding.metadata_path = TENSORBOARD_METADATA_FILE

writer = tf.summary.create_file_writer(TENSORBOARD_DIR)
# Patched visualize_embeddings that takes the logdir as a third parameter
projector.visualize_embeddings(writer, config, TENSORBOARD_DIR)
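
The loop in the workaround keeps the last key that contains the substring "embeddings", which could pick the wrong entry if several keys match. A slightly stricter lookup might match the exact key format instead. This is only a sketch: the helper name find_checkpoint_key and the sample shape map are made up for illustration, and the suffix is the one reported in this issue.

```python
# Suffix that tf.train.Checkpoint appends to variable keys, as observed in this issue.
CHECKPOINT_SUFFIX = "/.ATTRIBUTES/VARIABLE_VALUE"


def find_checkpoint_key(shape_map, attribute_name):
    """Return the checkpoint key for `attribute_name` (hypothetical helper).

    `shape_map` is a dict like the one returned by
    reader.get_variable_to_shape_map(). Raises KeyError if nothing matches.
    """
    expected = attribute_name + CHECKPOINT_SUFFIX
    if expected in shape_map:
        return expected
    # Fall back to the plain name, in case the checkpoint stores it unchanged.
    if attribute_name in shape_map:
        return attribute_name
    raise KeyError("no checkpoint entry for {!r}".format(attribute_name))


# Made-up shape map standing in for reader.get_variable_to_shape_map().
example_map = {
    "_CHECKPOINTABLE_OBJECT_GRAPH": [],
    "embeddings/.ATTRIBUTES/VARIABLE_VALUE": [100, 100],
    "save_counter/.ATTRIBUTES/VARIABLE_VALUE": [],
}
key_to_use = find_checkpoint_key(example_map, "embeddings")
print(key_to_use)
```

Raising on a missing key, rather than silently leaving key_to_use empty, makes a misnamed variable visible before TensorBoard is started.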

I did not find any examples of how to use TensorFlow 2 to write the embeddings for TensorBoard directly, so I am not sure this is the right way. If it is, those two issues would need to be addressed.

Diagnostics output from diagnose_tensorboard.py:

``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version 393931f9685bd7e0f3898d7dcdf28819fef54c43

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Darwin', nodename='MBPT', release='18.6.0', version='Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tb-nightly==1.14.0a20190603
INFO: installed: tensorflow==2.0.0b1
INFO: installed: tf-estimator-nightly==1.14.0.dev2019060501

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '1.14.0a20190603'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.0.0-beta1'
INFO: tensorflow.__git_version__: 'v2.0.0-beta0-16-g1d91213fe7'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/USER_DIR/anaconda3/envs/TF20/bin/tensorboard\n'

--- check: readable_fqdn
INFO: socket.getfqdn(): '104.1.168.192.in-addr.arpa'

--- check: stat_tensorboardinfo
INFO: directory: /var/folders/zv/0ywdhk0s55q2770ygg2xbty40000gn/T/.tensorboard-info
INFO: .tensorboard-info directory does not exist

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/USER_DIR/anaconda3/envs/TF20/lib/python3.7/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.7.1
astor==0.8.0
certifi==2019.6.16
gast==0.2.2
google-pasta==0.1.7
grpcio==1.22.0
h5py==2.9.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
Markdown==3.1.1
numpy==1.16.4
pandas==0.25.0
pip==19.2.1
protobuf==3.9.0
python-dateutil==2.8.0
pytz==2019.1
setuptools==41.0.1
six==1.12.0
tb-nightly==1.14.0a20190603
tensorflow==2.0.0b1
termcolor==1.1.0
tf-estimator-nightly==1.14.0.dev2019060501
Werkzeug==0.15.5
wheel==0.33.4
wrapt==1.11.2

``````

projector tf-2.0 docs feature

Most helpful comment

Adding my two cents. Hopefully it will save some people from frustration. This is how I made the TensorBoard Projector show my embeddings in both TF2.0 and TF2.1, in both non-eager and eager execution modes.

I have created Variant A, which runs in non-eager mode, and Variant B, which runs in eager mode. I will also present two more, Variant C and Variant D, which I hoped would work but do not. Maybe someone can point me to the reason why.

# Some initial code which is the same for all the variants
import os
import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

def register_embedding(embedding_tensor_name, meta_data_fname, log_dir):
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embedding_tensor_name
    embedding.metadata_path = meta_data_fname
    projector.visualize_embeddings(log_dir, config)

def get_random_data(shape=(100,100)):
    x = np.random.rand(*shape)
    y = np.random.randint(low=0, high=2, size=shape[0])
    return x, y

def save_labels_tsv(labels, filepath, log_dir):
    with open(os.path.join(log_dir, filepath), 'w') as f:
        for label in labels:
            f.write('{}\n'.format(label))

LOG_DIR = 'tmp'  # Tensorboard log dir
META_DATA_FNAME = 'meta.tsv'  # Labels will be stored here
EMBEDDINGS_TENSOR_NAME = 'embeddings'
EMBEDDINGS_FPATH = os.path.join(LOG_DIR, EMBEDDINGS_TENSOR_NAME + '.ckpt')
STEP = 0

os.makedirs(LOG_DIR, exist_ok=True)  # The log dir must exist before anything is written to it
x, y = get_random_data((100,100))
register_embedding(EMBEDDINGS_TENSOR_NAME, META_DATA_FNAME, LOG_DIR)
save_labels_tsv(y, META_DATA_FNAME, LOG_DIR)
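
The save_labels_tsv helper writes a single label column. If more than one metadata column is needed, the projector expects the first line of the TSV file to hold the column names, while a single-column file has no header. A minimal sketch of such a writer follows; the function name save_metadata_tsv, the column names and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile


def save_metadata_tsv(rows, column_names, filepath):
    """Write projector metadata, one row per embedding vector.

    With a single column no header is written; with several columns the
    first line holds the column names.
    """
    with open(filepath, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        if len(column_names) > 1:
            writer.writerow(column_names)
        for row in rows:
            writer.writerow(row)


# Made-up metadata: a class id and a split name for three points.
log_dir = tempfile.mkdtemp()
meta_path = os.path.join(log_dir, "meta.tsv")
save_metadata_tsv([(0, "train"), (1, "train"), (1, "val")],
                  ["label", "split"], meta_path)

with open(meta_path) as f:
    lines = f.read().splitlines()
print(lines)
```

The number of data rows has to match the number of embedding vectors, in the same order.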

VARIANT A (Works in TF2.0 and TF2.1, but not in eager mode)

# Size of files created on disk: 163kB
tf.compat.v1.disable_eager_execution()
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.global_variables_initializer())
saver = tf.compat.v1.train.Saver()
saver.save(sess, EMBEDDINGS_FPATH, STEP)
sess.close()

VARIANT B (Works in both TF2.0 and TF2.1 in Eager mode)

# Size of files created on disk: 80.5kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
saver = tf.compat.v1.train.Saver([tensor_embeddings])  # Must pass list or dict
saver.save(sess=None, global_step=STEP, save_path=EMBEDDINGS_FPATH)

VARIANT C (Does not work in TF2.0 or TF2.1, Projector tab is active but no data is displayed)

# Size of files created on disk: 80.8kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
checkpoint = tf.train.Checkpoint(embeddings=tensor_embeddings)
checkpoint.save(EMBEDDINGS_FPATH)

VARIANT D (Does not work in TF2.0 or TF2.1, Projector tab is inactive, no checkpoint was found)

# Size of files created on disk: 80.4kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
writer = tf.summary.create_file_writer(LOG_DIR)
with writer.as_default():
    tf.summary.write(tag='projector', tensor=tensor_embeddings,
                     step=STEP, name=EMBEDDINGS_TENSOR_NAME)

It would be great if this were simplified in new versions of TF. Something like this would be cool:

# WARNING this is purely fictional code :)
writer = tf.summary.create_file_writer(LOG_DIR)
with writer.as_default():
    tf.summary.projector(tensor=tensor_embeddings, labels=tensor_labels,
                     step=STEP, name='desired name')

I understand that the TensorFlow and TensorBoard team probably has more important issues to deal with, but I must say the lack of documentation on this matter is disturbing. Even the current documentation is sometimes misleading, e.g. pointing out that tf.train.Saver() is deprecated and tf.train.Checkpoint() should be used, with no example. I just could not make it work. @asitplus-pteufl, in the post at the top, shows an example where it works with Checkpoint(), but it works only partially and requires workarounds. Thanks for that anyway, you have pointed me in a good direction.

All 15 comments

Hi @asitplus-pteufl! Thank you for the detailed and clear report, and
for the background investigation. This is super helpful.

We can definitely update visualize_embeddings such that it doesn’t
need a TF 1.x FileWriter. Because the function only actually needs the
logdir, we’ll probably just let it take the logdir instead of the 1.x
writer (while still accepting the writer for backward compatibility).

The tensor name change is a bit trickier. I looked into this a bit, and
it’s due to changes in the TensorFlow checkpoint code itself, as you
suggest; the tokens .ATTRIBUTES and VARIABLE_VALUE are
hard-coded, and you need to supply them on any checkpoint-related
methods (e.g., read_tensor). And even if you manually specify the
munged name as in your “quick and dirty” workaround, the current
projector frontend will render that munged name rather than just the
variable name.

I’ll do some more digging and see what the best path forward is.
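
To see the munged names for yourself, the keys stored in a checkpoint can be listed and read back. The following is a minimal sketch, assuming TensorFlow 2.x in eager mode; the temporary directory, the random data and the variable name are placeholders.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

log_dir = tempfile.mkdtemp()  # placeholder for a real log directory
embeddings = tf.Variable(np.random.rand(10, 4), name="embeddings")

# Object-based checkpoint: the keyword argument name becomes the key prefix.
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(os.path.join(log_dir, "model.ckpt"))

# List the keys stored in the latest checkpoint found in log_dir.
names = [name for name, _ in tf.train.list_variables(log_dir)]
print(names)

# Reading the values back requires the full key, including the fixed tokens.
reader = tf.train.load_checkpoint(log_dir)
values = reader.get_tensor("embeddings/.ATTRIBUTES/VARIABLE_VALUE")
print(values.shape)
```

The key prefix comes from the keyword passed to tf.train.Checkpoint, not from the name argument of tf.Variable.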

It would be great to have a clean tutorial on how to use the TensorBoard projector with TensorFlow 2.0!

Hi everyone,
thanks for those updates.
However, I still do not understand how to use the TensorBoard projector module with tf.estimator as simply as we could with tf.compat.v1.keras.callbacks.TensorBoard.
Is there a way to simply set up TensorBoard projection like this for tf.estimator with the train_and_evaluate approach?
Currently I have to write glue code that subsamples the validation data and pushes it into queues so that the projected data can be written at the end of the validation session, which is neither maintainable nor easy to review.
Thanks.

Has anyone solved this, or does anyone have a working example? I have created a separate environment with TensorFlow 1.14 just to visualize my embeddings. Looking forward to a working example.

How do I import the projector class in TensorFlow 2.0?

How do I import the projector class in TensorFlow 2.0?

from tensorboard.plugins import projector

Hi @paloha
I'm trying to get Variant A working, but it does not seem to work, even after I appended :0 to the variable name (or without it), as that was different. Here is the colab. Could it be due to Colab having TF 2.2.0?

The metadata file isn't being identified by the projector.

I got it working on TF 2.2 locally by appending "/.ATTRIBUTES/VARIABLE_VALUE" to embedding.tensor_name, see https://github.com/tensorflow/tensorboard/blob/2.2.1/tensorboard/plugins/projector/projector_demo.py#L104
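
Put together, the approach with the suffix might look like the following sketch, assuming TensorFlow 2.x in eager mode and a TensorBoard version whose visualize_embeddings accepts a logdir. The temporary directory, file names and random data are placeholders. This may also be why Variant C shows no data: there the projector config registers the plain name 'embeddings' while the checkpoint stores the suffixed key.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

log_dir = tempfile.mkdtemp()  # placeholder for a real log directory

# Random vectors and labels as placeholder data.
vectors = np.random.rand(20, 8)
labels = np.random.randint(low=0, high=2, size=20)

# One label per line, no header, in the same order as the vectors.
with open(os.path.join(log_dir, "meta.tsv"), "w") as f:
    for label in labels:
        f.write("{}\n".format(label))

# Save the vectors with an object-based checkpoint.
embeddings = tf.Variable(vectors, name="embeddings")
checkpoint = tf.train.Checkpoint(embeddings=embeddings)
checkpoint.save(os.path.join(log_dir, "embeddings.ckpt"))

# Register the tensor under the key the checkpoint actually stores.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = "embeddings/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = "meta.tsv"
projector.visualize_embeddings(log_dir, config)

config_path = os.path.join(log_dir, "projector_config.pbtxt")
with open(config_path) as f:
    config_text = f.read()
print(config_text)
```

Afterwards, TensorBoard can be pointed at the same directory with tensorboard --logdir followed by its path.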

Hi @falaktheoptimist,
thanks for reaching out. I have tried running your colab and the projector works. You just need to run the %tensorboard --logdir tmp cell twice. First time it did not load. Second time it loaded. I have no idea why. But at least it works.

Thanks @paloha @matiaslindgren. That worked. Somehow the previously running tensorboard also was causing issues for me. Thank you so much for all the help!
@matiaslindgren : In the final version, I didn't need to add the attributes or variable_value part. It worked directly in the form @paloha suggested above.

Hi @paloha, could you please show complete code for using TensorBoard to visualize the image embeddings of a trained Keras CNN model? I really can't find anything useful on the internet.

Can we have a clean API to add 2 variables to a file? 4 years of TensorFlow and this is still an issue.
The best way to write a simple embedding and use the projector is to install torch and use its embedding API.
Stack Overflow answers advise the same.

import numpy as np
import tensorflow as tf
import tensorboard as tb
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
from torch.utils.tensorboard import SummaryWriter

vectors = np.array([[0,0,1], [0,1,0], [1,0,0], [1,1,1]])
metadata = ['001', '010', '100', '111']  # labels
writer = SummaryWriter()
writer.add_embedding(vectors, metadata)
writer.close()
%load_ext tensorboard
%tensorboard --logdir=runs

TensorBoard should have its own API to write simple text files and protocol buffers; something like tensorboard.SummaryWriter seems reasonable.

@7khalil sorry, but I do not have a presentable example and I do not have time to make one right now. With a custom callback in Keras you are sure to make it work, though. The only problem is that I cannot access validation or test data in that callback, so I load it separately in the callback and use it if necessary. It is a slowdown of course, but in my latest project the amount of data was not much of a problem.

@Mistobaan, yes that would be lovely.

Thanks @paloha, I figured it out by downgrading TensorFlow to version 1.x.
