TensorBoard: Verbose "Checkpoint ignored" warning

Created on 9 Jun 2020 · 3 comments · Source: tensorflow/tensorboard

Environment information


Diagnostics output

``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version a511de7ece215f0cfb622f2672563beee93515a9

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=7, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Darwin', nodename='YYY.XXX.com', release='18.7.0', version='Darwin Kernel Version 18.7.0: Mon Feb 10 21:08:45 PST 2020; root:xnu-4903.278.28~1/RELEASE_X86_64', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: '/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6'

--- check: installed_packages
INFO: installed: tensorboard==2.2.2
INFO: installed: tensorflow==2.2.0
INFO: installed: tensorflow-estimator==2.2.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.2.2'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.2.0'
INFO: tensorflow.__git_version__: 'v2.2.0-rc4-8-g2b96f3662b'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC =
socket.SOCK_STREAM =
socket.AI_ADDRCONFIG =
socket.AI_PASSIVE =
Loopback flags:
Loopback infos: [(, , 6, '', ('127.0.0.1', 0)), (, , 6, '', ('::1', 0, 0, 0))]
Wildcard flags:
Wildcard infos: [(, , 6, '', ('::', 0, 0, 0)), (, , 6, '', ('0.0.0.0', 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'edge-bw-101.e-lax3.amazon.com'

--- check: stat_tensorboardinfo
INFO: directory: /var/folders/z4/sq18l1dx31s3msst1f8t37jnwl6w52/T/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=28189431, st_dev=16777220, st_nlink=2, st_uid=2033414306, st_gid=1896053708, st_size=64, st_atime=1591690635, st_mtime=1591696619, st_ctime=1591696619)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6/lib/python3.7/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.9.0
astunparse==1.6.3
cachetools==4.1.0
certifi==2020.4.5.2
chardet==3.0.4
gast==0.3.3
google-auth==1.16.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.29.0
gviz-api==1.9.0
h5py==2.10.0
idna==2.9
importlib-metadata==1.6.1
Keras-Preprocessing==1.1.2
Markdown==3.2.2
numpy==1.18.5
oauthlib==3.1.0
opt-einsum==3.2.1
pip==20.1
protobuf==3.12.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
scipy==1.4.1
setuptools==46.1.3
six==1.15.0
tensorboard==2.2.2
tensorboard-plugin-profile==2.2.0
tensorboard-plugin-wit==1.6.0.post3
tensorflow==2.2.0
tensorflow-estimator==2.2.0
termcolor==1.1.0
urllib3==1.25.9
Werkzeug==1.0.1
wheel==0.34.2
wrapt==1.12.1
zipp==3.1.0

``````

Issue description

The latest TB version generates a lot of log entries like this:

WARNING:tensorflow:FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location
W0609 11:56:56.412047 123145556799488 checkpoint_management.py:295] FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location
WARNING:tensorflow:s3://sagemaker-eu-west-1-XXX/logs/job_name/validation/../checkpoint: Checkpoint ignored

I do not write any checkpoints in my training script, and the file above doesn't exist. Everything I expect to see in TB is there, so I'm fairly sure the AWS credentials are set properly.

1) Am I missing any TB feature by not writing checkpoints? I don't intend to use the Projector plugin.
2) How does TB come up with this checkpoint path? I haven't specified it anywhere in the configuration. Or is it referenced somewhere in the logs?
3) Is there a recommended way to structure the model artifacts, logs and checkpoints that TB relies on?
4) Is it possible to make these error messages more descriptive?
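
On question 2, the `validation/../checkpoint` shape of the logged path is what results when code joins a run directory with its parent directory and a fixed `checkpoint` file name. The sketch below only illustrates that string construction. The helper name `candidate_checkpoint_paths` is hypothetical, and the assumption that a checker looks first in the run directory and then in its parent is a guess based on the log line, not a statement about TensorBoard's or TensorFlow's actual code.

```python
import posixpath


def candidate_checkpoint_paths(run_dir):
    """Hypothetical helper: where a checker might look for a `checkpoint` file.

    Assumption: the run directory is tried first, then its parent.
    """
    parent = posixpath.join(run_dir, posixpath.pardir)  # appends ".."
    return [
        posixpath.join(run_dir, "checkpoint"),
        posixpath.join(parent, "checkpoint"),
    ]


# Run directory taken from the log line above (bucket name redacted as in the report).
run_dir = "s3://sagemaker-eu-west-1-XXX/logs/job_name/validation"
paths = candidate_checkpoint_paths(run_dir)
for path in paths:
    print(path)
```

The second candidate has the same form as the path in the warning, which would explain why a path that was never configured shows up in the logs.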

backend awaiting tensorflower bug

All 3 comments

Hi @yegortokmakov, this error seems to be stemming from within TensorFlow. Can you share the command you used to run TensorBoard? Thanks :)

Hi @bileschi, thanks for the reply!

This is the command: AWS_REGION={aws_region} tensorboard --logdir s3://{tensorflow_logs_path}

The logs do come from TensorFlow, but I could find references to "checkpoints" only in the projector plugin.
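
As a possible stopgap while the warnings are investigated, they could be filtered with Python's standard `logging` module. This is a minimal sketch under two assumptions: the messages are emitted through the logger named `tensorflow` (suggested by the `WARNING:tensorflow:` prefix above), and TensorBoard runs in a Python process where you can install the filter, which the plain CLI command above does not provide. The class name `DropFragments` is hypothetical.

```python
import logging


class DropFragments(logging.Filter):
    """Hypothetical filter: drop records whose message contains a given fragment."""

    def __init__(self, fragments):
        super().__init__()
        self.fragments = tuple(fragments)

    def filter(self, record):
        message = record.getMessage()
        # Returning False tells the logging module to discard the record.
        return not any(fragment in message for fragment in self.fragments)


# Only the "Checkpoint ignored" line is dropped here. Also filtering the
# credentials message would risk hiding a genuine credentials problem.
noise_filter = DropFragments(["Checkpoint ignored"])
logging.getLogger("tensorflow").addFilter(noise_filter)
```

A filter attached to a logger only affects records logged directly on that logger, so other loggers are left untouched.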

P.S. All code is available here: https://github.com/awslabs/amazon-sagemaker-examples/pull/1267

Thanks, will investigate.
