Diagnostics output
``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version a511de7ece215f0cfb622f2672563beee93515a9
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=7, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Darwin', nodename='YYY.XXX.com', release='18.7.0', version='Darwin Kernel Version 18.7.0: Mon Feb 10 21:08:45 PST 2020; root:xnu-4903.278.28~1/RELEASE_X86_64', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: '/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6'
--- check: installed_packages
INFO: installed: tensorboard==2.2.2
INFO: installed: tensorflow==2.2.0
INFO: installed: tensorflow-estimator==2.2.0
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.2.2'
--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.2.0'
INFO: tensorflow.__git_version__: 'v2.2.0-rc4-8-g2b96f3662b'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC =
socket.SOCK_STREAM =
socket.AI_ADDRCONFIG =
socket.AI_PASSIVE =
Loopback flags:
Loopback infos: [(
Wildcard flags:
Wildcard infos: [(
--- check: readable_fqdn
INFO: socket.getfqdn(): 'edge-bw-101.e-lax3.amazon.com'
--- check: stat_tensorboardinfo
INFO: directory: /var/folders/z4/sq18l1dx31s3msst1f8t37jnwl6w52/T/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=28189431, st_dev=16777220, st_nlink=2, st_uid=2033414306, st_gid=1896053708, st_size=64, st_atime=1591690635, st_mtime=1591696619, st_ctime=1591696619)
INFO: mode: 0o40777
--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/Users/yegor/.local/share/virtualenvs/test-XHT2gPb6/lib/python3.7/site-packages']; bad_roots (0): []
--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.9.0
astunparse==1.6.3
cachetools==4.1.0
certifi==2020.4.5.2
chardet==3.0.4
gast==0.3.3
google-auth==1.16.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.29.0
gviz-api==1.9.0
h5py==2.10.0
idna==2.9
importlib-metadata==1.6.1
Keras-Preprocessing==1.1.2
Markdown==3.2.2
numpy==1.18.5
oauthlib==3.1.0
opt-einsum==3.2.1
pip==20.1
protobuf==3.12.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
scipy==1.4.1
setuptools==46.1.3
six==1.15.0
tensorboard==2.2.2
tensorboard-plugin-profile==2.2.0
tensorboard-plugin-wit==1.6.0.post3
tensorflow==2.2.0
tensorflow-estimator==2.2.0
termcolor==1.1.0
urllib3==1.25.9
Werkzeug==1.0.1
wheel==0.34.2
wrapt==1.12.1
zipp==3.1.0
``````
The latest TB version generates a lot of log entries like this:
WARNING:tensorflow:FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location
W0609 11:56:56.412047 123145556799488 checkpoint_management.py:295] FailedPreconditionError: AWS Credentials have not been set properly. Unable to access the specified S3 location
WARNING:tensorflow:s3://sagemaker-eu-west-1-XXX/logs/job_name/validation/../checkpoint: Checkpoint ignored
I do not write any checkpoints in my training script and the file above doesn't exist. Everything I think I want to see in TB is there, so I'm pretty sure AWS Credentials are set properly.
1) Am I missing any TB feature by not writing checkpoints? I don't intend to use the Projector plugin.
2) How is TB coming up with this checkpoint path? I haven't specified it anywhere in the configuration. Or is it referenced somewhere in logs?
3) Is there a recommended way to structure model artifacts, logs and checkpoints that TB relies on?
4) Is it possible to make these error messages more descriptive?
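For reference on question 2, the path in the warning contains a `..` segment. Below is a minimal sketch of what it would resolve to, assuming ordinary POSIX-style normalization is applied to the S3 key. That is an assumption for illustration only; TensorFlow's S3 filesystem may treat `..` differently. The helper name `normalize_s3_path` is hypothetical and not part of TensorBoard or TensorFlow.

```python
import posixpath
from urllib.parse import urlsplit


def normalize_s3_path(url):
    """Collapse '.' and '..' segments in the key part of an s3:// URL.

    Hypothetical helper: it only shows where the path from the warning
    would point if '..' were resolved the way a POSIX path is.
    """
    parts = urlsplit(url)
    # parts.netloc is the bucket name, parts.path is the key with a leading '/'.
    key = posixpath.normpath(parts.path)
    return f"{parts.scheme}://{parts.netloc}{key}"


if __name__ == "__main__":
    warned = "s3://sagemaker-eu-west-1-XXX/logs/job_name/validation/../checkpoint"
    print(normalize_s3_path(warned))
```

Under that assumption, the warning refers to a `checkpoint` file directly under `logs/job_name/`, next to the `validation` run directory rather than inside it.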
Hi @yegortokmakov, this error seems to stem from within TensorFlow. Can you share the command you used to run TensorBoard? Thanks :)
hi @bileschi thanks for the reply!
This is the command: AWS_REGION={aws_region} tensorboard --logdir s3://{tensorflow_logs_path}
The logs do come from TensorFlow, but I could find references to "checkpoints" only in the Projector plugin.
P.S. All code is available here: https://github.com/awslabs/amazon-sagemaker-examples/pull/1267
Thanks, will investigate.