Please answer the following questions for yourself before submitting an issue.
I am using the latest TensorFlow Model Garden release (2.3.0) and TensorFlow 2.3.0.
I am reporting the issue to the correct repository (Model Garden official or research directory).
I checked to make sure that this issue has not been filed already.
When I run classifier_trainer.py to train ResNet-50, it always stalls at:
dataset_factory >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>tfds.builder()
2020-08-05 11:52:32.953550: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".
2020-08-05 11:53:48.123025: E tensorflow/core/platform/cloud/curl_http_request.cc:611] The transmission of request 0x55c74a5687f0 (URI: https://www.googleapis.com/storage/v1/b/tfds-data/o/dataset_info%2Fimagenet2012%2F5.0.0?fields=size%2Cgeneration%2Cupdated) has been stuck at 0 of 0 bytes for 61 seconds and will be aborted. CURL timing information: lookup time: 0.010172 (No error), connect time: 0.066656 (No error), pre-transfer time: 0 (No error), start-transfer time: 0 (No error)
I also found that this URL returns 404:
https://www.googleapis.com/storage/v1/b/tfds-data/o/dataset_info%2Fimagenet2012%2F5.0.0?fields=size%2Cgeneration%2Cupdated
So the tfds.builder() call in dataset_factory.py cannot complete.
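For anyone who wants to probe the same endpoint by hand, here is a minimal sketch that rebuilds the metadata URL from the log above. The helper name gcs_object_metadata_url is hypothetical, and the object layout dataset_info/<name>/<version> is only inferred from the logged URI, not taken from the TFDS source.

```python
from urllib.parse import quote


def gcs_object_metadata_url(bucket, object_name,
                            fields=("size", "generation", "updated")):
    """Build a Cloud Storage JSON API object-metadata URL.

    Hypothetical helper: it only mirrors the shape of the URI in the log.
    """
    # The object name is a single path segment, so "/" must be escaped too.
    encoded_name = quote(object_name, safe="")
    encoded_fields = quote(",".join(fields), safe="")
    return (
        "https://www.googleapis.com/storage/v1/b/"
        f"{bucket}/o/{encoded_name}?fields={encoded_fields}"
    )


# Bucket and object name are taken from the logged request.
url = gcs_object_metadata_url("tfds-data", "dataset_info/imagenet2012/5.0.0")
print(url)
```

Opening the printed URL in a browser or with curl shows whether the object is reachable from your network, without going through the trainer.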
training script:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
MODEL_DIR=../output
DATA_DIR=/datasets/ImageNet/tfrecord
python3 classifier_trainer.py \
--mode=train_and_eval \
--model_type=resnet \
--dataset=imagenet \
--model_dir=$MODEL_DIR \
--data_dir=$DATA_DIR \
--config_file=configs/examples/resnet/imagenet/gpu.yaml \
--params_override='runtime.num_gpus=8'
Hi, to clarify, what are the contents of /datasets/ImageNet/tfrecord? If they are not TFDS-processed TFRecords, could you modify your run command as follows?
python3 classifier_trainer.py \
--mode=train_and_eval \
--model_type=resnet \
--dataset=imagenet \
--model_dir=$MODEL_DIR \
--data_dir=$DATA_DIR \
--config_file=configs/examples/resnet/imagenet/gpu.yaml \
--params_override='runtime.num_gpus=8,train_dataset.builder=records,validation_dataset.builder=records'
Thanks for the reply. My dataset is in TFRecord format, and I have already set builder: records/synthetic for both train and validation in gpu.yaml.
The problem I encountered is in dataset_factory.py (line 331): before the dataset builder actually starts handling the TFRecords, it fetches the dataset info over the network:
self.builder_info = tfds.builder(self.config.name).info
But the target URL is missing, so it gets stuck here.

This problem has been solved. The solution is to pin tensorflow-datasets to version 3.0.0 in official/requirements.txt, because the latest version of the API does not match:
pip uninstall tensorflow-datasets
pip install tensorflow-datasets==3.0.0
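To confirm that the pin took effect in the environment that runs classifier_trainer.py, a small check like the one below may help. It is only a sketch: the helper names installed_version and check_pin are hypothetical, and it relies on the standard-library importlib.metadata module (Python 3.8+).

```python
from importlib import metadata

# Version reported to work in this thread.
PINNED_TFDS_VERSION = "3.0.0"


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package="tensorflow-datasets", pinned=PINNED_TFDS_VERSION):
    """Describe whether the installed package matches the pinned version."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    if found == pinned:
        return f"{package}=={found} matches the pin"
    return f"{package}=={found} differs from the pinned {pinned}"


if __name__ == "__main__":
    print(check_pin())
```

Run it with the same python3 interpreter used for training, since a different virtual environment may still carry the newer tensorflow-datasets.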