Models: vision >> classifier_trainer.py >> dataset

Created on 5 Aug 2020  ·  3 Comments  ·  Source: tensorflow/models

Prerequisites

Please answer the following questions for yourself before submitting an issue.

I am using the latest TensorFlow Model Garden release (2.3.0) and TensorFlow 2.3.0.
I am reporting the issue to the correct repository (Model Garden official or research directory).
I checked to make sure that this issue has not been filed already.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/r2.3.0/official/vision/image_classification/classifier_trainer.py

2. Describe the bug

When I run classifier_trainer.py to train ResNet-50, it always stops at:

dataset_factory >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>tfds.builder()
2020-08-05 11:52:32.953550: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".
2020-08-05 11:53:48.123025: E tensorflow/core/platform/cloud/curl_http_request.cc:611] The transmission  of request 0x55c74a5687f0 (URI: https://www.googleapis.com/storage/v1/b/tfds-data/o/dataset_info%2Fimagenet2012%2F5.0.0?fields=size%2Cgeneration%2Cupdated) has been stuck at 0 of 0 bytes for 61 seconds and will be aborted. CURL timing information: lookup time: 0.010172 (No error), connect time: 0.066656 (No error), pre-transfer time: 0 (No error), start-transfer time: 0 (No error)

I found that this URL returns 404:
https://www.googleapis.com/storage/v1/b/tfds-data/o/dataset_info%2Fimagenet2012%2F5.0.0?fields=size%2Cgeneration%2Cupdated

So tfds.builder() does not work (in dataset_factory.py).
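
To check the 404 independently of TensorFlow, the metadata URL from the log can be rebuilt and probed with a short timeout. This is only a sketch: the helper names `dataset_info_url` and `probe` are hypothetical, and the URL layout is inferred from the single request shown in the log above, not from TFDS documentation.

```python
import urllib.error
import urllib.parse
import urllib.request

# Base taken from the request URI in the log above.
GCS_OBJECT_BASE = "https://www.googleapis.com/storage/v1/b/tfds-data/o/"


def dataset_info_url(name, version):
    """Rebuild the metadata URL seen in the log (layout is an assumption)."""
    object_path = urllib.parse.quote(f"dataset_info/{name}/{version}", safe="")
    fields = urllib.parse.quote("size,generation,updated", safe="")
    return f"{GCS_OBJECT_BASE}{object_path}?fields={fields}"


def probe(url, timeout_s=10):
    """Return the HTTP status code, or None if the host cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when the object does not exist
    except OSError:
        return None  # DNS failure, refused connection, or timeout


if __name__ == "__main__":
    url = dataset_info_url("imagenet2012", "5.0.0")
    print(url)
    print(probe(url))
```

A result of `None` would point at a network or proxy problem, while a numeric status such as 404 would suggest the object itself is missing.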

3. Steps to reproduce

training script:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
MODEL_DIR=../output
DATA_DIR=/datasets/ImageNet/tfrecord

python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
  --params_override='runtime.num_gpus=8'

4. Expected behavior

A clear and concise description of what you expected to happen.

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Mobile device name if the issue happens on a mobile device:
  • TensorFlow installed from: pip install tensorflow
  • TensorFlow version (use command below): 2.3.0
  • Python version: 3.7.7
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
  • CUDA/cuDNN version: 10.2
  • GPU model and memory: 8× V100 (16GB)
official bug

All 3 comments

Hi, to clarify, what are the contents of /datasets/ImageNet/tfrecord? If they are not TFDS-processed TFRecords, could you modify your run command as follows?

python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
  --params_override='runtime.num_gpus=8,train_dataset.builder=records,validation_dataset.builder=records'

Thanks for the reply. My dataset is in TFRecord format, and I have set builder: records/synthetic for both train and validation in gpu.yaml.

The problem I encountered is in dataset_factory.py (line 331). Before the dataset builder actually starts handling the TFRecords, it fetches the dataset info over the network:

self.builder_info = tfds.builder(self.config.name).info

But the target URL is missing, so it gets stuck here.
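
One way to confirm that this call is where the run blocks is to execute it in isolation under a time limit. The helper below is a minimal sketch, not part of the Model Garden: `call_with_timeout` is a hypothetical name, and it simply runs any callable in a daemon thread and reports whether it finished in time.

```python
import threading
import time


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread.

    Returns (finished, result). If fn() raised, result is the exception.
    If fn() is still running after timeout_s seconds, returns (False, None);
    the daemon thread is abandoned and does not block interpreter exit.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # report the error instead of losing it
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return False, None
    if "error" in outcome:
        return True, outcome["error"]
    return True, outcome["value"]


if __name__ == "__main__":
    # Stand-in for the blocking call. In the environment from this issue one
    # could instead pass: lambda: tfds.builder("imagenet2012").info
    # (assumes tensorflow_datasets is imported as tfds).
    print(call_with_timeout(lambda: time.sleep(1.0), 0.1))
```

If the real call reports `finished` as False after a generous timeout, the hang is in the metadata lookup rather than in reading the TFRecords.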

This problem has been solved. The solution is to pin tensorflow-datasets to version 3.0.0 in official/requirements.txt, because the latest version of the API does not match.
Run pip uninstall tensorflow-datasets and then pip install tensorflow-datasets==3.0.0.
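
After reinstalling, the installed version can be checked against the pin before launching a long training job. This is a small sketch with hypothetical helper names (`installed_version`, `matches_pin`). It uses `importlib.metadata`, which requires Python 3.8 or later; on the Python 3.7.7 reported in this issue the `importlib_metadata` backport would be needed instead.

```python
from importlib import metadata  # standard library from Python 3.8 onward

PINNED_TFDS_VERSION = "3.0.0"  # the version reported to work in this thread


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned=PINNED_TFDS_VERSION):
    """True only when a version is installed and equals the pin exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    found = installed_version("tensorflow-datasets")
    print(f"tensorflow-datasets installed: {found}")
    print(f"matches pin {PINNED_TFDS_VERSION}: {matches_pin(found)}")
```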
