Addons: Stop using tensorflow-testing docker images and start using hub.docker.com/tensorflow images

Created on 20 Feb 2020 · 13 comments · Source: tensorflow/addons

The master branch is broken because the Docker image we use to test on GPUs is the one used to test the TensorFlow source code itself. That image can be considered a private API.

Instead, to build our manylinux2010 wheels and test our code, we should follow the official approach recommended by TensorFlow (the same one external users rely on).

Here is the official guide: https://github.com/tensorflow/custom-op
The Docker images we should use come from the official Docker Hub repository:

tensorflow/tensorflow:2.1.0-custom-op-gpu-ubuntu16  # GPU
tensorflow/tensorflow:2.1.0-custom-op-ubuntu16      # CPU

Those images are only loosely "manylinux2010 compatible", since they are based on Ubuntu.

By doing that, our testing environment will become stable again, and we should no longer suffer sudden build breakages.

gabrieldemarmiesse
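The error log later in this thread shows that TensorFlow is pinned as `tensorflow==2.1.0` on line 1 of `build-requirements.txt`, while the proposed base image also carries `2.1.0` in its tag. A minimal sketch of keeping the two in sync is to derive the image tag from the pin. `custom_op_image` and `image_from_requirements` are hypothetical helpers, and the sketch assumes the `<version>-custom-op[-gpu]-ubuntu16` pattern of the two 2.1.0 tags listed above also holds for other releases, which is not confirmed here.

```python
import re

# Hypothetical helper, not part of the Addons build scripts.
# Matches an exact pin such as "tensorflow==2.1.0" (or "tensorflow-gpu==...").
_PIN = re.compile(r"^tensorflow(?:-gpu)?==(\d+\.\d+\.\d+)\s*$")


def custom_op_image(tf_version: str, gpu: bool) -> str:
    """Return a custom-op image tag, ASSUMING the 2.1.0 naming pattern generalizes."""
    suffix = "-gpu" if gpu else ""
    return f"tensorflow/tensorflow:{tf_version}-custom-op{suffix}-ubuntu16"


def image_from_requirements(lines, gpu: bool) -> str:
    """Find the exact tensorflow pin in requirement lines and map it to an image tag."""
    for line in lines:
        match = _PIN.match(line.strip())
        if match:
            return custom_op_image(match.group(1), gpu)
    raise ValueError("no exact tensorflow==X.Y.Z pin found")


if __name__ == "__main__":
    print(image_from_requirements(["tensorflow==2.1.0"], gpu=True))
    # tensorflow/tensorflow:2.1.0-custom-op-gpu-ubuntu16
```

Deriving the tag from the pin means a version bump in one place cannot silently leave the Docker base image behind, which is the kind of drift the issue describes.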

Thanks for the issue! I agree using a nosla docker image is not the way we should be doing this.

So, a couple of things:
1) I'm not sure why GPU builds started breaking today. It almost looks like a pip networking issue, though I don't know why that would start only today. The error:

ERROR: Could not find a version that satisfies the requirement tensorflow==2.1.0 (from -r build-requirements.txt (line 1)) (from versions: none)
ERROR: No matching distribution found for tensorflow==2.1.0 (from -r build-requirements.txt (line 1))

2) We used to use tensorflow/tensorflow:custom-op-ubuntu16, but those images are rarely updated (something we could probably fix with more communication), and we had an issue when TF bumped its cuDNN version. We can build from a base image, but we can't programmatically pull in a new cuDNN since that requires an NVIDIA account.

3) The new tensorflow/tensorflow:2.1.0-custom-op-gpu-ubuntu16 image looks promising and wasn't available before. It should ship the correct cuDNN version, but I wonder what its release process is. Once the 2.2 release candidates are out, how long will it take for us to get a new Docker image? I suppose we can keep using the old one as long as the CUDA and cuDNN versions match. It also includes devtoolset-7 and devtoolset-8, which are needed for building manylinux2010 wheels (/dt7/ in the container).

seanpmorgan on 20 Feb 2020

@av8ramit Do you happen to know why our GPU tests are failing with the above error? I'm not sure why, as of today, pip can't find TF 2.1.0 on PyPI.

Log:
https://source.cloud.google.com/results/invocations/ceb33b10-a414-4713-9920-bb9564319e37/targets/tensorflow_addons%2Fubuntu%2Fgpu%2Fpy3%2Fpresubmit/log

seanpmorgan on 20 Feb 2020

I've tested the image locally and it's not a network error :(

gabrieldemarmiesse on 20 Feb 2020

> I've tested the image locally and it's not a network error :(

Looks like it's because python3.8 was added to the container.

seanpmorgan on 20 Feb 2020
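The python3.8 finding would explain the `(from versions: none)` error above: if `pip3` now runs under Python 3.8, pip only considers wheels built for 3.8, and as far as we know TF 2.1.0 did not publish any. Below is a minimal fail-fast sketch, assuming the build has a Python step before `pip install`. `pip_python_version` and `check_pip` are hypothetical helpers, and the default expected target (3, 6) is an assumption taken from the python3 -> python3.6 mapping mentioned in this thread.

```python
import re
import subprocess
import sys

# `pip --version` ends with the interpreter it is bound to, e.g.
# "pip 20.0.2 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)".
_PY = re.compile(r"\(python (\d+)\.(\d+)\)\s*$")


def pip_python_version(version_output: str) -> tuple:
    """Parse (major, minor) of the interpreter a pip executable runs under."""
    match = _PY.search(version_output.strip())
    if match is None:
        raise ValueError(f"unrecognised pip output: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def check_pip(pip: str = "pip3", expected: tuple = (3, 6)) -> None:
    """Exit with an error if `pip` would install into an unexpected Python.

    Hypothetical check; stdout=PIPE/universal_newlines keeps it Python 3.6-compatible.
    """
    out = subprocess.run([pip, "--version"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    actual = pip_python_version(out)
    if actual != expected:
        sys.exit(f"{pip} targets Python {actual[0]}.{actual[1]}, "
                 f"expected {expected[0]}.{expected[1]}")
```

Failing on the interpreter mismatch gives a clearer message than pip's "No matching distribution found", which reads like a network or PyPI problem.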

Nice find @seanmorgan! I still think we should switch images before they make more drastic changes: if they change the CUDA version, the build will break again.

gabrieldemarmiesse on 20 Feb 2020

> Nice find @seanmorgan! I still think we should switch images before they make more drastic changes: if they change the CUDA version, the build will break again.

Yeah, #1117 is hanging on by a thread: the python3 symlink could change to point at py38, and then both our configure script and the Docker build would break.

seanpmorgan on 20 Feb 2020

Heads up @yongtang. I know TF IO uses the same image as us for builds. I'm not entirely sure how your pipeline works, but if you use it, note that pip3 -> py38 while python3 -> python3.6.

seanpmorgan on 20 Feb 2020

We would need a simple first pull request as a proof of concept. In https://github.com/tensorflow/addons/blob/master/tools/docker/gpu_tests.Dockerfile we should use tensorflow/tensorflow:2.1.0-custom-op-gpu-ubuntu16 as the base image (in the FROM instruction).

@failure-to-thrive would that be something you'd be interested to work on since you seem familiar with the build system?

gabrieldemarmiesse on 20 Feb 2020

I'll take a look and fix the image. In the meantime, is there any chance you can specify pip3.6? We did recently add python3.8 to the image.

av8ramit on 20 Feb 2020
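An alternative to calling `pip3.6` directly is to run pip through the interpreter itself with `python3.6 -m pip`, so the target Python is explicit and no longer depends on where the `pip3` symlink points. A minimal sketch of building that command; `pip_install_cmd` is a hypothetical helper, and `build-requirements.txt` is the file named in the error log above.

```python
import shlex


def pip_install_cmd(python: str = "python3.6",
                    requirements: str = "build-requirements.txt") -> list:
    """Return an argv list that installs requirements with the given interpreter's pip.

    Hypothetical helper: `<python> -m pip` binds the install to that interpreter
    instead of relying on the `pip3` symlink.
    """
    return [python, "-m", "pip", "install", "-r", requirements]


if __name__ == "__main__":
    # Shell-quoted form, e.g. for a RUN line in a Dockerfile.
    print(" ".join(shlex.quote(arg) for arg in pip_install_cmd()))
    # python3.6 -m pip install -r build-requirements.txt
```

The same argv list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues entirely.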

> I'll take a look and fix the image. In the meantime, is there any chance you can specify pip3.6? We did recently add python3.8 to the image.

Yeah, we're good on our side for the moment. Going forward, I think we'll move away from the nosla TensorFlow testing image, so you won't need to worry about us.

seanpmorgan on 20 Feb 2020

Yes, I highly recommend that, since we offer no support for it. That's our own internal image.

av8ramit on 20 Feb 2020

> @failure-to-thrive would that be something you'd be interested to work on since you seem familiar with the build system?

Sorry, no. I mostly work in C++ and Python; I'm only a casual user of the rest.

failure-to-thrive on 21 Feb 2020

@failure-to-thrive no worries, I'll do it :)

gabrieldemarmiesse on 21 Feb 2020
