Hi there,
I'm trying to train a CycleGAN model on my university's machine learning cluster.
To run your code I'm using the Docker container you provided. When I run it, I get an error saying that module "abc" was not found. I guess the Docker container is running Python 2.7?
I tried the torch0.3.1 branch, with which I was able to train the model successfully, but then I ran into problems while testing (No module named torch), as I want to do that on my local machine.
I have no experience with Docker, so I don't know how to create a container that is able to run the master branch. Can you help me with that?
If possible, I would like to avoid using the torch0.3.1 branch, as I would have to change the environment on my local machine for testing.
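The Python 2.7 guess above can be checked from inside the container before changing anything. This is only a minimal sketch: the helper name is_python3 is made up for illustration and is not part of the repository.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or newer."""
    return version_info[0] >= 3


if __name__ == "__main__":
    # Print the major.minor version of the interpreter that runs this script.
    print("Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
    if not is_python3():
        print("This interpreter is Python 2; a Python 3 environment is needed.")
```

Running the script with the same python command that launches training shows which interpreter the container actually uses.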
I got it to work.
Used the predefined Dockerfile to build a Docker container.
Then converted that container to a Singularity sandbox directory (as the cluster I'm training the CycleGAN on uses Singularity containers).
Then I ran the training code inside that container, and whenever a module was missing or an error occurred I installed/fixed it by shelling into the sandbox.
When the training started successfully I built an immutable image from the sandbox and uploaded it to the cluster filesystem.
@taesungp Could you have a look?
When I try to train using the Docker image, I have the same problem as @Peetee06.
He is right, and I solved the problem by installing python3 in the container.
Fix: if I install from requirements.txt, the line "from scipy.misc import imresize" fails, so I downgraded scipy from 1.3.0 to 1.1.0.
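The downgrade works because scipy.misc.imresize was removed in SciPy 1.3.0. A rough way to tell whether a given version still ships it is sketched below; version_tuple and has_imresize are hypothetical helper names, and the parsing only handles plain dotted version strings.

```python
def version_tuple(version):
    """Turn '1.3.0' into (1, 3, 0); trailing suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def has_imresize(scipy_version):
    """Return True for SciPy versions older than 1.3.0, which still include imresize."""
    return version_tuple(scipy_version) < (1, 3)


if __name__ == "__main__":
    try:
        import scipy
    except ImportError:
        print("scipy is not installed in this environment")
    else:
        print(scipy.__version__, "has imresize:", has_imresize(scipy.__version__))
```

If the check reports False, pinning scipy to an older release such as 1.1.0 (as described above) is one way to keep the import working.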
I'm not sure, but I found something missing in the Dockerfile.
First of all, Miniconda-latest-Linux-x86_64.sh should be replaced with Miniconda3-latest-Linux-x86_64.sh, based on the Miniconda installer archive.
Second, a RUN conda env create -f environment.yml step is missing from the Dockerfile, and possibly more steps as well; or that step alone may be enough once you download Miniconda3.
I was having issues with your Dockerfile, which seems to specify Miniconda 2 (i.e. Python 2) while the rest of your code uses Python 3, so I made a Dockerfile using Miniconda3 (and CUDA 10.1). It seems this issue was never fixed, but the ticket was closed?
FROM nvidia/cuda:10.1-base
RUN apt update && apt install -y wget unzip curl bzip2 git
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
RUN bash Miniconda3-latest-Linux-x86_64.sh -p /miniconda -b
RUN rm Miniconda3-latest-Linux-x86_64.sh
ENV PATH=/miniconda/bin:${PATH}
RUN conda update -y conda
RUN conda install -y pytorch torchvision -c pytorch
RUN mkdir /workspace/ && cd /workspace/ && git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git && cd pytorch-CycleGAN-and-pix2pix && pip install -r requirements.txt
WORKDIR /workspace
Thank you for the suggestion! I updated the Dockerfile following @arbrog 's suggestion.