I'm getting a 403 error when I try to download the MNIST dataset with torchvision 0.4.2.
../.local/lib/python3.6/site-packages/torchvision/datasets/mnist.py:68: in __init__
self.download()
../.local/lib/python3.6/site-packages/torchvision/datasets/mnist.py:135: in download
download_and_extract_archive(url, download_root=self.raw_folder, filename=filename)
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:248: in download_and_extract_archive
download_url(url, download_root, filename, md5)
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:96: in download_url
raise e
../.local/lib/python3.6/site-packages/torchvision/datasets/utils.py:84: in download_url
reporthook=gen_bar_updater()
/usr/local/lib/python3.6/urllib/request.py:248: in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
/usr/local/lib/python3.6/urllib/request.py:223: in urlopen
return opener.open(url, data, timeout)
/usr/local/lib/python3.6/urllib/request.py:532: in open
response = meth(req, response)
/usr/local/lib/python3.6/urllib/request.py:642: in http_response
'http', request, response, code, msg, hdrs)
/usr/local/lib/python3.6/urllib/request.py:570: in error
return self._call_chain(*args)
/usr/local/lib/python3.6/urllib/request.py:504: in _call_chain
result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <urllib.request.HTTPDefaultErrorHandler object at 0x7efbf9edaac8>
req = <urllib.request.Request object at 0x7efbf9eda8d0>
fp = <http.client.HTTPResponse object at 0x7efbf9edaf98>, code = 403
msg = 'Forbidden', hdrs = <http.client.HTTPMessage object at 0x7efbf9ea22b0>
def http_error_default(self, req, fp, code, msg, hdrs):
> raise HTTPError(req.full_url, code, msg, hdrs, fp)
E urllib.error.HTTPError: HTTP Error 403: Forbidden
https://app.circleci.com/jobs/github/PyTorchLightning/pytorch-lightning/6877
Thanks for reporting! I can reproduce the issue locally, and downloading from the browser works.
I don't yet know what the root cause is though.
I think we might need to pass a header in the download_url function (https://github.com/pytorch/vision/blob/c3e2b018517dedcbda18462f5d3e62e1fd913003/torchvision/datasets/utils.py#L59-L100), following https://stackoverflow.com/questions/13303449/urllib2-httperror-http-error-403-forbidden
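A minimal sketch of that idea, assuming a browser-like User-Agent is all the server checks (the workarounds in this thread suggest so, but that is an assumption). `download_with_headers` is a hypothetical helper, not torchvision's API: it builds a `urllib.request.Request` carrying explicit headers and streams the response to disk, which is roughly what a patched `download_url` could do instead of calling `urlretrieve`.

```python
import shutil
import urllib.request

# Assumption: any browser-like User-Agent is accepted; "Mozilla/5.0" is the
# value used in the workarounds in this thread.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def download_with_headers(url, fpath, headers=None, chunk_size=1 << 16):
    """Hypothetical helper: fetch `url` into `fpath`, sending explicit headers.

    Unlike urllib.request.urlretrieve(), which sends the default
    "Python-urllib/x.y" agent, the Request object carries our own headers.
    """
    request = urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)
    # urlopen() raises HTTPError (e.g. 403) before the output file is opened.
    with urllib.request.urlopen(request) as response, open(fpath, "wb") as out:
        # Stream the body to disk in chunks instead of reading it all at once.
        shutil.copyfileobj(response, out, chunk_size)
    return fpath
```

For example, `download_with_headers(url, "train-images-idx3-ubyte.gz")` with one of the MNIST URLs; whether the server accepts the request still depends on what it actually filters on.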
cc @cpuhrsch @vincentqb @zhangguanheng66 for awareness
This is because the download links for MNIST at https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py#L33-L36 are hosted on yann.lecun.com, and that server has moved behind Cloudflare protection.
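If that is the cause, the likely trigger is the User-Agent: without an explicit header, urllib identifies itself as `Python-urllib/<major>.<minor>`, while a browser sends its own agent string. That the protection layer filters on this particular header is an assumption, though it is consistent with the header-based workarounds in this thread. The default value urllib sends can be inspected like this:

```python
import urllib.request

# build_opener() with no arguments returns the same kind of opener that
# urlopen()/urlretrieve() use by default; its addheaders list holds the
# User-Agent sent with every request unless one is set explicitly.
default_headers = dict(urllib.request.build_opener().addheaders)
print(default_headers["User-agent"])  # e.g. "Python-urllib/3.6" on Python 3.6
```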
@fmassa we may need to mirror the files and change the URLs to point to, e.g., the PyTorch S3 bucket.
So could we make a hotfix somehow?
@Borda I haven't tried the hotfix I mentioned, but I think it might work. Would you be able to try it and send a PR? Otherwise I'll look into it early next week (I'm working towards the ECCV deadline tomorrow).
And I would rather avoid hosting the datasets ourselves, as this would set a precedent for us storing datasets.
Is there any way to get a quick fix without using master?
I am concerned about the changes I would have to make in my code to go from the version I am using (1.4.0) to master.
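@eduardo4jesus You can explicitly add headers as stated above, something like:
# Inside download_url() in torchvision/datasets/utils.py: url, fpath and
# gen_bar_updater are already defined there; some_user_agent is any
# browser-like string, e.g. 'Mozilla/5.0'.
opener = urllib.request.URLopener()
opener.addheader('User-Agent', some_user_agent)
opener.retrieve(url, fpath, reporthook=gen_bar_updater())
This replaces line 81 onwards in vision/torchvision/datasets/utils.py and seems to be a quick workaround that works.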
@eduardo4jesus You could patch your model script at the top using:
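from six.moves import urllib
# Install a global opener that sends a browser-like User-Agent;
# urllib.request.urlretrieve (used by torchvision's download code) goes through it.
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)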
It will use that user agent for the entire script assuming the opener does not get overwritten somewhere else.
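To see that the patch really is process-wide, this sketch starts a throwaway local HTTP server (the `EchoUserAgent` handler and `fetch_user_agent` function are hypothetical, for illustration only) that echoes back the User-Agent it receives, then calls plain `urlopen()` before and after installing the opener. Neither call builds its own Request or headers, mirroring how torchvision's `urlretrieve` call picks up the global opener.

```python
import http.server
import threading
import urllib.request


class EchoUserAgent(http.server.BaseHTTPRequestHandler):
    """Tiny local server that replies with the User-Agent it received."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


def fetch_user_agent(url):
    # Plain urlopen() with a bare URL, no explicit headers.
    with urllib.request.urlopen(url) as response:
        return response.read().decode()


server = http.server.HTTPServer(("127.0.0.1", 0), EchoUserAgent)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

before = fetch_user_agent(url)  # the default "Python-urllib/x.y"

# The global patch from the comment above: every later urlopen()/urlretrieve()
# call in this process goes through this opener.
opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", "Mozilla/5.0")]
urllib.request.install_opener(opener)

after = fetch_user_agent(url)  # now the patched agent
server.shutdown()
server.server_close()
print(before, "->", after)
```

The flip side of a global opener is the caveat already noted: any later `install_opener()` call elsewhere replaces it.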
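To make it work for Python 2 as well:
import urllib
try:
    # For Python 2: urllib.FancyURLopener only exists here
    class AppURLopener(urllib.FancyURLopener):
        version = "Mozilla/5.0"
    urllib._urlopener = AppURLopener()
except AttributeError:
    # For Python 3: urllib has no FancyURLopener, so install a global opener instead.
    # A bare "import urllib" does not guarantee that urllib.request is loaded.
    import urllib.request
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    urllib.request.install_opener(opener)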
So for Python 3 I now use the following snippet:
from torchvision import datasets
import torchvision.transforms as transforms
import os
import urllib.request
num_workers = 0
batch_size = 20
# MNIST(root='data') looks for the raw archives in <root>/MNIST/raw and skips
# downloading any archive that is already there.
basepath = 'data/MNIST/raw'
os.makedirs(basepath, exist_ok=True)
transform = transforms.ToTensor()

def set_header_for(url, filename):
    # Download url to basepath/filename with a browser User-Agent
    # (URLopener is deprecated, but still available in Python 3.6).
    opener = urllib.request.URLopener()
    opener.addheader('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36')
    opener.retrieve(url, f'{basepath}/{filename}')

set_header_for('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', 'train-images-idx3-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', 'train-labels-idx1-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', 't10k-images-idx3-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', 't10k-labels-idx1-ubyte.gz')
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=False, transform=transform)
You would need to adjust basepath (and root) to your setup, of course.
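The pre-download trick relies on torchvision skipping archives it already has. As far as I can tell for torchvision 0.4.x, `MNIST(download=True)` looks for each archive under `<root>/MNIST/raw` (the traceback shows `download_and_extract_archive` being called with `download_root=self.raw_folder`) and reuses files that exist there; treat that layout as an assumption for other versions. A minimal sketch under that assumption; `mnist_raw_targets` and `prefetch_mnist` are hypothetical helpers, and they send the header through `urllib.request.Request` rather than the deprecated `URLopener`:

```python
import os
import shutil
import urllib.request

# The four archive URLs used in the snippet above.
MNIST_URLS = [
    "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz",
    "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz",
    "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz",
    "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz",
]


def mnist_raw_targets(root, urls=MNIST_URLS):
    """Pair each URL with the path MNIST(root=...) is assumed to check.

    Assumption: raw archives live in <root>/MNIST/raw and are named after
    the last component of the URL.
    """
    raw_folder = os.path.join(root, "MNIST", "raw")
    return [(url, os.path.join(raw_folder, url.rpartition("/")[2])) for url in urls]


def prefetch_mnist(root, user_agent="Mozilla/5.0"):
    """Hypothetical helper: download missing archives with a browser-like User-Agent."""
    for url, path in mnist_raw_targets(root):
        if os.path.isfile(path):
            continue  # already present, so nothing to fetch
        os.makedirs(os.path.dirname(path), exist_ok=True)
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
```

Usage would be `prefetch_mnist("data")` followed by `datasets.MNIST(root="data", train=True, download=True, transform=transform)`, so the dataset class only has to extract and process the files.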
I just ran into the same problem. Waiting for a fix that doesn't require changing my code... (ROOKIE ALERT)
Clone this to your working dir:
https://github.com/knamdar/data
The problem is that Yann LeCun's site changed hosting provider, if I understand correctly, and the new one checks whether the HTTP headers are set.
I currently work around it with the same code as in my comment above. You need to change basepath, of course.
On 05.03.2020, at 05:26, Nikita Makarin notifications@github.com wrote:
I have the same issue when I try to get the datasets:
import torch
import torchvision
from torchvision import transforms, datasets
train = datasets.MNIST("", train=True, download=True,
                       transform=transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST("", train=False, download=True,
                      transform=transforms.Compose([transforms.ToTensor()]))
@nvcastet, thank you so much for the clarification. I had misunderstood that I would have to go into the torchvision library and change one of its internal files, which would not be a smooth move on Colab/Kaggle.
This should be fixed now; there is no need to update torchvision.
Everything should work as before, without any change on the user side.
This was fixed on the server hosting the original dataset (thanks @soumith!).
As such, I'm closing this issue, but let us know if you still run into it.