Pytube: 404 on adaptive downloads

Created on 12 Aug 2020 · 23 Comments · Source: pytube/pytube

I'm getting a 404 from YouTube when I attempt to download adaptive streams. Progressive streams download fine, but they aren't the highest-quality videos. Is there a workaround for this?

from pytube import YouTube
yt_video = YouTube("https://www.youtube.com/watch?v=xFSVoVOvaew")
yt_video.streams.filter(adaptive=True).order_by('resolution').first().download()

Returns

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kaiser/tmp/pytube/pytube/streams.py", line 245, in download
    bytes_remaining = self.filesize
  File "/Users/kaiser/tmp/pytube/pytube/streams.py", line 156, in filesize
    self._filesize = request.filesize(self.url)
  File "/Users/kaiser/tmp/pytube/pytube/request.py", line 86, in filesize
    return int(head(url)["content-length"])
  File "/Users/kaiser/tmp/pytube/pytube/request.py", line 98, in head
    response_headers = _execute_request(url, method="HEAD").info()
  File "/Users/kaiser/tmp/pytube/pytube/request.py", line 27, in _execute_request
    return urlopen(request)  # nosec
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

All 23 comments

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This is still an issue; it should not be marked as stale.

+1

@tfdahlin Could you assist with this when you get a minute?

I'll look into it

I'm running into the same issue. Do you have any idea what's going on?

@xielongze Still trying to figure out what the issue is. I'm not as familiar with how the stream object code works, so it's taking me a while to work through it and find the problem.

I've identified the problem, and I'm working on a fix.

It looks like YouTube has added sequence numbers to its adaptive downloads and actually requires you to send requests to the same URL repeatedly, with different parameters, to get the data. Right now, the code assumes that a single URL is used and that you can simply specify the content range you want to download.
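A minimal sketch of the sequence-based approach described above, assuming each segment is addressed by an integer query parameter on the same base URL. The parameter name `sq` and the rule "stop at the first 404" are assumptions for illustration, not confirmed details of YouTube's protocol or of the fix in the fork; `segment_url`, `fetch_or_none`, and `iter_segments` are hypothetical helpers.

```python
from typing import Callable, Iterator, Optional
from urllib.error import HTTPError
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def segment_url(url: str, seq: int, param: str = "sq") -> str:
    """Return `url` with the (assumed) sequence parameter set to `seq`."""
    parts = urlsplit(url)
    # Duplicate query keys collapse to their last value here.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = str(seq)  # overwrite any existing sequence number
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_or_none(url: str) -> Optional[bytes]:
    """GET `url`; treat HTTP 404 as 'no more segments' (an assumption)."""
    try:
        with urlopen(url) as response:  # nosec: URL comes from the stream
            return response.read()
    except HTTPError as err:
        if err.code == 404:
            return None
        raise


def iter_segments(
    url: str,
    fetch: Callable[[str], Optional[bytes]] = fetch_or_none,
    max_segments: Optional[int] = None,
) -> Iterator[bytes]:
    """Request the same base URL repeatedly, one sequence number at a time."""
    seq = 0
    while max_segments is None or seq < max_segments:
        chunk = fetch(segment_url(url, seq))
        if chunk is None:  # server signalled the end of the sequence
            break
        yield chunk
        seq += 1
```

Usage would look like `for chunk in iter_segments(stream.url): fh.write(chunk)` with `fh` opened in `"wb"` mode. The `fetch` argument is injectable so the loop can be exercised without network access.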

It's going to take me a while to write proper sequence downloading for this given my schedule this week, so please bear with me while I work this out.

I believe that this branch in my fork fixes this issue. I will need to spend some time writing unit tests for it before I make a PR and merge it, but I'd appreciate feedback on whether it works for others.

@tfdahlin I tested your branch with some videos that were resulting in Error 404, and it worked very well.

Glad to hear it @GustavoStahl. I'll work on getting the unit tests written for my branch, then I'll make a PR, hopefully by end of week.

@tfdahlin Actually, the same videos I tested before are now being saved with a duration of 0 seconds.

@GustavoStahl that's unusual. What videos are you having problems with so I can try and track down the issue?

Additionally, can you give me the code for how you're downloading them so I can try to replicate the issue?

@tfdahlin Sure, here it is:

Code

from pytube import YouTube
YouTube('https://www.youtube.com/watch?v=rUWxSEwctFU').streams.filter(adaptive=True).order_by('resolution').last().download() # Works well
YouTube('https://www.youtube.com/watch?v=60hUgAjmHG4').streams.filter(adaptive=True).order_by('resolution').last().download() # Wrong metadata
YouTube('https://www.youtube.com/watch?v=aZeIzUvt3kg').streams.filter(adaptive=True).order_by('resolution').last().download() # Wrong metadata

[Screenshot: metadata for each video]

@GustavoStahl Cool, thanks, I'll look into it. I know the metadata is incorrect, but are you able to play the videos without the duration, for example in VLC player? I'm wondering whether the metadata just needs to be corrected or whether the download is failing.

@tfdahlin I can watch the videos normally; the only problem is the metadata.

Ok, I'll see if I can get that metadata issue fixed

@GustavoStahl I spent a few hours digging around, and because of the way these files are sent, this metadata simply doesn't exist in the downloaded file. YouTube streams these files in a way that doesn't include the duration in the metadata it transmits, so I don't think there's an easy way for me to fix this problem. If you have ffmpeg installed, it looks like you can fix the duration issue by running ffmpeg -i <input_file_name> -c copy <output_file_name>, which creates a copy of the file with the duration metadata patched in. However, I'm not sure I'm comfortable trying to patch metadata into the file stream. I'll look into it a bit more, but I don't want to make promises.
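The ffmpeg workaround can be scripted from Python roughly as follows. This is a sketch assuming ffmpeg is installed and on PATH; `build_remux_cmd` and `remux` are hypothetical helper names, and the flags are exactly those from the command in the comment. `-c copy` copies the audio and video streams without re-encoding, so the step is fast and does not reduce quality.

```python
import shutil
import subprocess
from typing import List


def build_remux_cmd(input_path: str, output_path: str) -> List[str]:
    """Build the ffmpeg command: copy all streams into a new file, no re-encode."""
    return ["ffmpeg", "-i", input_path, "-c", "copy", output_path]


def remux(input_path: str, output_path: str) -> None:
    """Run the remux; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg is not on PATH")
    subprocess.run(build_remux_cmd(input_path, output_path), check=True)
```

For example, `remux("video.mp4", "video_fixed.mp4")`. The output path should not already exist, since ffmpeg would otherwise stop and ask whether to overwrite it.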

@tfdahlin That's odd, because when I first tried your fork everything went well, and then suddenly, about 3 hours later, this metadata corruption started happening. As for the patch, no worries; I'll try the ffmpeg approach.

@GustavoStahl It's entirely possible that YouTube is changing how it creates the partitioned files and no longer includes this metadata. While looking into this problem, I found a few cases where people lost the metadata because of the way the files were partitioned and streamed on the fly, and in YouTube's case it might not even make sense to fix that problem.

Working on those unit tests now; hopefully I'll have a PR ready late tonight or tomorrow. I still don't think I can fix the metadata issue for now, but I'll open a new issue for it once this PR gets merged.

made a PR with unit tests for this here: https://github.com/nficano/pytube/pull/799

