I'm getting a 404 from YouTube when I try to download adaptive streams. Progressive streams work fine, but they aren't the highest-quality videos. Is there a workaround for this?
from pytube import YouTube
yt_video = YouTube("https://www.youtube.com/watch?v=xFSVoVOvaew")
yt_video.streams.filter(adaptive=True).order_by('resolution').first().download()
This returns:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kaiser/tmp/pytube/pytube/streams.py", line 245, in download
    bytes_remaining = self.filesize
  File "/Users/kaiser/tmp/pytube/pytube/streams.py", line 156, in filesize
    self._filesize = request.filesize(self.url)
  File "/Users/kaiser/tmp/pytube/pytube/request.py", line 86, in filesize
    return int(head(url)["content-length"])
  File "/Users/kaiser/tmp/pytube/pytube/request.py", line 98, in head
    response_headers = _execute_request(url, method="HEAD").info()
  File "/Users/kaiser/tmp/pytube/pytube/request.py", line 27, in _execute_request
    return urlopen(request) # nosec
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Users/kaiser/.pyenv/versions/3.8.2/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is still an issue, it should not be marked as stale.
+1
@tfdahlin Could you assist with this when you get a minute?
I'll look into it.
I'm running into the same issue. Do you have any idea what's going on?
@xielongze I'm still trying to figure out what the issue is. I'm not as familiar with how the stream object code works, so it's taking me a little while to work through it and find the problem.
I've identified the problem, and I'm working on a fix.
It looks like YouTube has added sequence numbers to its adaptive downloads and actually requires you to send requests to the same URL repeatedly, with different parameters, to get the data. Right now, the code assumes that a single URL is used and that you can simply specify the content range you want to download.
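The sequencing described above can be sketched roughly as follows. This is a minimal sketch, not pytube's actual fix: the query parameter name `sq`, the `Segment-Count:` line in the first segment, and the helper names `with_seq` and `seq_download` are all assumptions made for illustration, and `fetch` is any function that returns the body of a GET request.

```python
import re
from typing import Callable, Iterator
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the segment number travels in a query parameter named "sq".
SEQ_PARAM = "sq"
# Assumption: segment 0 contains a line such as b"Segment-Count: 42".
SEGMENT_COUNT_RE = re.compile(rb"Segment-Count: (\d+)")


def with_seq(url: str, seq: int) -> str:
    """Return `url` with the sequence parameter set to `seq`, keeping the other params."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query[SEQ_PARAM] = str(seq)
    return urlunsplit(parts._replace(query=urlencode(query)))


def seq_download(url: str, fetch: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield the file segment by segment by requesting the same URL with sq=0, 1, 2, ..."""
    first = fetch(with_seq(url, 0))
    yield first  # segment 0 is part of the file, not just a header
    match = SEGMENT_COUNT_RE.search(first)
    if match is None:
        return  # no segment count found: treat the stream as a single segment
    for seq in range(1, int(match.group(1)) + 1):
        yield fetch(with_seq(url, seq))
```

With a real network fetch, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read()`, with the yielded chunks written to the output file in order. Injecting `fetch` keeps the sequencing logic separate from the HTTP layer, so it can be tested without hitting YouTube.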
Given my schedule this week, it's going to take me a little while to write proper sequential downloading for this, so please bear with me while I work it out.
I believe this branch in my fork fixes the issue. I'll need to spend some time writing unit tests for it before I open a PR and merge it, but I'd appreciate feedback on whether it works for others.
@tfdahlin I tested your branch with some videos that were returning Error 404, and it worked very well.
Glad to hear it @GustavoStahl. I'll work on getting the unit tests written for my branch, then I'll make a PR, hopefully by end of week.
@tfdahlin Actually, the same videos I tested before are now being saved with a duration of 0 seconds.
@GustavoStahl that's unusual. What videos are you having problems with so I can try and track down the issue?
Additionally, can you give me the code for how you're downloading them so I can try to replicate the issue?
@tfdahlin Sure, here it is:
Code
from pytube import YouTube
YouTube('https://www.youtube.com/watch?v=rUWxSEwctFU').streams.filter(adaptive=True).order_by('resolution').last().download() # Works well
YouTube('https://www.youtube.com/watch?v=60hUgAjmHG4').streams.filter(adaptive=True).order_by('resolution').last().download() # Wrong metadata
YouTube('https://www.youtube.com/watch?v=aZeIzUvt3kg').streams.filter(adaptive=True).order_by('resolution').last().download() # Wrong metadata
Metadata for each video

@GustavoStahl Cool, thanks, I'll look into it. I know the metadata is incorrect, but are you able to play the videos without the duration, for example in VLC? I'm wondering whether the metadata just needs to be corrected or whether the download is failing.
@tfdahlin I can watch the videos normally; the only problem is the metadata.
OK, I'll see if I can get that metadata issue fixed.
@GustavoStahl I spent a few hours digging around, and because of the way these files are sent, this metadata simply doesn't exist in the file that gets downloaded: YouTube doesn't include the duration in the metadata it transmits for these streams, so I don't think there's an easy way for me to fix this. If you have ffmpeg installed, you can fix the duration by running ffmpeg -i <input_file_name> -c copy <output_file_name>, which creates a copy of the file (copying the streams without re-encoding) with the duration metadata filled in. I'm not sure I'm comfortable trying to patch metadata into the file stream myself, though. I'll look into it a little more, but I don't want to make promises.
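The ffmpeg workaround above can be scripted from Python along these lines. This is a minimal sketch assuming ffmpeg is installed and on PATH; the helper names and the `_fixed` output suffix are hypothetical choices, not part of pytube.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def remux_command(src: str, dst: str) -> List[str]:
    """The command from this thread: copy every stream as-is (no re-encoding) into a new file."""
    return ["ffmpeg", "-i", src, "-c", "copy", dst]


def fixed_name(src: str, suffix: str = "_fixed") -> str:
    """Hypothetical naming scheme: video.mp4 -> video_fixed.mp4 in the same folder."""
    path = Path(src)
    return str(path.with_name(path.stem + suffix + path.suffix))


def remux_with_duration(src: str) -> str:
    """Run ffmpeg on `src` and return the path of the remuxed copy."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = fixed_name(src)
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(remux_command(src, dst), check=True)
    return dst
```

For example, you could call remux_with_duration(path) on the file pytube saved; pytube's download() returns the saved file's path in the versions I know of, but check yours. Writing to a new file rather than overwriting the original leaves the downloaded copy intact if ffmpeg fails.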
@tfdahlin That's odd, because when I first tried your fork everything went well, and then suddenly, about 3 hours later, this metadata corruption started happening. As for the patch, no worries; I'll try the ffmpeg approach.
@GustavoStahl it's entirely possible that YouTube is changing how it creates the partitioned files and no longer includes this metadata. While I was looking into this problem, I found a few cases where people were losing metadata because of the way the files were partitioned and streamed on the fly, and in YouTube's case it might not even make sense to fix that problem.
I'm working on those unit tests now and hope to have a PR ready late tonight or tomorrow. I still don't think I can fix the metadata issue for now, but I'll open a new issue for it once this PR gets merged.
I made a PR with unit tests for this here: https://github.com/nficano/pytube/pull/799