Pytube: Only Downloading First 100 Videos in Playlist

Created on 5 Apr 2018 · 3 comments · Source: pytube/pytube

Issue: Take the Amazon Web Services channel on YouTube as an example. It has over 5000 videos, but I'm only able to download 100. The list of video URLs being constructed seems to stop at 100 entries.

Here's how I got the playlist URL and reached that conclusion:

I visited this URL: https://www.youtube.com/channel/UCd6MoB9NC6uYN2grvUNT-Zg
It led to this URL: https://www.youtube.com/user/AmazonWebServices/videos
That page takes you to this URL, which I used in my download script: https://www.youtube.com/watch?v=58PpYacL-VQ&list=UUd6MoB9NC6uYN2grvUNT-Zg
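Comparing the first and last URLs, the playlist ID is the channel ID with its leading "UC" swapped for "UU". The helper below builds that uploads-playlist URL straight from a channel ID. It is a sketch that assumes this prefix pattern holds in general (it is an observed convention, not something YouTube documents), and the function name uploads_playlist_url is hypothetical.

```python
def uploads_playlist_url(channel_id: str) -> str:
    """Build the 'all uploads' playlist URL for a YouTube channel ID.

    Assumption: a channel ID 'UC...' maps to the uploads playlist 'UU...',
    as in the URLs above. This is an observed pattern, not a guarantee.
    """
    if not channel_id.startswith("UC"):
        raise ValueError("expected a channel ID starting with 'UC': %r" % channel_id)
    playlist_id = "UU" + channel_id[2:]  # swap only the two-letter prefix
    return "https://www.youtube.com/playlist?list=" + playlist_id


print(uploads_playlist_url("UCd6MoB9NC6uYN2grvUNT-Zg"))
# https://www.youtube.com/playlist?list=UUd6MoB9NC6uYN2grvUNT-Zg
```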

Here's my download script:

#!/usr/local/bin/python

from pytube import Playlist

pl = Playlist("https://www.youtube.com/watch?v=58PpYacL-VQ&list=UUd6MoB9NC6uYN2grvUNT-Zg")
pl.populate_video_urls()  # fetch the playlist page and collect its video URLs
print("List size is: %s" % len(pl.video_urls))

# pl.download_all()

This reports a list of only 100 YouTube video URLs, even though the channel has over 5000 videos. Hoping you can shed some light on this. I may be able to help with a fix, given the bandwidth.

Thanks!

Labels: bug

Reported by: dreamingbinary

Most helpful comment

Hi dreamingbinary,

I'm not part of this project, but I have run into the same issue.
Basically, the problem is that this library downloads the web page that contains the playlist, and that page only includes up to 100 videos. If a playlist is longer than that, the browser requests the remaining videos later on, separately.
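To make the mechanism concrete: a scraper that parses the HTML of a single response can only find the links that happen to be in that response. The sketch below pulls unique video IDs out of a page's HTML with a regular expression. The link pattern (an 11-character ID after /watch?v=) and the name video_ids_in_page are assumptions for illustration, not pytube's actual parsing code.

```python
import re

# Assumption: video links appear in the page source as /watch?v=<11-char ID>.
VIDEO_LINK = re.compile(r"/watch\?v=([A-Za-z0-9_-]{11})")


def video_ids_in_page(html: str) -> list:
    """Return the unique video IDs in one HTML response, in first-seen order.

    Anything the browser would fetch later (e.g. while scrolling) is not in
    this string, so it can never show up in the result.
    """
    # dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+)
    return list(dict.fromkeys(VIDEO_LINK.findall(html)))


page = ('<a href="/watch?v=58PpYacL-VQ&list=UUd6MoB9NC6uYN2grvUNT-Zg">a</a>'
        '<a href="/watch?v=58PpYacL-VQ">b</a>'
        '<a href="/watch?v=abcdefghijk">c</a>')
print(video_ids_in_page(page))  # ['58PpYacL-VQ', 'abcdefghijk']
```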

At first I checked whether it would be possible to spoof those requests. However, the requests are incredibly complicated, and you'd basically have to reverse engineer all of the JavaScript to figure them out.
So instead I've written a program that uses Selenium (a tool that lets you automate a browser) to render the page and have the browser send those requests itself.

Here's the link to the repo:
https://github.com/johnvanderholt/youtube_playlist_downloader
You'll need the main.py and drivers.py files. You'll also need geckodriver.exe and an installation of Firefox.
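The core of the browser approach is a loop: trigger the page's own "load more" behaviour, wait, and stop once no new videos appear. The sketch below captures that loop with the browser operations passed in as plain functions, so it can run without a browser. It is not the code from the linked repo; load_until_stable, FakePage and the 100-videos-per-batch behaviour are hypothetical. With Selenium, scroll could call driver.execute_script to scroll to the bottom of the page, and count_items could count the rendered video elements (the selector for those is left as an assumption).

```python
import time
from typing import Callable


def load_until_stable(scroll: Callable[[], None],
                      count_items: Callable[[], int],
                      pause: float = 2.0,
                      max_rounds: int = 200,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Keep triggering 'load more' until the item count stops growing.

    scroll() should make the page request the next batch, e.g. with Selenium:
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    count_items() should return how many videos are currently rendered.
    Returns the final count.
    """
    previous = count_items()
    for _ in range(max_rounds):
        scroll()
        sleep(pause)  # give the browser time to fetch and render the next batch
        current = count_items()
        if current == previous:
            break  # nothing new arrived: assume we've reached the end
        previous = current
    return previous


class FakePage:
    """Stands in for a browser: each scroll reveals up to 100 more videos."""

    def __init__(self, total):
        self.total = total
        self.shown = min(100, total)

    def scroll(self):
        self.shown = min(self.shown + 100, self.total)

    def count(self):
        return self.shown


page = FakePage(250)
print(load_until_stable(page.scroll, page.count, sleep=lambda s: None))  # 250
```

The max_rounds cap is a safety net so a page that keeps growing (or never settles) cannot loop forever.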

I could submit a pull request if the maintainers of this project would like me to. However, it is quite slow and resource-heavy, since it has to load a full browser and render the page in the background. On top of that, I built in a mechanism that waits a few seconds after every request, so the program doesn't trigger any anti-scraping alarms and get your IP banned.

Then again, playlists with over 100 videos are kind of an edge case, so maybe that's OK.
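The "wait a few seconds after every request" mechanism described in the comment can be sketched as a small throttle that enforces a minimum gap between calls. The class name Throttle and the 3-second interval are illustrative assumptions, not values from the linked repo or known-safe limits. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect min_interval; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
                slept = remaining
        self._last = self.clock()
        return slept


class FakeClock:
    """Simulated time, so the example runs instantly."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
throttle = Throttle(3.0, clock=clock.now, sleep=clock.sleep)
print(throttle.wait())  # 0.0 (first request goes out immediately)
clock.t += 1.0          # pretend the request took 1 second
print(throttle.wait())  # 2.0 (waits out the rest of the 3-second gap)
```

In a real scraper you would create Throttle with the default clock and call throttle.wait() right before each request.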