Issue: Taking the Amazon channel on YouTube as an example, it has over 5000 videos, but I'm only able to download 100. It seems the list being constructed is only 100 entries long.
This illustrates what I've done so far to get the URL and reach my conclusion:


Here's my download script:
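#!/usr/local/bin/python
from pytube import Playlist

# Uploads playlist of the channel (over 5000 videos on the site)
pl = Playlist("https://www.youtube.com/watch?v=58PpYacL-VQ&list=UUd6MoB9NC6uYN2grvUNT-Zg")
pl.populate_video_urls()
# Parenthesized print works on both Python 2 and Python 3
print("List size is: %s" % len(pl.video_urls))
# pl.download_all()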
This returns a list of 100 YouTube video URLs. I'm hoping you can shed some light on this. I may be able to help with a fix, bandwidth permitting.
Thanks!
I did notice a second parameter, index=80, added to the URL when I clicked on the last item in the list:
https://www.youtube.com/watch?v=9EZvO5TviAM&list=UUd6MoB9NC6uYN2grvUNT-Zg&index=80
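For anyone wanting to inspect those parameters programmatically, here is a minimal sketch using only the Python 3 standard library. The helper name playlist_params is made up for illustration and is not part of pytube; it only reads the query string and says nothing about how YouTube uses index internally.

```python
from urllib.parse import urlparse, parse_qs


def playlist_params(url):
    """Return (playlist_id, index) from a YouTube watch URL.

    Hypothetical helper for illustration; either value is None if absent.
    """
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; take the first one, if any.
    playlist_id = query.get("list", [None])[0]
    raw_index = query.get("index", [None])[0]
    index = int(raw_index) if raw_index is not None else None
    return playlist_id, index


url = "https://www.youtube.com/watch?v=9EZvO5TviAM&list=UUd6MoB9NC6uYN2grvUNT-Zg&index=80"
print(playlist_params(url))
```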
Hi dreamingbinary,
I'm not part of this project, but I have run into the same issue.
Basically, the problem is that this library downloads the web page that has the playlist on it. That page, however, only contains up to 100 videos. If a playlist is longer than that, the rest of the videos are requested later on by the browser.
At first I checked whether it would be possible to spoof those requests. However, the requests are incredibly complicated, and you'd basically have to reverse-engineer all of the JavaScript to figure them out.
So instead I've written a program that uses Selenium (a tool that lets you automate the browser) to render the page and get it to send the requests.
Here's the link to the repo:
https://github.com/johnvanderholt/youtube_playlist_downloader
You'll need the main.py and drivers.py files. You'll also need geckodriver.exe and an installation of Firefox.
I could submit a pull request if the maintainers of this project would like me to. However, it is quite slow and expensive, since you have to load up a full browser and render the page in the background. On top of that, I built in a mechanism that waits a few seconds after every request, so the program doesn't trigger any scraping alarms and get your IP banned.
Then again, playlists of over 100 videos are kind of an edge case, so maybe that's OK.
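To give a rough idea of the approach, here is a simplified sketch of the scroll-and-wait loop in Python 3. It is not the code from the repo above. The names collect_video_urls and run_with_firefox, the CSS selector, and the default pause are all assumptions made for illustration, and YouTube's markup may not match the selector.

```python
import time


def collect_video_urls(driver, pause_seconds=3.0, max_rounds=200):
    """Scroll an already-loaded playlist page until no new video links appear.

    `driver` is expected to behave like a Selenium WebDriver. The selector
    below is an assumption about the page markup, not a documented contract.
    """
    seen = []         # URLs in the order they were discovered
    seen_set = set()  # fast membership check for de-duplication
    for _ in range(max_rounds):
        before = len(seen)
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        for link in driver.find_elements("css selector", "a[href*='watch?v=']"):
            href = link.get_attribute("href")
            if href and href not in seen_set:
                seen_set.add(href)
                seen.append(href)
        if len(seen) == before:
            break  # the last scroll produced nothing new, so stop
        # Scroll to the bottom so the page requests the next batch of videos.
        driver.execute_script(
            "window.scrollTo(0, document.documentElement.scrollHeight);"
        )
        # Wait between rounds to keep the request rate low.
        time.sleep(pause_seconds)
    return seen


def run_with_firefox(playlist_url):
    """Example entry point; needs selenium, geckodriver and Firefox installed."""
    from selenium import webdriver  # imported here so the loop above has no hard dependency

    driver = webdriver.Firefox()
    try:
        driver.get(playlist_url)
        return collect_video_urls(driver)
    finally:
        driver.quit()
```

Stopping as soon as one round finds nothing new is a simplification: if the page takes longer than the pause to load the next batch, the loop ends early, so a longer pause or a retry count may be needed in practice.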
@dreamingbinary great writeup! Will try to investigate ASAP.