Executing the CLI takes a little over eternity
alkpc@alkPC-Asus ~/.installs $ pytube https://www.youtube.com/watch?v=9wcSVTErT2U --list
<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">
<Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">
<Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">
<Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">
<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">
<Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">
<Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">
<Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">
<Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">
<Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">
<Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">
<Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">
<Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">
<Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">
<Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">
<Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">
<Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">
<Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">
<Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">
alkpc@alkPC-Asus ~/.installs $ pytube https://www.youtube.com/watch?v=9wcSVTErT2U --itag=22
PLAYMEN & CLAYDEE ft TAMTA - Tonight.mp4 | 29292427 bytes
^Calkpc@alkPC-Asus ~/.installs $ pytube https://www.youtube.com/watch?v=9wcSVTErT2U --itag=22.4% # --> 20 minutes
PLAYMEN & CLAYDEE ft TAMTA - Tonight.mp4 | 29292427 bytes
^C^C^C^[[Aalkpc@alkPC-Asus ~/.installs $ pytube https://www.youtube.com/watch?v=9wcSVTErT2U --itag=22 # --> ~60 minutes
vs. running it in the interactive interpreter:
>>> from pytube import YouTube
>>> YouTube('http://youtube.com/watch?v=9bZkp7q19f0').streams.get_by_itag(22).download()
Unfortunately, I have no timings or timestamps available, so this is reported in good faith :confused:
Tests were run on a Pentium 4 3.2 GHz HT machine with 2.5 GB RAM and an SSD.
alkpc@alkPC-Asus ~ $ neofetch
alkpc@alkPC-Asus
----------------
OS: Linux Mint 18.3 Sylvia x86_64
Kernel: 4.10.0-42-generic
Uptime: 1 hour, 37 mins
Packages: 2755
Shell: bash 4.3.48
Resolution: 1280x1024
DE: Cinnamon 3.6.7
WM: Mutter (Muffin)
WM Theme: New-Minty (Mint-Y-Dark-Polo)
Theme: Mint-Y-Dark-Polo [GTK2/3]
Icons: Surfn-Numix-Polo [GTK2/3]
Terminal: gnome-terminal
CPU: Intel Pentium 4 3.20GHz (2) @ 3.200GHz
GPU: NVIDIA GeForce 8400 GS Rev. 3
Memory: 1708MiB / 2501MiB
yt-video-9wcSVTErT2U-1514569445.json.tar.gz
I can provide the partial (and the full file, if necessary)
I can confirm that the same problem is occurring for me.
This code finishes immediately in the interactive console:
yt = YouTube(video_url)
available_streams = yt.streams.filter(resolution="360p", subtype='mp4').asc()
available_streams.first().download('./Documentaries')
but it is extremely slow when run from my .py file...
https://github.com/nficano/pytube/blob/9b2345574430537d2d8b917ae399a7488e042248/pytube/cli.py#L162
Same issue here. Removing the on_progress_callback parameter seems to resolve it... well, at least it downloads faster, although I still find it somewhat slow.
Same here: when downloading with an on_progress_callback, the speed tops out at 250 KB/s. After removing it, the speed goes up to 3 MB/s.
Is there any reason for that? My callback simply printed bytes_remaining / stream.filesize.
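For anyone who wants to keep a progress display, one thing worth trying is to print less often. The sketch below is only an assumption about the cause (the thread does not confirm that the printing itself is the bottleneck), and `make_throttled_progress` is a hypothetical helper, not part of pytube. It assumes the four-argument callback signature `(stream, chunk, file_handle, bytes_remaining)` used by pytube 9.x.

```python
import sys
import time


def make_throttled_progress(total_bytes, min_interval=0.5, out=sys.stderr):
    """Build a progress callback that writes at most once per min_interval seconds.

    Hypothetical helper; the argument order of the returned callback is an
    assumption based on pytube 9.x.
    """
    state = {"last": None}  # time of the last write, None before the first one

    def on_progress(stream, chunk, file_handle, bytes_remaining):
        now = time.monotonic()
        finished = bytes_remaining <= 0
        recently = state["last"] is not None and now - state["last"] < min_interval
        if recently and not finished:
            return  # skip this chunk: we wrote too recently
        state["last"] = now
        done = (total_bytes - bytes_remaining) / total_bytes
        out.write("\rDone: {:.1%}".format(done))
        out.flush()

    return on_progress
```

The callback always writes on the first chunk and on the final one, so the display still ends at 100%. It would be passed wherever the plain print-based callback was passed before, with `total_bytes` taken from `stream.filesize`.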
If anyone is trying to increase the speed, I've written a multiprocessing chunk downloading function.
import multiprocessing as mp
from math import ceil

import requests
from pytube import YouTube

CHUNK_SIZE = 3 * 2**20  # bytes


def download_video(video_url, itag, filename):
    stream = YouTube(video_url).streams.get_by_itag(itag)
    url = stream.url
    filesize = stream.filesize

    ranges = [[url, i * CHUNK_SIZE, (i+1) * CHUNK_SIZE - 1] for i in range(ceil(filesize / CHUNK_SIZE))]
    ranges[-1][2] = None  # Last range must be to the end of file, so it will be marked as None.

    pool = mp.Pool(min(len(ranges), 64))
    chunks = pool.map(download_chunk, ranges)

    with open(filename, 'wb') as outfile:
        for chunk in chunks:
            outfile.write(chunk)


def download_chunk(args):
    url, start, finish = args
    range_string = '{}-'.format(start)

    if finish is not None:
        range_string += str(finish)

    response = requests.get(url, headers={'Range': 'bytes=' + range_string})
    return response.content


download_video(video_url, 160, filename)
@StasDeep What value should CHUNK_SIZE be?
Anyway, your function is not working...
@yarodevuci the thing is that a video is downloaded by multiple requests, each fetching a separate chunk. So if the video is 6 MB and CHUNK_SIZE = 3 * 2**20 (which is 3 MB), then the video will be downloaded in 2 requests.
@StasDeep anyway, I tried your code and it's not working. Maybe I used the wrong chunk size? How do I calculate the chunk size dynamically?
@yarodevuci I've experimented a lot with CHUNK_SIZE values. It seems that 2-3 MB is usually the best option (just set CHUNK_SIZE = 3 * 2**20). If you want to use a dynamic chunk size for different videos, you can add an argument to the download_video function and use it instead of the CHUNK_SIZE constant.
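To make the chunk arithmetic concrete, here is a small sketch of how a file size maps to byte ranges, with the chunk size passed as an argument instead of a constant. `byte_ranges` and `range_header` are hypothetical names used for illustration. Unlike the function above, this version closes the last range at the final byte rather than leaving it open-ended.

```python
from math import ceil

MIB = 2**20  # bytes in one mebibyte


def byte_ranges(filesize, chunk_size=3 * MIB):
    """Split [0, filesize) into inclusive (start, end) byte ranges."""
    ranges = []
    for i in range(ceil(filesize / chunk_size)):
        start = i * chunk_size
        # The last chunk may be shorter, so clamp it to the final byte.
        end = min(start + chunk_size, filesize) - 1
        ranges.append((start, end))
    return ranges


def range_header(start, end):
    """Format one inclusive range as an HTTP Range header value."""
    return "bytes={}-{}".format(start, end)


# A 6 MiB file with 3 MiB chunks needs exactly two requests.
for start, end in byte_ranges(6 * MIB):
    print(range_header(start, end))
```

A file that is one byte larger than 6 MiB would need a third request covering only that last byte, which is why the request count is `ceil(filesize / chunk_size)`.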
@StasDeep I am getting this error with your script: requests.exceptions.InvalidURL: URL has an invalid label.
@yarodevuci does this problem occur with every video? Can you please give an example of video_url value which makes the function raise the exception?
@StasDeep yes, here is an example url "https://www.youtube.com/watch?v=DVaueIAhV6M"
Error happens in download_chunk(args) function
And I am using stream = YouTube('https://www.youtube.com/watch?v=DVaueIAhV6M').streams.filter(adaptive=True, only_audio=True, subtype='mp4').order_by('abr').desc().first()
@yarodevuci it works fine on my environment.
Here's how you can download audio (171 is itag for mime_type="audio/webm" abr="128kbps"):
download_video('https://www.youtube.com/watch?v=DVaueIAhV6M', 171, 'music.mp4')
I use Python 3.5.2, requests 2.18.4 and pytube 9.0.2.
@StasDeep Found the problem. Updated requests ...
Thanks :)
It downloads really fast
Last question, @StasDeep: will the progress reporting from pytube still work, or does it have to be implemented differently?
@yarodevuci yes, it has to be implemented in a different way. I tried using imap_unordered and it seems to work well. Here is a full example:
import multiprocessing as mp
import sys
from math import ceil

import requests
from pytube import YouTube

CHUNK_SIZE = 3 * 2**20  # bytes


def download_video(video_url, itag, filename):
    stream = YouTube(video_url).streams.get_by_itag(itag)
    url = stream.url
    filesize = stream.filesize

    ranges = [[url, i * CHUNK_SIZE, (i+1) * CHUNK_SIZE - 1] for i in range(ceil(filesize / CHUNK_SIZE))]
    ranges[-1][2] = None  # Last range must be to the end of file, so it will be marked as None.

    pool = mp.Pool(min(len(ranges), 64))
    chunks = [0 for _ in ranges]

    for i, chunk_tuple in enumerate(pool.imap_unordered(download_chunk, enumerate(ranges)), 1):
        idx, chunk = chunk_tuple
        chunks[idx] = chunk
        sys.stderr.write('\rDone: {0:%}'.format(i/len(ranges)))

    with open(filename, 'wb') as outfile:
        for chunk in chunks:
            outfile.write(chunk)


def download_chunk(args):
    idx, args = args
    url, start, finish = args
    range_string = '{}-'.format(start)

    if finish is not None:
        range_string += str(finish)

    response = requests.get(url, headers={'Range': 'bytes=' + range_string})
    return idx, response.content


download_video('https://www.youtube.com/watch?v=DVaueIAhV6M', 171, 'music.mp4')
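The key idea in the example above is that results arrive out of order, so each one carries its index and is put back in its slot before writing. Below is a minimal sketch of just that pattern, with two swaps that are my own assumptions and not part of the original code: it uses `multiprocessing.pool.ThreadPool` (threads instead of processes, which is usually enough for I/O-bound work) and takes the fetch function as a parameter so it can be tried without a network. `fetch_all` is a hypothetical name.

```python
from multiprocessing.pool import ThreadPool


def fetch_all(fetch, ranges, workers=8, on_progress=None):
    """Fetch every (start, end) range concurrently and return the bytes in order.

    fetch(start, end) must return the bytes for that inclusive range.
    on_progress(done, total), if given, is called after each finished chunk.
    """
    chunks = [None] * len(ranges)

    def job(item):
        idx, (start, end) = item
        return idx, fetch(start, end)  # keep the index with the data

    # ThreadPool needs at least one worker, even for an empty range list.
    with ThreadPool(min(len(ranges), workers) or 1) as pool:
        results = pool.imap_unordered(job, enumerate(ranges))
        for done, (idx, data) in enumerate(results, 1):
            chunks[idx] = data  # completion order differs from file order
            if on_progress is not None:
                on_progress(done, len(ranges))

    return b"".join(chunks)
```

With `requests`, the `fetch` argument could be a small function that issues a GET with a `Range: bytes=start-end` header and returns `response.content`, as `download_chunk` does above.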
I've been having the same issue when using this module within a program with a progress bar implementation that's similar to the CLI.
Though, I noticed that the performance seems to be significantly less impacted when performing the same operations with unrelated values, rather than actually utilizing the values passed to the function. I don't know if this is expected behavior or what the cause of this could be, but I thought I'd share my findings.
Same problem here. I had to remove: on_progress_callback
I think it should be fixed or, at the very least, there should be a --no-progress switch.
This issue is greatly mitigated by pull request #290:
$ time pytube https://www.youtube.com/watch?v=9wcSVTErT2U --itag=22
...
real 0m21,782s
user 0m9,787s
sys 0m7,489s
The fix looks to be implemented in v9.2.3 (not yet released on PyPI)
I'm using pytube 9.3.6 (pytube.__version__). The multiprocess download method is much, much faster for me.
I do not know if my problem is related to the one reported here, but why does pytube not use my full bandwidth while downloading a YouTube video?
It uses at most maybe 1/4 of the bandwidth it could use...
I have re-tried that recently:
$ pytube --version
pytube 9.4.0
$ /usr/bin/time bash -c 'pytube https://www.youtube.com/watch?v=9B-ENLt_ZQE --itag=137 ; pytube https://www.youtube.com/watch?v=9B-ENLt_ZQE --itag=140'
Ξ§ΞΞ! ΞΌΞ΅ ΟΞ· Ξ§ΟΟΟΞ± ΞΞ±ΟΟΞ±ΟΞ―Ξ½Ξ· 25 - ft Ξ£ΟΞΞ»ΞΉΞΏΟ ΞΞ½Ξ±ΟολίΟΞ·Ο.mp4 | 480756697 bytes
↳ |█████████████████████████████████████████████████████████████████████| 100.0%
Ξ§ΞΞ! ΞΌΞ΅ ΟΞ· Ξ§ΟΟΟΞ± ΞΞ±ΟΟΞ±ΟΞ―Ξ½Ξ· 25 - ft Ξ£ΟΞΞ»ΞΉΞΏΟ ΞΞ½Ξ±ΟολίΟΞ·Ο.mp4 | 31355509 bytes
↳ |█████████████████████████████████████████████████████████████████████| 100.0%
94.21user 1082.10system 53:09.91elapsed 36%CPU (0avgtext+0avgdata 18968maxresident)k
0inputs+0outputs (0major+26506033minor)pagefaults 0swaps
(Also, for what it's worth, in WSL the number of processes that are spawned and then die is excessive. I have no idea whether that also happens on e.g. Ubuntu, how to measure it, or whether it is actually expected.)
vs
$ /usr/bin/time ./pytube-download.py 'https://www.youtube.com/watch?v=9B-ENLt_ZQE' -v 137 -a 140 --no-auto-merge
Ξ§ΞΞ! ΞΌΞ΅ ΟΞ· Ξ§ΟΟΟΞ± ΞΞ±ΟΟΞ±ΟΞ―Ξ½Ξ· #25 - ft. Ξ£ΟΞΞ»ΞΉΞΏΟ ΞΞ½Ξ±ΟολίΟΞ·Ο
<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">
Ξ§ΞΞ! ΞΌΞ΅ ΟΞ· Ξ§ΟΟΟΞ± ΞΞ±ΟΟΞ±ΟΞ―Ξ½Ξ· #25 - ft. Ξ£ΟΞΞ»ΞΉΞΏΟ ΞΞ½Ξ±ΟολίΟΞ·Ο
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">
7.48user 4.70system 31:58.99elapsed 0%CPU (0avgtext+0avgdata 19524maxresident)k
0inputs+0outputs (0major+7263minor)pagefaults 0swaps
If the dev(s) think that it cannot be improved any further, then you may as well close it. I guess we will never be able to saturate our bandwidth, but the standalone script still seems to be around 1.65x faster than the CLI. Of course, it could be that the process/callback is exactly the issue here.
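For reference, the ratio comes straight from the two elapsed fields printed by /usr/bin/time above (53:09.91 for the two CLI calls, 31:58.99 for the script). A quick sketch of the arithmetic, where `elapsed_seconds` is a hypothetical helper:

```python
def elapsed_seconds(text):
    """Convert an elapsed field such as '53:09.91' or '1:02:03' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


cli = elapsed_seconds("53:09.91")     # two pytube CLI invocations
script = elapsed_seconds("31:58.99")  # pytube-download.py
speedup = cli / script

print("{:.2f}x".format(speedup))  # prints 1.66x
```

This compares wall-clock time only, from a single run of each, so network conditions may account for part of the difference.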
Any solution?
On my VPS with 1GB/s upload/download speed, the maximum speed is about 5MB/s, which is slow.
I guess just switch to youtube-dl.
This project seems unmaintained, whereas on the other side, you get tons of everything.
I won't add a blatant advertisement (i.e. a link), but _it's very close to here_
Yeah, I should switch to youtube-dl.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.