Pytube: Really slow downloading with CLI vs Interactive Prompt

Created on 29 Dec 2017 · 24 Comments · Source: pytube/pytube

Downloading with the CLI takes a little over an eternity:

alkpc@alkPC-Asus ~/.installs $  pytube https://www.youtube.com/watch?v=9wcSVTErT2U --list
<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">
<Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">
<Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">
<Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">
<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">
<Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">
<Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">
<Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">
<Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">
<Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">
<Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">
<Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">
<Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">
<Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">
<Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">
<Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">
<Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">
<Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">
<Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">
alkpc@alkPC-Asus ~/.installs $ pytube https://www.youtube.com/watch?v=9wcSVTErT2U --itag=22

PLAYMEN & CLAYDEE ft TAMTA - Tonight.mp4 | 29292427 bytes
^Calkpc@alkPC-Asus ~/.installs $ pytube https://www.youtube.com/watch?v=9wcSVTErT2U --itag=22.4%  # --> 20 minutes

PLAYMEN & CLAYDEE ft TAMTA - Tonight.mp4 | 29292427 bytes
^C^C^Calkpc@alkPC-Asus ~/.installs $ pytube https://www.youtube.com/watch?v=9wcSVTErT2U --itag=22  # --> ~60 minutes

vs. running it from the interactive prompt:

>>> from pytube import YouTube
>>> YouTube('http://youtube.com/watch?v=9bZkp7q19f0').streams.get_by_itag(22).download()

Unfortunately, I have no timings or timestamps, so this is reported in good faith :confused:
Tests were run on a P4 3.2 GHz HT / 2.5 GB RAM / SSD machine:

alkpc@alkPC-Asus ~ $ neofetch
alkpc@alkPC-Asus
----------------
OS: Linux Mint 18.3 Sylvia x86_64
Kernel: 4.10.0-42-generic
Uptime: 1 hour, 37 mins
Packages: 2755
Shell: bash 4.3.48
Resolution: 1280x1024
DE: Cinnamon 3.6.7
WM: Mutter (Muffin)
WM Theme: New-Minty (Mint-Y-Dark-Polo)
Theme: Mint-Y-Dark-Polo [GTK2/3]
Icons: Surfn-Numix-Polo [GTK2/3]
Terminal: gnome-terminal
CPU: Intel Pentium 4 3.20GHz (2) @ 3.200GHz
GPU: NVIDIA GeForce 8400 GS Rev. 3
Memory: 1708MiB / 2501MiB

yt-video-9wcSVTErT2U-1514569445.json.tar.gz
I can provide the partially downloaded file (and the full file, if necessary).

Labels: bug, stale

All 24 comments

I can confirm that the same problem is occurring for me.

This code runs instantly in the interactive console:

yt = YouTube(video_url)
available_streams = yt.streams.filter(resolution="360p", subtype='mp4').asc()
available_streams.first().download('./Documentaries')

but it is extremely slow when run from my .py file...

https://github.com/nficano/pytube/blob/9b2345574430537d2d8b917ae399a7488e042248/pytube/cli.py#L162

Same issue here, and removing the on_progress_callback parameter seems to resolve it... well, at least it downloads faster, although I still find it somewhat slow...

Same here: with an on_progress_callback, the download speed tops out at 250 KB/s. After removing it, the speed goes up to 3 MB/s.
Any reason for that? My callback simply prints bytes_remaining / stream.filesize.
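A minimal sketch of one possible mitigation, assuming the cost comes from writing to the terminal on every downloaded chunk: a callback factory that only redraws when the completed percentage has advanced by at least `step` points. `make_throttled_progress` is a hypothetical helper, not part of pytube, and it assumes `bytes_remaining` is the last positional argument the callback receives; check the callback signature of the pytube version you use.

```python
import sys


def make_throttled_progress(filesize, step=1.0, out=sys.stderr):
    """Return a progress callback that writes only when the completed
    percentage has grown by at least `step` points (or the download ends).

    Hypothetical helper. ASSUMPTION: bytes_remaining is the LAST positional
    argument passed to the callback.
    """
    last_shown = -step  # guarantees a write on the first call

    def on_progress(*args):
        nonlocal last_shown
        bytes_remaining = args[-1]
        done = 100.0 * (filesize - bytes_remaining) / filesize
        # Most calls return without touching the terminal.
        if done - last_shown >= step or bytes_remaining == 0:
            last_shown = done
            out.write('\rDone: {:5.1f}%'.format(done))
            out.flush()

    return on_progress
```

With 8 KiB chunks on a 1 MB file, this writes roughly every other chunk instead of on every chunk; for larger files the saving grows, since the number of writes stays near `100 / step`.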

If anyone is trying to increase the speed, I've written a multiprocessing chunk downloading function.

import multiprocessing as mp
from math import ceil

import requests
from pytube import YouTube

CHUNK_SIZE = 3 * 2**20  # bytes

def download_video(video_url, itag, filename):
    stream = YouTube(video_url).streams.get_by_itag(itag)
    url = stream.url
    filesize = stream.filesize

    ranges = [[url, i * CHUNK_SIZE, (i+1) * CHUNK_SIZE - 1] for i in range(ceil(filesize / CHUNK_SIZE))]
    ranges[-1][2] = None  # Last range must be to the end of file, so it will be marked as None.

    # The context manager shuts the worker processes down when done.
    with mp.Pool(min(len(ranges), 64)) as pool:
        chunks = pool.map(download_chunk, ranges)

    with open(filename, 'wb') as outfile:
        for chunk in chunks:
            outfile.write(chunk)


def download_chunk(args):
    url, start, finish = args
    range_string = '{}-'.format(start)

    if finish is not None:
        range_string += str(finish)

    response = requests.get(url, headers={'Range': 'bytes=' + range_string})
    response.raise_for_status()  # fail loudly instead of writing an error page into the file
    return response.content

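# The guard is required where multiprocessing spawns fresh interpreters (e.g. Windows).
if __name__ == '__main__':
    # Example values: video_url and filename were left undefined in the snippet.
    download_video('https://www.youtube.com/watch?v=9wcSVTErT2U', 160, 'video.mp4')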

What value should CHUNK_SIZE be? @StasDeep

Anyway, your function is not working...

@yarodevuci the point is that the video is downloaded with multiple requests, each fetching a separate chunk. So if the video is 6 MiB and CHUNK_SIZE = 3 * 2**20 (which is 3 MiB), the video will be downloaded in 2 requests.
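To make that arithmetic concrete, here is a small sketch of how the chunk boundaries and `Range` header values come out, mirroring the `ranges` list in the download_video code earlier in the thread (inclusive end offsets, and an open-ended last range). `byte_ranges` and `range_header` are hypothetical helper names; they use integer ceiling division instead of `math.ceil` on a float, which sidesteps float rounding for very large files.

```python
MiB = 2**20


def byte_ranges(filesize, chunk_size):
    """Split a file of `filesize` bytes into (start, end) pairs for HTTP Range
    requests. Offsets are inclusive, as in 'bytes=start-end'; the last pair
    has end=None, meaning 'until the end of the file'."""
    if filesize <= 0 or chunk_size <= 0:
        raise ValueError('filesize and chunk_size must be positive')
    n_chunks = -(-filesize // chunk_size)  # integer ceiling division
    ranges = [(i * chunk_size, (i + 1) * chunk_size - 1) for i in range(n_chunks)]
    ranges[-1] = (ranges[-1][0], None)
    return ranges


def range_header(start, end):
    """Value for the Range request header."""
    return 'bytes={}-{}'.format(start, '' if end is None else end)


# A 6 MiB video with 3 MiB chunks needs 2 requests:
# byte_ranges(6 * MiB, 3 * MiB) -> [(0, 3145727), (3145728, None)]
# headers: 'bytes=0-3145727' and 'bytes=3145728-'
```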

@StasDeep anyway, I tried your code and it's not working. Maybe I used the wrong chunk size? How do I calculate the chunk size dynamically?

@yarodevuci I've experimented a lot with CHUNK_SIZE values. 2-3 MiB usually seems to be the best option (just set CHUNK_SIZE = 3 * 2**20). If you want a dynamic chunk size for different videos, you can add an argument to the download_video function and use it instead of the CHUNK_SIZE constant.

@StasDeep I am getting this error with your script: requests.exceptions.InvalidURL: URL has an invalid label.

@yarodevuci does this problem occur with every video? Can you please give an example of video_url value which makes the function raise the exception?

@StasDeep yes, here is an example url "https://www.youtube.com/watch?v=DVaueIAhV6M"

The error happens in the download_chunk(args) function.

And I am using stream = YouTube('https://www.youtube.com/watch?v=DVaueIAhV6M').streams.filter(adaptive=True, only_audio=True, subtype='mp4').order_by('abr').desc().first()

@yarodevuci it works fine in my environment.
Here's how you can download the audio (171 is the itag for mime_type="audio/webm" abr="128kbps"):

download_video('https://www.youtube.com/watch?v=DVaueIAhV6M', 171, 'music.webm')

I use Python 3.5.2, requests 2.18.4 and pytube 9.0.2.

@StasDeep Found the problem: updating requests fixed it.
Thanks!
It downloads really fast 👍

Last question, @StasDeep: will pytube's progress callback work with this, or does progress have to be implemented differently?

@yarodevuci yes, it has to be implemented in a different way. I tried using imap_unordered and it seems to work well. Here is a full example:

import multiprocessing as mp
import sys
from math import ceil

import requests
from pytube import YouTube

CHUNK_SIZE = 3 * 2**20  # bytes

def download_video(video_url, itag, filename):
    stream = YouTube(video_url).streams.get_by_itag(itag)
    url = stream.url
    filesize = stream.filesize

    ranges = [[url, i * CHUNK_SIZE, (i+1) * CHUNK_SIZE - 1] for i in range(ceil(filesize / CHUNK_SIZE))]
    ranges[-1][2] = None  # Last range must be to the end of file, so it will be marked as None.

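    chunks = [None] * len(ranges)  # filled in by index as results arrive

    with mp.Pool(min(len(ranges), 64)) as pool:
        # imap_unordered yields results in completion order, so each result
        # carries its chunk index to restore file order.
        for i, (idx, chunk) in enumerate(pool.imap_unordered(download_chunk, enumerate(ranges)), 1):
            chunks[idx] = chunk
            sys.stderr.write('\rDone: {0:.1%}'.format(i / len(ranges)))
    sys.stderr.write('\n')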

    with open(filename, 'wb') as outfile:
        for chunk in chunks:
            outfile.write(chunk)

def download_chunk(args):
    # Each job is (index, [url, start, finish]); the index is returned with the data.
    idx, (url, start, finish) = args
    range_string = '{}-'.format(start)

    if finish is not None:
        range_string += str(finish)

    response = requests.get(url, headers={'Range': 'bytes=' + range_string})
    response.raise_for_status()  # fail loudly instead of writing an error page into the file
    return idx, response.content

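if __name__ == '__main__':  # required where multiprocessing spawns fresh interpreters (e.g. Windows)
    # itag 171 is audio/webm, so save it with a .webm extension.
    download_video('https://www.youtube.com/watch?v=DVaueIAhV6M', 171, 'music.webm')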
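The reordering trick in that example can be checked without touching the network. This sketch swaps in `multiprocessing.pool.ThreadPool` (threads are usually enough for I/O-bound downloads and avoid pickling and the `__main__` guard) and a hypothetical `fake_fetch` that sleeps a random time instead of calling `requests.get`, so results genuinely arrive out of order; the index carried with each result restores file order.

```python
import random
import sys
import time
from multiprocessing.pool import ThreadPool


def fake_fetch(job):
    """Stand-in for download_chunk: simulate network latency, then return
    (index, payload) so the caller can put the chunk back in place."""
    idx, (start, end) = job
    time.sleep(random.uniform(0, 0.02))
    return idx, bytes([idx % 256]) * (end - start + 1)


def fetch_all(ranges, workers=8):
    chunks = [None] * len(ranges)
    with ThreadPool(workers) as pool:
        # Results come back in completion order, not submission order.
        for done, (idx, data) in enumerate(
                pool.imap_unordered(fake_fetch, enumerate(ranges)), 1):
            chunks[idx] = data  # the index restores file order
            sys.stderr.write('\rDone: {:.0%}'.format(done / len(ranges)))
    sys.stderr.write('\n')
    return b''.join(chunks)
```

Replacing `fake_fetch` with a function that performs the ranged `requests.get` turns this into a thread-based variant of the downloader above.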

I've been having the same issue when using this module in a program whose progress bar implementation is similar to the CLI's.
However, I noticed that performance is much less affected when the callback performs the same operations on unrelated values instead of actually using the values passed to it. I don't know whether this is expected or what causes it, but I thought I'd share my findings.

Same problem here. I had to remove on_progress_callback.
I think it should be fixed, or there should at least be a --no-progress switch.
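A minimal sketch of what such an opt-out could look like. The `--no-progress` flag here is hypothetical (it is the proposal above, not an existing pytube option), and `print_progress` is a stand-in callback that assumes `bytes_remaining` is its last argument. Returning `None` means no callback is registered, so nothing runs per downloaded chunk.

```python
import argparse
import sys


def print_progress(*args):
    """Stand-in progress callback; ASSUMES bytes_remaining is the last argument."""
    sys.stderr.write('\rRemaining: {} bytes'.format(args[-1]))


def build_parser():
    parser = argparse.ArgumentParser(prog='pytube-sketch')  # hypothetical CLI
    parser.add_argument('url')
    parser.add_argument('--itag', type=int)
    parser.add_argument('--no-progress', action='store_true',
                        help='do not register a progress callback')
    return parser


def pick_callback(args):
    # None means "no callback": nothing extra runs for each downloaded chunk.
    return None if args.no_progress else print_progress
```

The result of `pick_callback(args)` would then be passed wherever the progress callback gets registered, e.g. the on_progress_callback parameter mentioned earlier in the thread.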

This issue is greatly mitigated by pull request #290:

$ time pytube https://www.youtube.com/watch?v=9wcSVTErT2U --itag=22
...
real    0m21,782s
user    0m9,787s
sys 0m7,489s

The fix looks to be implemented in v9.2.3 (not yet released on PyPI)

Using pytube 9.3.6 (per pytube.__version__), the multiprocessing download method is much, much faster for me.

I don't know if my problem is related to the one reported here, but why doesn't pytube use my full bandwidth while downloading a YouTube video?

It uses at most about a quarter of my available bandwidth...

I re-tried this recently:

$  pytube --version
pytube 9.4.0
$ /usr/bin/time bash -c 'pytube https://www.youtube.com/watch?v=9B-ENLt_ZQE --itag=137 ; pytube https://www.youtube.com/watch?v=9B-ENLt_ZQE --itag=140'

ΧΙΚ! με τη Χρύσα Κατσαρίνη 25 - ft Στέλιος Ανατολίτης.mp4 | 480756697 bytes
 ↳ |█████████████████████████████████████████████████████████████████████| 100.0%

ΧΙΚ! με τη Χρύσα Κατσαρίνη 25 - ft Στέλιος Ανατολίτης.mp4 | 31355509 bytes
 ↳ |█████████████████████████████████████████████████████████████████████| 100.0%
94.21user 1082.10system 53:09.91elapsed 36%CPU (0avgtext+0avgdata 18968maxresident)k
0inputs+0outputs (0major+26506033minor)pagefaults 0swaps

(Also, for what it's worth: under WSL, the number of processes spawned and terminated is excessive. I don't know whether this also happens on, e.g., Ubuntu, how to measure it, or whether it is actually expected.)

vs

$  /usr/bin/time ./pytube-download.py 'https://www.youtube.com/watch?v=9B-ENLt_ZQE' -v 137 -a 140 --no-auto-merge
ΧΙΚ! με τη Χρύσα Κατσαρίνη #25 - ft. Στέλιος Ανατολίτης

<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">
ΧΙΚ! με τη Χρύσα Κατσαρίνη #25 - ft. Στέλιος Ανατολίτης

<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">
7.48user 4.70system 31:58.99elapsed 0%CPU (0avgtext+0avgdata 19524maxresident)k
0inputs+0outputs (0major+7263minor)pagefaults 0swaps

If the dev(s) think it cannot be improved any further, you may as well close this. I guess we will never be able to saturate our bandwidth, but my script still seems to be around 1.65x faster than the CLI (31:59 vs. 53:10 elapsed). Of course, the process/callback handling may be exactly the issue here.
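For reference, the ratio can be recomputed from the `elapsed` fields of the two `/usr/bin/time` outputs above. The sketch below parses GNU time's `[h:]mm:ss.ss` elapsed format; the CLI throughput uses the two file sizes the CLI printed, and no throughput is computed for the script because its output does not show byte counts.

```python
def elapsed_seconds(text):
    """Convert GNU time's elapsed field ('[h:]mm:ss.ss') to seconds."""
    seconds = 0.0
    for part in text.split(':'):
        seconds = seconds * 60 + float(part)
    return seconds


cli = elapsed_seconds('53:09.91')     # pytube CLI, itags 137 + 140
script = elapsed_seconds('31:58.99')  # pytube-download.py, same itags
speedup = cli / script                # roughly 1.66

cli_bytes = 480756697 + 31355509      # sizes printed by the CLI
cli_rate = cli_bytes / cli            # roughly 160,500 bytes/s (about 157 KiB/s)
```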

Any solution?
On my VPS with 1GB/s upload/download speed, the maximum speed is about 5MB/s, which is slow.

I guess just switch to youtube-dl 😕
This project seems unmaintained, whereas over there you get tons of everything.

I won't add a blatant advertisement (i.e. a link), but _it's very close to here_

Yeah, I should switch to youtube-dl.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
