twint returns far fewer tweets than the number of tweets displayed on the Twitter page of a user.
Example with @BouloGiletJaune:
On Linux Fedora 29, using the latest twint 1.2.3 with Python 3.7 in a brand new virtual environment, this command returns only about half of the tweets available:
$ twint --retweets -u BouloGiletJaune | wc -l
171
However this particular Twitter user has posted 278+ tweets: https://twitter.com/BouloGiletJaune?lang=en
Missing tweets occur with all the twitter accounts I could try.
Profile_full=True
With the Profile() function and Profile_full=True, twint returns more tweets (272) but still not the full count (278), and it's slow as hell:
$ time twint --profile-full --retweets -u BouloGiletJaune | wc -l
CRITICAL:root:twint.feed:Mobile:list index out of range
CRITICAL:root:twint.feed:Mobile:list index out of range
272
real 3m9.178s
user 1m11.755s
sys 0m1.081s
Profile_full is not a solution
Missing 6 tweets doesn't seem like a big issue, but with an account that has many more tweets (35k), --profile-full still misses about 2,000 tweets, not to mention the hours it takes to complete. So it's definitely not a viable workaround.
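To put the numbers above side by side, here is a minimal sketch that computes the share of tweets returned in each run. The function name `coverage` is made up for this illustration, and the expected totals (278 and 35k) are the approximate figures quoted in this report, not exact counts.

```python
def coverage(returned: int, expected: int) -> float:
    """Return the fraction of expected tweets that were actually returned."""
    if expected <= 0:
        raise ValueError("expected must be a positive tweet count")
    return returned / expected


# Figures taken from the runs described in this issue (totals are approximate).
runs = {
    "--retweets": (171, 278),
    "--profile-full --retweets": (272, 278),
    "--profile-full on a 35k account": (35_000 - 2_000, 35_000),
}

for label, (returned, expected) in runs.items():
    print(f"{label}: {coverage(returned, expected):.1%} of tweets returned")
```

So --profile-full recovers most tweets in both cases, but a small relative gap still means thousands of missing tweets on a large account.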
--all option doesn't seem to work
Note that the --all command line option is supposed to return *all* tweets associated with a user, but it doesn't seem to work:
(virtualenv) $ twint --all BouloGiletJaune
[-] Error: Please use at least -u, -s, -g or --near.
So, can you please let me know what I'm doing wrong or if you spot a problem?
Maybe it's some known limitation?
Thank you.
I've installed twint using this command from within the python3 virtual environment:
(virtualenv) $ pip3 install -e 'git+https://github.com/twintproject/twint.git@origin/master#egg=twint'
The number of tweets returned is too small:
$ time twint --retweets -u BouloGiletJaune | wc -l
171
real 0m15.453s
user 0m8.933s
sys 0m0.447s
pip version
(virtualenv) $ pip --version
pip 19.1.1 from /home/user/git/project/virtualenv/lib/python3.7/site-packages/pip (python 3.7)
python version
Fedora's stock version of Python is used. As with any virtualenv, the binaries are automatically copied into the virtual environment.
(virtualenv) $ python -VV
Python 3.7.3 (default, May 11 2019, 00:45:16)
[GCC 8.3.1 20190223 (Red Hat 8.3.1-2)]
pip packages installed
Only twint has a local path, because it was installed from git (see above), which is the recommended way to install the latest version.
(virtualenv) $ pip list installed
Package Version Location
--------------- ------- --------------------------------
aiodns 2.0.0
aiohttp 3.5.4
aiohttp-socks 0.2.2
async-timeout 3.0.1
attrs 19.1.0
beautifulsoup4 4.7.1
cchardet 2.1.4
cffi 1.12.3
chardet 3.0.4
elasticsearch 7.0.2
fake-useragent 0.1.11
geographiclib 1.49
geopy 1.20.0
idna 2.8
multidict 4.5.2
numpy 1.16.4
pandas 0.24.2
pip 19.1.1
pycares 3.0.0
pycparser 2.19
PySocks 1.7.0
python-dateutil 2.8.0
pytz 2019.1
schedule 0.6.0
setuptools 41.0.1
six 1.12.0
soupsieve 1.9.2
twint 1.2.3 /home/user/git/project/virtualenv/src/twint
urllib3 1.25.3
wheel 0.33.4
yarl 1.3.0
twint package details
(virtualenv) $ pip show twint
Name: twint
Version: 1.2.3
Summary: An advanced Twitter scraping & OSINT tool.
Home-page: https://github.com/twintproject/twint
Author: Cody Zacharias
Author-email: [email protected]
License: MIT
Location: /home/user/git/project/virtualenv/src/twint
Requires: aiohttp, aiodns, beautifulsoup4, cchardet, elasticsearch, pysocks, pandas, aiohttp-socks, schedule, geopy, fake-useragent
Required-by:
(virtualenv) $ uname -a
Linux work 5.1.11-200.fc29.x86_64 #1 SMP Mon Jun 17 19:30:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
(virtualenv) $ lsb_release
LSB Version: :core-4.1-amd64:core-4.1-noarch
(virtualenv) $ getenforce
Enforcing
First, thank you so much for taking the time to write a properly documented issue.
Limits with --retweets
I tried twint --retweets -u BouloGiletJaune and got 176 tweets, still fewer than expected.
While this might sound like Twint is doing something wrong, the point is that Twitter stops Twint before it reaches the beginning of the timeline. This can be verified in a browser, as shown in the attached screenshot of the last tweet displayed, which is also the last tweet returned by Twint (at least in my experience).
Profile_full = True
This option takes a long time by design. It should be used only if the account is shadow banned (which means that you can't find its tweets via the search bar).
All options
There is an argument-checking error (in the Twint code) that I'm going to fix really quickly. Please keep in mind that the All option might take a long time, since it returns tweets sent by or to the user as well as tweets that mention the user:
https://github.com/twintproject/twint/blob/ad27650fbc0bf8c3f2c78449088a5ede7239f53a/twint/url.py#L100-L101
If you use Twint as a module, everything will work as expected.
Conclusions
There are limitations with --retweets and --profile-full imposed by Twitter, limitations that we can't handle or work around.
There's an argument-checking error in the Twint code, which only affects you if you use Twint via the CLI.
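For readers who want to try the module route mentioned in this reply, here is a minimal sketch. It assumes the `twint.Config` attributes `Username`, `Retweets` and `Profile_full` and the `twint.run.Profile` entry point behave as in the version discussed in this thread. The helper names `profile_settings` and `run_profile` are made up for this example and are not part of twint.

```python
from typing import Any, Dict


def profile_settings(username: str,
                     retweets: bool = True,
                     profile_full: bool = False) -> Dict[str, Any]:
    """Collect the values to set on twint.Config, keyed by attribute name."""
    return {
        "Username": username,          # account to scrape
        "Retweets": retweets,          # include retweets, like --retweets
        "Profile_full": profile_full,  # slow full-profile mode, like --profile-full
    }


def run_profile(settings: Dict[str, Any]) -> None:
    """Apply the settings to a twint.Config and run a profile scrape."""
    # Imported lazily so profile_settings() can be used without twint installed.
    import twint

    config = twint.Config()
    for name, value in settings.items():
        setattr(config, name, value)
    twint.run.Profile(config)
```

Calling `run_profile(profile_settings("BouloGiletJaune"))` should roughly correspond to the `twint --retweets -u BouloGiletJaune` command from the report. The limits imposed by Twitter still apply, so the tweet count is not expected to change.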
@pielco11 thank you for taking the time to formulate a prompt and structured response.
While this might sound like Twint is doing something wrong, the point is that Twitter stops Twint before reaching the beginning of the timeline. This can be verified via browser.
In the browser, the last tweet returned is indeed the same one that twint grabs:
1069924606788685824 2018-12-04 12:01:19 BST <BouloGiletJaune> Un moratoire et quelques mesurettes...
In your response you're hinting that Twitter stops Twint before it reaches the beginning of the timeline, so maybe there are more tweets before that one. I checked using the python-twitter library, which taps into the Twitter API directly (but is limited to the last 3,200 tweets), and got 274 tweets. So it is indeed as you wrote: Twitter is limiting scraping.
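To repeat this check on other accounts, the oldest tweet in a twint run can be pulled out of the CLI output. The sketch below assumes every tweet line has the layout shown above (id, date, time, timezone, `<username>`, text) and that a lower tweet ID means an older tweet. The names `Tweet`, `parse_line` and `oldest_tweet` are made up for this example.

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional

# Layout assumed from the sample line: "<id> <date> <time> <tz> <<user>> <text>"
LINE_RE = re.compile(
    r"^(?P<id>\d+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<tz>\S+) <(?P<user>[^>]+)> (?P<text>.*)$"
)


class Tweet(NamedTuple):
    tweet_id: int
    posted: datetime  # naive timestamp, exactly as printed by twint
    user: str
    text: str


def parse_line(line: str) -> Optional[Tweet]:
    """Parse one output line; return None for anything that is not a tweet."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. log lines such as "CRITICAL:root:..."
    posted = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S"
    )
    return Tweet(int(match["id"]), posted, match["user"], match["text"])


def oldest_tweet(lines: Iterable[str]) -> Optional[Tweet]:
    """Return the tweet with the smallest ID, or None if no line parsed."""
    tweets = [t for t in map(parse_line, lines) if t is not None]
    return min(tweets, key=lambda t: t.tweet_id) if tweets else None
```

Feeding it the saved output of a run (for example `oldest_tweet(open("tweets.txt"))`, where `tweets.txt` is a hypothetical file holding the redirected output) gives the tweet at which the scrape stopped, which can then be compared with the last tweet visible in the browser.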