Twint: Twint doesn't get all followers list

Created on 29 Jan 2019 · 26 Comments · Source: twintproject/twint

It seems Twint doesn't get the full list of followers for accounts with a large number of followers; it stops abruptly at some random number. For example, I tried twint -u nasa --followers, and each time the script stopped at a random point after a few thousand screen_names.

[x] Python version is 3.6;
[x] Updated Twint with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
[x] I have searched the issues and there are no duplicates of this issue/question/request.

Twitter Flaw

mmosleh

👍4

Most helpful comment

Hi. I'm facing the same issue now.
After a lot of trial and error, I may have found a workaround.

What I found:

For example, twint -u nasa --following --resume nasa_following_resume.txt --limit 60 basically works well.
When the command is repeated within a short period, we get CRITICAL:root:twint.feed:Follow:IndexError.
But after waiting several seconds, we can resume with the same command.
By waiting and resuming repeatedly, I was able to collect hundreds of followings.

My proposal:

Twint should add a command-line argument such as --wait-random 120.
When Twint hits CRITICAL:root:twint.feed:Follow:IndexError, it should wait a random number of seconds and retry.
The ideal final command would be: twint -u nasa --following --wait-random 120.
- The --resume filename should be determined automatically or kept only in memory.
- --limit 60 should have an appropriate default value.
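
A minimal sketch of the wait-and-retry behaviour proposed above, written outside Twint. The --wait-random flag does not exist in Twint; here `step` is a hypothetical callable standing in for one resumable scrape run, and the sketch assumes the failure reaches the caller as an IndexError exception, which the thread does not confirm (Twint may only log it).

```python
import random
import time


def run_with_random_wait(step, max_wait=120, max_attempts=5, sleep=time.sleep):
    """Call step() until it succeeds, pausing a random time after each IndexError.

    step         -- hypothetical callable that performs one resumable scrape run
    max_wait     -- upper bound, in seconds, for the random pause
    max_attempts -- give up (re-raise) after this many failed attempts
    sleep        -- injectable so the pause can be replaced in tests
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except IndexError:
            if attempt == max_attempts:
                raise
            # Pause between 1 and max_wait seconds before trying again.
            sleep(random.uniform(1, max_wait))
```

The random pause is meant to avoid repeating the request "in a short period", which is when the error was observed; the attempt cap keeps the loop from running forever if the data never comes back.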

yuiseki on 15 Nov 2019

👍12

All 26 comments

It's possible that Twitter stops returning new entities because you (like everyone else in that case) sent too many requests.

pielco11 on 29 Jan 2019

Thanks @pielco11, I was wondering whether there is a way around it, e.g. continuing to pull the information from that point when an error is raised, or keeping the requests within some limit so that it doesn't happen. Thanks.

mmosleh on 29 Jan 2019

I'll look deeper (I can't commit to a deadline). For now I can say that the issue doesn't seem to follow a single pattern. I tried a couple of queries and got a long list, ~100k users. Plus, when one stopped I started a new one, and it ran for a long time. So I guess it's not Twitter that's blocking you; from what I tested, I think using a VPN will not get you around the issue.

A solution could be to retry the query when it fails. In any case, the code should only be changed after a deeper look at what is going on.

pielco11 on 29 Jan 2019

Thank you @pielco11 !

mmosleh on 4 Feb 2019

> I'll look deeper (I can't commit to a deadline). For now I can say that the issue doesn't seem to follow a single pattern. I tried a couple of queries and got a long list, ~100k users. Plus, when one stopped I started a new one, and it ran for a long time. So I guess it's not Twitter that's blocking you; from what I tested, I think using a VPN will not get you around the issue.
>
> A solution could be to retry the query when it fails. In any case, the code should only be changed after a deeper look at what is going on.

Hi, first of all, thanks for building such an amazing tool and publishing the code; I'm sure I will learn a lot from your work. I tried getting a long list, ~130k, and it stopped at a random number of followers on each query.
On another note, I am writing a script to get all tweet links of a user, because I think your tool does not do that. It works without logging in and without using the API, but after a lot of queries (with my script), Twitter somehow blocks your searches of the user's tweets. Using a VPN solved the problem. This is just to give you some information.

Finally, if you have a PayPal account, I would like to buy you a coffee for posting the source code because, as I said, I would like to learn how you built the tool, which would be impossible without the source code.

castrovictor on 9 Feb 2019

❤6

~130k followers is a lot, so Twitter might be blocking requests at a random time.

For the second point, Twitter blocks an IP if it makes too many requests, which is why using a VPN solves the problem.

What we could try is handling that "followers count" issue by asking the user to change IP and then retrying the query, to see if this solves the issue.

Unfortunately I do not have enough time to solve every issue, so the patch will be delayed. Any kind of help with development is welcome.

pielco11 on 9 Feb 2019

@mmosleh Here is what's going on:

[image]

In the first case there is a "show more" button; Twint extracts that link and makes a new request. Then that button vanishes, so Twint is not able to make a new request.
If I take the last cursor-id and make a new request after changing the IP and so on, nothing changes.

I think we found the origin of the issue and, sadly, we can't do anything about it, at least for now.

pielco11 on 10 Feb 2019

@pielco11 I made a quick and dirty patch to the previous version of Twint (the one with a single file): just a few retries on the last cursor-id when the error message is received. I managed to download all 32M NASA followers this way. (I'm not familiar with the code base of the new version, though.)

mmosleh on 11 Feb 2019

@mmosleh oh, nice... may you provide me the commit id? git rev-parse HEAD

pielco11 on 11 Feb 2019

> @mmosleh oh, nice... may you provide me the commit id? git rev-parse HEAD

So, was the update uploaded? Is it possible to download a large list of followers, as @mmosleh managed to do?

castrovictor on 3 Mar 2019

Adding a timeout seems to solve the issue.

Without timeouts I'm able to get up to 40 followers/following; adding time.sleep(3) at line 161 of twint/get.py lets me get up to 440 followers/following.

pielco11 on 26 Mar 2019

👍2

In the current iteration of get.py, has this issue been resolved? I'm not seeing the time.sleep(3) line within the script.

thanks once again!

KristopherMakuch on 25 Apr 2019

👀1 👍1

@KristopherMakuch I did not apply that "patch" since I'm not sure it's really a fix. More testing is needed; everyone is welcome to find a workaround.

pielco11 on 25 Apr 2019

How do I include a control file to record the page where it stopped?

Example:
twint -u username --followers -o username_followers.txt username_page.txt -t 3 -r 15

username_page.txt = file with the id of the last followers page.
-t = 3 (time elapsed before requesting a new followers page)
-r = 15 (random time for a new followers page)
The time before going to the next followers page should be the sum t + r. r will always be random and can be 2, 3 or 15, so the time will vary.

If processing is interrupted, it can be run again after a while and will continue from the followers page whose id is stored in the file.
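
The idea above (a fixed wait t plus a random wait up to r between pages, with the last page id kept in a control file) can be sketched in Python as follows. This is an illustration only, not Twint's code: `fetch_page` is a hypothetical callable that takes a page cursor (an empty string for the first page) and returns the names on that page together with the next cursor, or None when there are no more pages.

```python
import random
import time
from pathlib import Path


def page_delay(t=3, r=15):
    """Seconds to wait before the next page: fixed part t plus a random part in [0, r]."""
    return t + random.uniform(0, r)


def scrape_pages(fetch_page, checkpoint_path, t=3, r=15, sleep=time.sleep):
    """Collect names page by page, storing the next cursor in a control file.

    fetch_page is hypothetical: fetch_page(cursor) -> (names, next_cursor or None).
    If the control file already holds a cursor, scraping resumes from it.
    """
    path = Path(checkpoint_path)
    cursor = path.read_text().strip() if path.exists() else ""
    names = []
    while True:
        page, cursor = fetch_page(cursor)
        names.extend(page)
        # Persist progress so an interrupted run can pick up from this page.
        path.write_text(cursor or "")
        if not cursor:
            return names
        sleep(page_delay(t, r))
```

An empty control file means either that nothing has been scraped yet or that the previous run reached the last page.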

My original command:
twint -u username --followers -o username_followers.txt

My error today:
CRITICAL:root:twint.feed:Follow:IndexError

The file has 1036 followers, but this profile has 3,800 followers.

Thanks again for all the work.
Congrats

Sorry for my poor English...
My first language is Portuguese.

Matiusco on 14 Jun 2019

Hi @Matiusco

To add timeouts you just have to add a line as described above.

You can resume the scrape with something like twint -u username --followers -o user_followers.txt --resume username_followers_resume.txt

When Twint stops (most probably because Twitter does not return more data), you just have to re-run the command to resume from where it stopped.
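
Re-running the command by hand can be automated with a small wrapper. The sketch below is a hypothetical helper, not part of Twint: it re-runs a command until the output file stops growing for a few consecutive runs. It uses only the command-line flags quoted in this thread and assumes that the output file gets one follower per line.

```python
import subprocess
import time
from pathlib import Path


def count_lines(path):
    """Number of lines in the output file, or 0 if it does not exist yet."""
    p = Path(path)
    return len(p.read_text().splitlines()) if p.exists() else 0


def rerun_until_stalled(command, output_path, max_stalls=3, wait_seconds=30,
                        run=subprocess.run, sleep=time.sleep):
    """Re-run command until output_path gains no lines max_stalls times in a row."""
    stalls = 0
    while stalls < max_stalls:
        before = count_lines(output_path)
        run(command, check=False)  # a failed run simply counts as no progress
        if count_lines(output_path) > before:
            stalls = 0
        else:
            stalls += 1
        sleep(wait_seconds)  # pause after every run, including the last one
    return count_lines(output_path)


if __name__ == "__main__":
    cmd = ["twint", "-u", "username", "--followers",
           "-o", "user_followers.txt",
           "--resume", "username_followers_resume.txt"]
    print(rerun_until_stalled(cmd, "user_followers.txt"))
```

Stopping on repeated stalls rather than on the first empty run reflects the reports here that a run can end early and still continue after a pause.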

pielco11 on 15 Jun 2019

👍5

hi @pielco11
Thanks for the information.
I will try this.

Edited report:
Yes, it works perfectly now.
resume.txt is [] when finished. ;)

Matiusco on 17 Jun 2019

🎉2 👍2 ❤1

I cannot get all the followers.
Sometimes it reaches up to 15,000, other times it ends at 9,000, but the resume file is empty ([]).

My command in the terminal:

twint -u zehdeabreu --followers -o user_followers.txt --resume zehdeabreu_followers_resume.txt -t 15 -l 50

-t 15 and -l 50 are not working...

How could I control the time between requests from inside a Python file, to leave a much longer interval between requests?

=====
The max number of followers at the moment is:

wc -l user_followers.txt

16470 user_followers.txt

Thanks for all the help

Matiusco on 19 Jun 2019

👀1

-t is not implemented (at least not yet); -l is for the language and --limit is for the limit. If you want to control the time between requests, you have to play with get.py.

Your query should be something like twint -u zehdeabreu --followers -o user_followers.txt --resume zehdeabreu_followers_resume.txt --limit 60

I also tried the resume option and it works fine

pielco11 on 19 Jun 2019

👍2 😕1

tks @pielco11 , I'll try

Matiusco on 20 Jun 2019

I had a similar issue. The problem seems to be that sometimes the "more" button doesn't appear on Twitter's end, and Twint assumes that it has reached the end of the list of followers. However, if it tries again and sends another request, the button might be available to the scraper. So one quick fix could be to get the number of followers first, then keep retrying until the number of followers collected matches.
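
A rough Python sketch of that quick fix, under stated assumptions: `fetch_batch` is a hypothetical callable that returns the next batch of follower names, or an empty list when the "more" link is missing, and `expected_total` is the follower count read beforehand. Neither name comes from Twint. The retry cap is an addition of this sketch, so the loop still ends if the collected total never reaches the expected count.

```python
import time


def collect_until_complete(fetch_batch, expected_total, max_empty_retries=5,
                           wait_seconds=3, sleep=time.sleep):
    """Keep requesting batches until expected_total names are collected.

    An empty batch (or one with no new names) is treated as a temporary
    failure: wait, then retry, up to max_empty_retries times in a row.
    """
    seen = set()
    empty_retries = 0
    while len(seen) < expected_total and empty_retries < max_empty_retries:
        new_names = set(fetch_batch()) - seen
        if new_names:
            seen |= new_names
            empty_retries = 0
        else:
            empty_retries += 1
            sleep(wait_seconds)
    return seen
```

Comparing `len(seen)` with `expected_total` after the call shows whether the list is complete or the retry cap was reached first.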

mmosleh on 20 Jun 2019

👍2 👀1

OK @mmosleh, but I do not know how to change this part of the code in get.py.

I still cannot get all the followers.

Matiusco on 21 Jun 2019

How can I get the IDs of the followers instead of the username, please? Thank you

nxhuy-github on 26 Jun 2019

@nxhuy-github please keep comments on the topic of the issue. Anyway, you can do that using .Lookup as shown in the wiki.

pielco11 on 26 Jun 2019

Hi, what is the current status of the code that retrieves all the followers of one person? I am still having the problem that only a subset of followers is downloaded. I am using the command
twint -u SpeakerPelosi --followers but am unable to get all 3 million followers (my result is only about 30k users). I saw that line 161 has a timeout. Would increasing this timeout help?

datduong on 2 Oct 2019

@datduong Twitter actively works to prevent Twint from getting all the followers; I strongly suggest you use the API.

pielco11 on 2 Oct 2019

👍3 😕2
