Twint: Extraction of tweets by username continually restarts

Created on 24 Oct 2018 · 41 Comments · Source: twintproject/twint

Make sure you've checked the following:

  • [x] Python version is 3.6;
    $ python3 --version
    Python 3.6.7

  • [x] Using the latest version of Twint;
    $ git log | head -20
    commit 1229460da0747bd7f169ff82803a3162ea582225
    Author: Maxim Gubin <maxim@kilograpp.com>
    Date: Mon Oct 22 19:58:35 2018 +0300

    Fix Tor proxy not being able to connect (#248)

    • Fix Tor proxy not being able to connect

    • Fix request calls

    • Fix proxy args

  • [x] I have searched the issues and there are no duplicates of this issue/question/request.
    Can't find anything related to "start" "repeat" or "over" (as in over-and-over)

Command Ran

nohup python3 Twint.py -u rogerdc -o ../rogerdc.csv --csv &

Description of Issue

When I run this, I see that it starts by grabbing my most recent tweets. But every 20 or 40 lines, it starts over at the top.

I extracted the number of my latest tweet and searched for it in the CSV, printing the line number:
$ grep -n 1053736784176340996 rogerdc.csv
2:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
22:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
42:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
62:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
82:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
122:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
142:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
162:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
182:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
202:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
242:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
262:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
282:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
302:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
322:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
342:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
362:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
382:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
402:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
442:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
462:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
482:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
502:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
522:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
542:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
562:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
582:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
602:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
642:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
682:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
702:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
722:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
742:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
762:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
802:1053736784176340996,2018-10-20,14:56:41,CDT,16143519,rogerdc,I added a video to a @YouTube playlist http://youtu.be/SDCJaDtI5YE?a聽 The Southern Regiment at UCM's Festival of Champions,0,0,0,,,https://twitter.com/rogerdc/status/1053736784176340996,,None,youtube
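
The same check can be run over every tweet id at once instead of one id at a time. This is only a minimal sketch, assuming the tweet id is the first CSV column (as in the grep output above) and that the file starts with a single header row; the function name `find_repeated_ids` is made up for illustration and is not part of Twint.

```python
import csv
from collections import Counter


def find_repeated_ids(rows):
    """Return {tweet_id: count} for ids that occur more than once.

    `rows` is any iterable of CSV text lines, e.g. an open file.
    Assumes the id is the first column and the first row is a header.
    """
    reader = csv.reader(rows)
    next(reader, None)  # skip the header row
    counts = Counter(row[0] for row in reader if row)
    return {tweet_id: n for tweet_id, n in counts.items() if n > 1}


# Tiny in-memory sample standing in for the real CSV file.
sample = [
    "id,date,tweet",
    "100,2018-10-20,first",
    "101,2018-10-19,second",
    "100,2018-10-20,first",
]
print(find_repeated_ids(sample))  # {'100': 2}
```

For the real file, an open file handle can be passed in place of `sample`, for example `open("rogerdc.csv", newline="", encoding="utf-8")`.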

Environment Details

Linux, Debian Sid, 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18) x86_64 GNU/Linux

forTwintIssue.zip

Twitter Flaw

All 41 comments

The same thing happens to me and to another user who opened a new issue. I use Windows with Python 3.7 and a Mac with 3.6, and the same thing happens on both. It starts to loop the tweets in the output.

"Bug" confirmed and it seems that Twitter changed something

A temporary solution is to pass the --profile-full argument; it's slow, but it works

Please note that this will not work with --search but only with --username

It doesn't cover every use case: you can only scrape the tweets of a specific user

Using the module, is there no solution for c.Search? Will we still be able to use twint properly for data gathering? We would like to gather tweets based on keywords. Any solutions available?

Unfortunately not yet, c.Search will not work

Any estimation when you could investigate the issue with the search?

@Mehdzor I'm already looking into this but it seems quite tricky and not so easy to spot

Any kind of help is appreciated

Also digging into it. It looks like there's a change in the headers. I got it working in curl by mimicking all the headers and then setting max_position to the min_position of the previous request.
That's something to start with. If you want to keep in track, contact me in Telegram: @mehdzor
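
The cursor handling described in this comment can be sketched without touching Twitter at all. The following is an illustration only: `fetch_page` is a hypothetical stand-in for the real HTTP request, and the fake timeline and page size are invented for the example; only the names `max_position` and `min_position` come from the comment. It shows that feeding each response's `min_position` back as the next `max_position` walks backwards through the timeline, while a cursor that never advances returns the same newest page over and over, which resembles the looping output reported in this issue.

```python
TIMELINE = list(range(100, 0, -1))  # fake tweet ids, newest first
PAGE_SIZE = 20


def fetch_page(max_position=None):
    """Hypothetical stand-in for one timeline request.

    Returns (tweet_ids, min_position), where min_position is the oldest
    id on the page, or None when nothing is left.
    """
    older = [t for t in TIMELINE if max_position is None or t < max_position]
    page = older[:PAGE_SIZE]
    return page, (page[-1] if page else None)


def scrape(advance_cursor=True, max_pages=10):
    """Collect ids page by page, optionally (mis)handling the cursor."""
    collected, cursor = [], None
    for _ in range(max_pages):
        page, min_position = fetch_page(cursor)
        if not page:
            break
        collected.extend(page)
        if advance_cursor:
            cursor = min_position  # next request starts below this id
    return collected


good = scrape(advance_cursor=True)
bad = scrape(advance_cursor=False, max_pages=3)
print(len(good), len(set(good)))  # 100 100
print(len(bad), len(set(bad)))    # 60 20
```

With the cursor stuck, three requests yield 60 rows but only 20 distinct ids, the same every-20-lines repetition seen in the CSV.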

@Mehdzor would you like to join slack? I think that's better and other people can join the discussion easily

Hope this gets fixed. I just recently started using this and it doesn't seem to work as it should now.

Hello, I don't know if it's okay to post here, but could I commission someone to put in the time to fix it? I was really relying on twint for my research and now I can no longer get data as quickly and reliably. Maybe the several people here who use it could crowdsource the fix.

I believe this PR should fix the issue: https://github.com/twintproject/twint/pull/251

@SebastianMuszynski The PR doesn't seem to solve the issue for me. Tweets are still repetitive.

Any updates?

I totally agree with @patowarheart. I really need twint for my research. It would be greatly appreciated if the issue would be resolved.

Same issue.

@rogerdc Have you seen https://github.com/twintproject/twint/pull/251?

I have just tested fetching tweets from your account with the following command:

python3 Twint.py -u rogerdc -o temp/rogerdc.csv --csv

and it looks like I successfully fetched 2808 tweets (the last one from 2008-09-05) without any duplicates.


@richieboy69 @RoiOng @patowarheart @CarbuncleOrigin @Mehdzor

Could you share what your commands look like?

Two case studies this morning, very similar:

python3 Twint.py -s "blablabla" --since 2015-01-01 --until 2015-01-09 --csv -o blablabla.csv

Works like a charm: around 20,000 tweets collected over the whole period.

python3 Twint.py -s "mlomlomlo" --since 2015-01-01 --until 2015-01-10 --csv -o mlomlomlo.csv

Stops at 17:47:17 on the 8th of January, even though I'm sure that other tweets are available online.
I launched this command twice (with Tor and without Tor) and in both cases it stops at the same moment.
(I had to anonymize the request; mlomlomlo isn't what I'm actually searching for.)

I see 2 PRs, which is the correct one?

Both are working for me

I merged #252 by @Mehdzor

Thank you all for your work!

Thank you all! it's now working

It's working for me now.
$ git log | head -20
commit 753b9294452dc727e7870ff9284f66e2ec8510a7
Author: Francesco Poldi <francescopoldi@pielco11.ovh>
Date: Thu Oct 25 21:17:07 2018 +0200

Update README

Unless @CarbuncleOrigin has problems in the next few days, I'll close this.

It's good @rogerdc

I'm still getting the same issue trying to use it as a module. I'm quite new to this so not sure what I'm doing wrong.

@conradkaz Try now

What do I need to do to update it? I uninstalled twint and then reinstalled it but I'm still getting the same issue. Again, sorry not very familiar with all this.

pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint

The pip package is outdated, so install the Twint module via git

Still not working for me. I assume I'm doing something wrong, but I don't know what. I got this error when I tried to import twint in my Jupyter Notebook.


ModuleNotFoundError Traceback (most recent call last)
in
----> 1 import twint

~/src/twint/twint/__init__.py in
9 '''
10 from .config import Config
---> 11 from . import run
12
13 #import logging

~/src/twint/twint/run.py in
----> 1 from . import datelock, feed, get, output, verbose, storage
2 from asyncio import get_event_loop
3 from datetime import timedelta, datetime
4 from .storage import db
5

~/src/twint/twint/get.py in
7 import concurrent.futures
8
----> 9 from aiohttp_socks import SocksConnector, SocksVer
10
11 from . import url

ModuleNotFoundError: No module named 'aiohttp_socks'

pip3 install -r requirements.txt

Better continue the discussion on Slack

Seems to be happening with the "-s" command now too ... ?

@richieboy69 testing with 2001 tweets
image

Also, the time feature is completely off

@colek319 is the "time feature" that you are referring to, related to this issue? If not I kindly ask you to open a new issue

Hi, I have the same problem with search too; using the latest version of twint that I just pulled from github:

python3 --version
Python 3.6.5
pip3 freeze | grep twint
twint==1.1.3.3

In Python:

  c = twint.Config()
  c.Custom = ["id", "date", "time", "user_id", "tweet"]
  c.Store_csv = True
  c.To = "POTUS"
  filename = "POTUS.csv"
  c.Since = "2018-10-21"
  c.Output = filename
  twint.run.Search(c)

With this code, twint loops indefinitely through the last 20 tweets. Last week, using the same code, I could correctly search for all tweets starting from "2018-10-21". Something must have changed on the Twitter side.

@alexlokhov please update the package with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint and then retry

Thanks, I was already using the latest one. Updated again, the problem persists.

Full code:

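import twint

# Search for tweets sent to @POTUS since 2018-10-21 and store them as CSV
c = twint.Config()
c.Custom = ["id", "date", "time", "user_id", "tweet"]  # CSV columns to keep
c.Store_csv = True
c.To = "POTUS"
filename = "POTUS.csv"
c.Since = "2018-10-21"
c.Output = filename
twint.run.Search(c)  # run the search, as in the snippet above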

Output:
image

As you can see, it goes back to the latest tweet after it loads the first twenty.

@alexlokhov

image

It's strange: I tried the same (limited to ~220 tweets), and checking random ids I'm not getting duplicates

I also tried with a tweet that you are getting twice

image

Yes, it is working now, thank you very much. I believe I had a module version problem. pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint installed a new module in a specific directory, but the older version was still being used globally.

@alexlokhov it may depend on your setup, like PYTHONPATH or something, but it's definitely not related to Twint (imho)

Please try removing Twint with pip and then retry with the customized command (it's working on my machine)
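
One way to check which copy of a module Python will actually import is to ask the import system for its location. This is a minimal sketch using only the standard library; the helper name `module_origin` is made up here, and `json` is used in the demo only so that it runs on a machine without Twint installed. Passing `"twint"` instead would show whether the import resolves to the git checkout (for example a path under `src/twint`) or to an older copy elsewhere.

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would load `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print(module_origin("json"))  # path to the stdlib json package
print(module_origin("surely_not_installed_module_xyz"))  # None
```

If the printed path for `twint` is not the freshly installed one, the older copy is shadowing it, which would explain why an upgrade appears to have no effect.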

I have the same problem with repeating tweets. I tried to update to the latest version with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint , but got an error:

Error [WinError 2] The system cannot find the file specified while executing command git clone -q https://github.com/twintproject/twint.git C:\Users\mi\src\twint
Cannot find command 'git' - do you have 'git' installed and in your PATH?

What should I do?

@radw22 you have to install git on your system

Done, updated to the latest version, but I saw this while updating:

Did not find branch or tag 'origin/master', assuming revision or ref.
image

Is that OK? Because when I try the command again, I still have the same issue with tweets repeating after 20 tweets. Thanks.
