pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
Code ran:
import twint
c = twint.Config()
c.All = "realDonaldTrump"
c.Since = "2020-10-01"
c.Format = "User: {username} |Tweet: {tweet} |Replies: {replies} |Likes: {likes} |RT: {retweets} |Time: {date} {time}"
c.Store_csv = True
c.Output = "tweets.csv"
twint.run.Search(c)
I just updated twint and tried to run the same script I have been using for the last few months, and got this error:
Traceback (most recent call last):
File "c:/Users/Ivan/Documents/menta/repos/twint_scraping/pruebas.py", line 13, in <module>
twint.run.Search(c)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 427, in Search
run(config, callback)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 319, in run
get_event_loop().run_until_complete(Twint(config).main(callback))
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 239, in main
await task
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 290, in run
await self.tweets()
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 230, in tweets
await output.Tweets(tweet, self.config, self.conn)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\output.py", line 175, in Tweets
await checkData(tweets, config, conn)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\output.py", line 140, in checkData
output = format.Tweet(config, tweet)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\format.py", line 23, in Tweet
output = output.replace("{replies}", t.replies_count)
TypeError: replace() argument 2 must be str, not int
It was a type error, which I solved by casting the value to a string (see below).
I also had to cast the retweets_count and likes_count attributes.
What was
23 output = output.replace("{replies}", t.replies_count)
24 output = output.replace("{retweets}", t.retweets_count)
25 output = output.replace("{likes}", t.likes_count)
I changed to
23 output = output.replace("{replies}", str(t.replies_count))
24 output = output.replace("{retweets}", str(t.retweets_count))
25 output = output.replace("{likes}", str(t.likes_count))
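The failure and the fix can be reproduced outside twint. The sketch below is a minimal stand-in, not twint's code: `fake_tweet` and `render` are hypothetical names, and the only point is how `str.replace` behaves when given an int.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a twint tweet object; only the three
# counter attributes from the traceback are modelled, as ints.
fake_tweet = SimpleNamespace(replies_count=3, retweets_count=5, likes_count=8)


def render(template, t):
    """Fill the counter placeholders, casting each int to str first."""
    output = template
    output = output.replace("{replies}", str(t.replies_count))
    output = output.replace("{retweets}", str(t.retweets_count))
    output = output.replace("{likes}", str(t.likes_count))
    return output


# Without the cast, str.replace rejects the int argument.
try:
    "Replies: {replies}".replace("{replies}", fake_tweet.replies_count)
except TypeError as err:
    uncast_error = str(err)

rendered = render("Replies: {replies} |Likes: {likes} |RT: {retweets}", fake_tweet)
print(rendered)
```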
Then I was getting this error:
Traceback (most recent call last):
File "c:/Users/Ivan/Documents/menta/repos/twint_scraping/pruebas.py", line 13, in <module>
twint.run.Search(c)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 427, in Search
run(config, callback)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 319, in run
get_event_loop().run_until_complete(Twint(config).main(callback))
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 239, in main
await task
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 290, in run
await self.tweets()
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 230, in tweets
await output.Tweets(tweet, self.config, self.conn)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\output.py", line 175, in Tweets
await checkData(tweets, config, conn)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\output.py", line 140, in checkData
output = format.Tweet(config, tweet)
File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\format.py", line 27, in Tweet
output = output.replace("{is_retweet}", str(t.retweet))
AttributeError: 'tweet' object has no attribute 'retweet'
Which I solved by commenting out the retweet line and then the user_rt_id line.
What was
27 output = output.replace("{is_retweet}", str(t.retweet))
28 output = output.replace("{user_rt_id}", str(t.user_rt_id))
I changed to
27 # output = output.replace("{is_retweet}", str(t.retweet))
28 # output = output.replace("{user_rt_id}", str(t.user_rt_id))
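Commenting the lines out leaves `{is_retweet}` and `{user_rt_id}` unreplaced if they appear in the format string. One alternative, sketched here with a hypothetical `fake_tweet` object rather than twint's real class, is to fall back to a default when the attribute is missing. Using an empty string as that default is an assumption.

```python
from types import SimpleNamespace

# Hypothetical tweet object that lacks `retweet` and `user_rt_id`,
# mirroring the AttributeError in the traceback.
fake_tweet = SimpleNamespace(username="someone")


def fill_optional(template, t):
    """Replace placeholders, using "" when the attribute does not exist."""
    output = template
    output = output.replace("{is_retweet}", str(getattr(t, "retweet", "")))
    output = output.replace("{user_rt_id}", str(getattr(t, "user_rt_id", "")))
    return output


result = fill_optional("RT: {is_retweet} |RT user: {user_rt_id}", fake_tweet)
print(result)
```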
I am not sure if my changes are a good solution, but they work for me now and maybe they will for someone else.
@ivanlewin
Yes, you are right.
I changed to
23 output = output.replace("{replies}", str(t.replies_count))
24 output = output.replace("{retweets}", str(t.retweets_count))
25 output = output.replace("{likes}", str(t.likes_count))
This will solve the first issue,
and
I changed to
27 # output = output.replace("{is_retweet}", str(t.retweet))
28 # output = output.replace("{user_rt_id}", str(t.user_rt_id))
This will solve the second issue too (for now).
I have already put up a new PR, #955, which solves this issue of retweet & user_rt_id.
You can put up a PR for your fix, but first merge my branch into yours so that you have my fixes too. Then you'll have to fix up a few more things, which I can guide you through.
However, I'd recommend trying as many Format strings as possible to find more bugs before you put up your PR.
Sure, will do!
This is what you'd need for handling mentions on line 34 in format.py, because in the new implementation mentions is not a _list_; instead it is a _dict_:
34 output = output.replace("{mentions}", ",".join([json.dumps(mention) for mention in t.mentions]))
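To see what that line produces, here is a small sketch. The shape of each mention (a dict with `screen_name`, `name`, and `id` keys) is assumed for illustration and is not taken from twint.

```python
import json

# Assumed shape: a list of dicts, one per mentioned account.
mentions = [
    {"screen_name": "alice", "name": "Alice", "id": "1"},
    {"screen_name": "bob", "name": "Bob", "id": "2"},
]

# str.join needs strings, so each dict is serialized with json.dumps first.
joined = ",".join(json.dumps(mention) for mention in mentions)
output = "Mentions: {mentions}".replace("{mentions}", joined)
print(output)
```

Joining the dicts directly with `",".join(mentions)` would raise a TypeError, which is why the serialization step is there.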
Hi, I merged your branch into my fork and made the changes, is that okay? I don't want to break anything heh.
I tested it using the same script I used originally and it worked
~looks good.~
@ivanlewin ok I found another issue.
Traceback (most recent call last):
File "ivan.py", line 11, in <module>
twint.run.Search(c)
File "/home/baapuji/tor_test/twint/twint/run.py", line 419, in Search
run(config, callback)
File "/home/baapuji/tor_test/twint/twint/run.py", line 315, in run
get_event_loop().run_until_complete(Twint(config).main(callback))
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/home/baapuji/tor_test/twint/twint/run.py", line 235, in main
await task
File "/home/baapuji/tor_test/twint/twint/run.py", line 286, in run
await self.tweets()
File "/home/baapuji/tor_test/twint/twint/run.py", line 226, in tweets
await output.Tweets(tweet, self.config, self.conn)
File "/home/baapuji/tor_test/twint/twint/output.py", line 166, in Tweets
await checkData(tweets, config, conn)
File "/home/baapuji/tor_test/twint/twint/output.py", line 137, in checkData
output = format.Tweet(config, tweet)
File "/home/baapuji/tor_test/twint/twint/format.py", line 14, in Tweet
output = output.replace("{place}", t.place)
TypeError: replace() argument 2 must be str, not dict
might wanna fix this too
@himanshudabas cool. Could you share what you ran? I didn't get back any tweets with place info when I ran it.
sure.
import twint
c = twint.Config()
c.Limit = 100
c.Search = "apple"
c.Store_json = True
c.Output = "tweets.json"
c.Format = "User: {username} |Tweet: {tweet} |Replies: {replies} |Likes: {likes} |RT: {retweets} |Time: {date} {time}"
c.Geo = "36.055121,-119.01595,10mi"
twint.run.Search(c)
Aah, never mind. I figured out why this is happening.
When I implemented the parser for tweet data, I got things mixed up. I thought place and geo were the same thing; clearly they are not.
I'll put up a fix for that, perhaps tomorrow.
For some reason, some t.quote_url values come back as 0 instead of an empty string.
Should we check for str.isdigit() and replace with ""? I mean, casting 0 to str would work, but maybe replacing it is more elegant in the final output.
Side question: is this format.py file only for printing to the console, or does it also affect saving files as .csv, .json, etc.?
Yes, this is something I looked into yesterday.
I am planning to fix this inside tweet.py, which is where quote_url is assigned, so we won't have to check for that condition specifically.
What the 0 actually represents is that the Tweet contains a Quoted Tweet which has been deleted. That is why its URL is not present.
I'm planning to replace the 0 with "<deleted>", which is a string, so no separate check will be required in format.py.
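A rough sketch of that idea follows. `normalize_quote_url` is a hypothetical helper name; the thread only says the change would live in tweet.py, not what it would look like.

```python
def normalize_quote_url(quote_url):
    """Map the 0 sentinel (deleted quoted tweet) to a string marker."""
    # Hypothetical helper, not twint's actual implementation.
    if quote_url == 0:
        return "<deleted>"
    return quote_url


print(normalize_quote_url(0))
print(normalize_quote_url("https://twitter.com/user/status/1"))
```

Because the result is always a string, a later `output.replace("{quote_url}", ...)` call would not need its own type check.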
How did you change it to this?
23 output = output.replace("{replies}", str(t.replies_count))
24 output = output.replace("{retweets}", str(t.retweets_count))
25 output = output.replace("{likes}", str(t.likes_count))
How can I edit the package?
You can fork the repo, download it, and modify it as you please. Keep in mind to set your working directory to wherever you are editing your script; otherwise Python will use the files in the site-packages folder.
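One way to check which copy Python actually picks up is to ask the import system where it would load the module from. This is a generic sketch using the standard library; `module_location` is a name made up for the example.

```python
import importlib.util


def module_location(name):
    """Return the file path Python would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# "json" is used so the sketch runs anywhere; substitute "twint" locally.
print(module_location("json"))
```

If the path printed for twint is inside site-packages, edits made in a separate clone are not the ones being executed.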
21 output = output.replace("{hashtags}", ",".join(t.hashtags))
22 output = output.replace("{cashtags}", ",".join(t.cashtags))
---> 23 output = output.replace("{replies}", str(t.replies_count))
24 output = output.replace("{retweets}", str(t.retweets_count))
25 output = output.replace("{likes}", str(t.likes_count))
TypeError: replace() argument 2 must be str, not int
I replaced it. Yet I still get the same error with the updated code.