Twint: [QUESTION] Is there a way to specify *which* list to store tweets in with each search?

Created on 24 Jun 2019  ·  3 Comments  ·  Source: twintproject/twint

Initial Check

If the issue is a request please specify that it is a request in the title (Example: [REQUEST] more features). If this is a question regarding 'twint' please specify that it's a question in the title (Example: [QUESTION] What is x?). Please only submit issues related to 'twint'. Thanks.

Make sure you've checked the following:

  • [X] Python version is 3.6;
  • [X] Updated Twint with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
  • [X] I have searched the issues and there are no duplicates of this issue/question/request.

Command Ran

Please provide the _exact_ command ran including the username/search/code so I may reproduce the issue.

Using a function, like this, to wrap the usage of twint:

import twint

def scrape(twitter_handle):
    c = twint.Config()
    c.Username = twitter_handle
    c.Store_object = True
    c.Limit = 1000
    twint.run.Search(c)
    return twint.output.tweets_object

Description of Issue

Please use as much detail as possible.

With c.Store_object = True, results are stored in objects declared directly in the twint module. Since there aren't 'instances' of twint (the imported module is effectively a singleton), if I run the function several times in the same program execution, the results all get appended to the same twint.output.tweets_object. I'm trying to determine whether there's currently a way (without changing the core twint codebase) to explicitly tell it _which_ object to store the scraped data in each time.

(Thanks in advance!)

Environment Details

Using Windows, Linux? What OS version? Running this in Anaconda? Jupyter Notebook? Terminal?

  • macOS 10.14.5
  • Running the twint module within a Quart application (like Flask, but with asyncio)

All 3 comments

There's not, but you could apply your filtering rules to the list and return it: keep the tweets in twint.output.tweets_object temporarily, analyse them, clean the list, and then move on. If you want to filter the tweets while scraping rather than after the scraping process, you need to edit the codebase.
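A minimal sketch of that workaround, assuming the shared list can be copied and emptied after each search. The helper name `drain` is hypothetical and not part of twint; a plain list stands in for twint.output.tweets_object so the snippet runs without the library.

```python
def drain(shared_list):
    """Return a snapshot of shared_list, then empty it in place."""
    snapshot = list(shared_list)   # copy, so the caller owns the results
    shared_list.clear()            # reset the shared list for the next search
    return snapshot


# Stand-in for the module-level list (twint.output.tweets_object in the issue).
shared = ["tweet 1", "tweet 2"]
first_batch = drain(shared)

shared.append("tweet 3")           # a later search appends to the same list
second_batch = drain(shared)

# Hypothetical use inside the scrape() function from the question:
#     twint.run.Search(c)
#     return drain(twint.output.tweets_object)
```

This only separates results of searches that run one after another. If two searches can be in flight at the same time, as may happen on a web server, both would still append to the same list, so this sketch would not isolate them.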

Thanks @pielco11! In the case of running searches on a web server, we'd probably want to specify which list to store the results in.

What we've done in the meantime is added a key to the config object called c.Store_object_tweets_list which points to the list we want to store them in. If such an option is not specified, we default to using twint.output.tweets_object.
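Roughly, the idea could look like the sketch below. This is not the actual patch: `target_list`, `store`, and `GLOBAL_TWEETS` are hypothetical names, and `SimpleNamespace` stands in for the twint config object. Only the option name Store_object_tweets_list comes from the description above.

```python
from types import SimpleNamespace

# Stand-in for the module-level default (twint.output.tweets_object in the issue).
GLOBAL_TWEETS = []


def target_list(config, default=GLOBAL_TWEETS):
    """Pick the list that results should be appended to."""
    custom = getattr(config, "Store_object_tweets_list", None)
    # Compare against None: an empty caller-supplied list is falsy but still valid.
    return default if custom is None else custom


def store(config, tweet):
    """Append one result to whichever list the config selects."""
    target_list(config).append(tweet)


mine = []
with_option = SimpleNamespace(Store_object_tweets_list=mine)
without_option = SimpleNamespace()

store(with_option, "tweet A")     # goes to the caller's own list
store(without_option, "tweet B")  # falls back to the shared default
```

With a per-call list, each request handler can pass its own list and never touch the shared one.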

Is this something you'd be open to in a PR, or do you think there's not enough use-case demand for such an option?

@philuchansky feel free to open a PR for that new feature, that might be useful for others as well!
