If the issue is a request please specify that it is a request in the title (Example: [REQUEST] more features). If this is a question regarding 'twint' please specify that it's a question in the title (Example: [QUESTION] What is x?). Please only submit issues related to 'twint'. Thanks.
Make sure you've checked the following:
pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
Please provide the _exact_ command you ran, including the username/search/code, so I may reproduce the issue.
Using a function, like this, to wrap the usage of twint:
import twint

def scrape(twitter_handle):
    c = twint.Config()
    c.Username = twitter_handle
    c.Store_object = True
    c.Limit = 1000
    twint.run.Search(c)
    return twint.output.tweets_object
Please use as much detail as possible.
With c.Store_object = True, results are stored in objects declared directly in the twint module. Since there aren't 'instances' of twint (it's a singleton import), if I run the function several times in the same program execution, the results all get appended to the same twint.output.tweets_object. I'm trying to determine whether there's currently a way (without changing the core twint codebase) to explicitly tell it _which_ object to store the scraped data in each time.
(Thanks in advance!)
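For anyone reading along, the accumulation described above can be reproduced without twint at all. This is only a minimal sketch: `shared_results` stands in for a module-level list such as twint.output.tweets_object, and `fake_scrape` is a hypothetical placeholder that does not call twint.

```python
# Stand-in for a module-level results list such as twint.output.tweets_object.
# Nothing here calls twint; fake_scrape is a hypothetical placeholder.
shared_results = []


def fake_scrape(handle, count):
    """Pretend to scrape `count` tweets and store them in the shared list."""
    for i in range(count):
        shared_results.append(f"{handle}-tweet-{i}")
    return shared_results  # the same list object is returned on every call


first = fake_scrape("alice", 2)
second = fake_scrape("bob", 3)

print(first is second)  # both names refer to the one shared list
print(len(second))      # items from both calls are mixed together
```

Because every call returns the same list object, the caller cannot tell which items came from which call.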
Using Windows, Linux? What OS version? Running this in Anaconda? Jupyter Notebook? Terminal?
twint module within a Quart application (like Flask, but with asyncio).
There's not, but you could apply your filtering rules to the list and return it: keep the tweets in twint.output.tweets_object temporarily, analyse them, clear the list, and then go on. If you want to filter the tweets while scraping rather than after the scraping process, you need to edit the codebase.
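The suggestion above (use the shared list temporarily, then empty it) might look roughly like the following. It is a sketch under assumptions: `drain` is a hypothetical helper, and `shared` stands in for twint.output.tweets_object so the snippet runs without twint installed.

```python
def drain(shared_list):
    """Return a copy of the items in shared_list, then empty it in place."""
    snapshot = list(shared_list)
    shared_list.clear()  # in place, so every reference sees the empty list
    return snapshot


# Stand-in for twint.output.tweets_object (assumption: it is a plain list).
shared = ["t1", "t2"]

batch_one = drain(shared)   # results of a first, pretend search
shared.extend(["t3"])       # a second, pretend search appends again
batch_two = drain(shared)
```

In the wrapper function from the original post, this would mean returning `drain(twint.output.tweets_object)` instead of the shared list itself. Note that this only separates results when searches run one after another; if two searches overlap, their tweets would still land in the same list before either is drained.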
Thanks @pielco11! In the case of running searches on a web server, we'd probably want to specify which list to store the results in.
What we've done in the meantime is add a key to the config object called c.Store_object_tweets_list, which points to the list we want to store the tweets in. If no such option is specified, we default to using twint.output.tweets_object.
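A minimal sketch of that fallback logic, assuming the storing code receives the config object. `target_list` and `store` are hypothetical names used for illustration, `default_tweets` stands in for twint.output.tweets_object, and none of this is twint's actual code.

```python
from types import SimpleNamespace

default_tweets = []  # stand-in for twint.output.tweets_object


def target_list(config):
    """Pick the caller-supplied list if one is set, else the shared default."""
    custom = getattr(config, "Store_object_tweets_list", None)
    # Compare against None: an empty list is falsy but still a valid target.
    return custom if custom is not None else default_tweets


def store(config, tweet):
    """Append a tweet to whichever list the config selects."""
    target_list(config).append(tweet)


mine = []
store(SimpleNamespace(Store_object_tweets_list=mine), "a")  # goes to `mine`
store(SimpleNamespace(), "b")                               # goes to the default
```

With this shape, each request handler on a web server can pass its own list and never touch the shared one.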
Is this something you'd be open to in a PR, or do you think there's not enough use-case demand for such an option?
@philuchansky feel free to open a PR for that new feature; it might be useful for others as well!