Twint: Behaviour of twint.storage.panda.get()

Created on 26 Jun 2018 · 5 Comments · Source: twintproject/twint

Command Ran

import twint
c = twint.Config()
c.Username = 'skyaskquestions'
c.Pandas = True
twint.run.Search(c)
one = twint.storage.panda.get()
users_one = list(set(one.username))

c = twint.Config()
c.Username = 'colebrob'
c.Pandas = True
twint.run.Search(c)
two = twint.storage.panda.get()
users_two = list(set(two.username))

>>>users_one
['skyaskquestions']
>>>users_two
['skyaskquestions', 'colebrob']
>>>

Description of Issue

The issue is that running twint.storage.panda.get() returns a dataframe that retains results from previous searches, which I do not believe it did prior to the changes in storage conventions. If the point of the pandas integration was to add metadata on a per-search basis, returning data from previous searches seems to run at odds with that purpose.
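
The output above is what you would expect if results are kept in a shared, module-level buffer that every search appends to. The following is a toy model of that pattern, not twint's actual code: `_buffer`, `fake_search`, and `get` are hypothetical names used only for illustration, and no real scraping happens.

```python
# Toy model of a shared result buffer; NOT twint's implementation.
_buffer = []  # hypothetical module-level state shared by every search


def fake_search(username):
    """Pretend to scrape one tweet for `username` and store it."""
    _buffer.append({"username": username, "tweet": f"hello from {username}"})


def get():
    """Return a copy of everything stored so far, across all searches."""
    return list(_buffer)


fake_search("skyaskquestions")
users_one = sorted({row["username"] for row in get()})

fake_search("colebrob")
# Rows from the first search are still in the buffer at this point.
users_two = sorted({row["username"] for row in get()})

print(users_one)
print(users_two)
```

In this model the second call to `get()` reports both usernames because nothing resets the buffer between searches, which mirrors the `users_two` result in the report.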

Environment Details

Python 3.6.3 using IDLE in OSX 10.13.5

enhancement

All 5 comments

Thank you for pointing this out. I'll add a function to clean the dataset, something like twint.storage.panda.clean(), or even add a config variable to "autoclean" at every search.

Cool! As always, thanks for the work.

@pielco11 I added this functionality in #172. It doesn't have an autoclean mode, however.

Auto-clean feature added:

import twint

c = twint.Config()
c.Pandas = True
c.Pandas_clean = True # <=== here to auto-clean at every twint.run.Search(c)
....

You can set it to False to run a scraping session and retrieve the whole dataframe whenever you prefer. If you want to clean at a certain time, just run twint.storage.panda.clean().
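
To make the difference between the two modes concrete, here is a small sketch under stated assumptions. `ToyStorage` is a hypothetical stand-in, not part of twint: its `auto_clean` flag plays the role of `c.Pandas_clean`, and its `clean()` method plays the role of the manual clean call.

```python
# Toy model of the auto-clean option; NOT twint's implementation.
class ToyStorage:
    def __init__(self, auto_clean):
        self.auto_clean = auto_clean  # stands in for c.Pandas_clean
        self.rows = []

    def clean(self):
        """Drop everything collected so far."""
        self.rows = []

    def search(self, username):
        """Pretend to run a search, cleaning first if auto_clean is set."""
        if self.auto_clean:
            self.clean()
        self.rows.append({"username": username})

    def usernames(self):
        return sorted({row["username"] for row in self.rows})


# auto_clean=False: results accumulate until clean() is called by hand.
session = ToyStorage(auto_clean=False)
session.search("skyaskquestions")
session.search("colebrob")
accumulated = session.usernames()
session.clean()
after_manual_clean = session.usernames()

# auto_clean=True: each search starts from an empty store.
per_search = ToyStorage(auto_clean=True)
per_search.search("skyaskquestions")
per_search.search("colebrob")
latest_only = per_search.usernames()
```

With the flag off, `accumulated` holds both usernames until the manual clean empties the store; with it on, `latest_only` contains only the most recent search.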
