import twint
c = twint.Config()
c.Username = 'skyaskquestions'
c.Pandas = True
twint.run.Search(c)
one = twint.storage.panda.get()
users_one = list(set(one.username))
c = twint.Config()
c.Username = 'colebrob'
c.Pandas = True
twint.run.Search(c)
two = twint.storage.panda.get()
users_two = list(set(two.username))
>>> users_one
['skyaskquestions']
>>> users_two
['skyaskquestions', 'colebrob']
The issue is that twint.storage.panda.get() returns a dataframe that retains results from previous searches, which I do not believe it did before the changes to the storage conventions. If the point of the pandas integration was to add metadata on a per-search basis, returning data from previous searches seems to run at odds with that purpose.
Python 3.6.3 using IDLE in OSX 10.13.5
Thank you for pointing this out. I'll add a function to clean the dataset, something like twint.storage.panda.clean(), or even add a config variable to "autoclean" at every search.
Cool! As always, thanks for the work.
@pielco11 I added this functionality in #172, it doesn't have an autoclean mode however
Auto-clean feature added:
import twint
c = twint.Config()
c.Pandas = True
c.Pandas_clean = True # <=== here to auto-clean at every twint.run.Search(c)
....
You can set it to False to run several scraping sessions and get the whole dataframe whenever you prefer. If you want to clean at a specific point, just run twint.storage.panda.clean().
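To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It does not use twint at all: ResultStore, its add() method, and the auto_clean parameter are hypothetical stand-ins for a shared results store, assumed only for illustration, and the real implementation may well differ.

```python
# Hypothetical stand-in for a shared results store; this is NOT twint's code.
class ResultStore:
    def __init__(self):
        self._rows = []

    def clean(self):
        """Drop everything collected so far."""
        self._rows = []

    def add(self, rows, auto_clean=False):
        """Record the rows of one search, optionally discarding older rows first."""
        if auto_clean:
            self.clean()
        self._rows.extend(rows)

    def get(self):
        """Return a copy of the rows currently held."""
        return list(self._rows)


def usernames(store):
    """Sorted unique usernames currently in the store."""
    return sorted({row["username"] for row in store.get()})


store = ResultStore()

# Without cleaning: the second search is appended to the first.
store.add([{"username": "skyaskquestions"}])
store.add([{"username": "colebrob"}])
accumulated = usernames(store)

# With auto-clean: each search starts from an empty store.
store.clean()
store.add([{"username": "skyaskquestions"}], auto_clean=True)
store.add([{"username": "colebrob"}], auto_clean=True)
cleaned = usernames(store)
```

In this sketch, accumulated holds both usernames, which mirrors the behaviour reported above, while cleaned holds only the username from the latest search. Leaving auto-clean off and calling clean() by hand gives the same result at whatever point you choose.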