As I admit, and as we can all see, over the past months the rate of new features and fixes has decreased while the number of problems has increased.
Honestly, it's not easy for me to find time to assist everyone and look for solutions; everyone works or studies, and maintaining community projects is quite time-consuming. At least for me.
If you have some experience in Python, want to be part of the project, and are able to stand on your own feet, don't hesitate to ping me!
@pielco11 Maybe I can help; actually, I am actively doing a complete refactor on the entire code base in an attempt to fix the systemic structural problems. The challenge at the moment is that I lost interest in tracking 100% of the differences and some elements no longer exist (e.g. duplicated int / str attributes, configuration settings that conflict and produce erroneous results, etc.)
The improvements at the moment are quite exciting; last night I did a trial run and collected 25k tweets using the 'slow' profile method without a single failed request from Twitter, which took 10 min 33 s instead of 2 hr 45 min.
I have been trying to figure out a way to present this new code without stepping on any toes or egos or both.
If you agree, I can give you write access so you can push to a separate dev branch without PRs and the like.
I think for now I will linger and help out with issues with the current codebase.
Unfortunately, what I have going on is structurally very different from the current release, which essentially breaks the CLI tool and all of the other tools listed in the twintproject organization.
Don't worry, any kind of help is welcome!
@pielco11 I'm interested in this. I could submit PRs against this on a fork. Also, ideally, I want to port this over to Racket :smile:
@BonfaceKilz that would be awesome!
@pielco11 Count me in. You can always message me here or send me an email. I love twint and I want to contribute as much as I can.
On that topic, I think it would be a nice idea to set up a bot to automatically clean up issues abandoned by their original posters after some time, and maybe have someone tag the issues correctly so it's easier for us to look into them.
@pielco11 Looking forward to your guidance 😄 (just so this issue doesn't die)
Are you still looking for help with this? This is an awesome project and I can spend time on this. Let me know!
I am not too familiar with the internals of Twint, but I can help out as well
I'm also going to set up a bigger donation on the Patreon to support all the hard work that has gone into it
What Twint would need is a big breaking change
The scraping schema is gone
The starting point would be starting almost from zero and writing a scraper for mobile.twitter.com. If you guys are interested in this, I could set up a new repository called "twint-lite" (for example) so we don't mix things up. WDYT?
@eh-93 thank you so much, but keep in mind that as of now Twint does not run, so it's not eligible for a donation; you know, people don't pay for things that don't run :D
I would absolutely support that @pielco11.
I'd also suggest focusing more on the core of scraping and making it more "Unix"-like, so that it does one thing very well and outputs, for instance, JSON lines that can be consumed by helper tools. Get rid of things like Elastic, SQLite and translations and move them to external helpers. Restructure the command line args and group them.
Maybe call it twint-core or twint-ng; this will be a more powerful building block, so "lite" does not really do justice to what it will be!
Maybe you can set up a repo and discuss further in the team discussion there?
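To make the JSON lines idea above a bit more concrete, here is a minimal sketch of what a core tool's output step could look like. The `write_jsonl` helper and the record fields are hypothetical, made up for illustration; they are not Twint's actual API or schema.

```python
import io
import json
import sys
from typing import IO, Iterable


def write_jsonl(records: Iterable[dict], out: IO[str] = sys.stdout) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    for record in records:
        # ensure_ascii=False keeps non-ASCII tweet text readable in the output
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical records; the field names are illustrative, not Twint's schema.
sample = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
buffer = io.StringIO()
written = write_jsonl(sample, buffer)
```

Since each line is a complete JSON document, external helpers (for Elastic, SQLite, and so on) could read the stream line by line from a pipe without any knowledge of the scraper's internals.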
Great suggestion, a lot of the additional tooling around Twint can be found in other libraries
If batches of results from a Twint run could be returned via a generator, it would make Twint more scalable and would make it easier to export the data to any format in a 'streaming' fashion
Similar to instaloader's approach - generator example below:
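from datetime import datetime
from itertools import dropwhile, takewhile
import instaloader
L = instaloader.Instaloader()
posts = instaloader.Hashtag.from_name(L.context, 'urbanphotography').get_posts()
SINCE = datetime(2015, 5, 1)
UNTIL = datetime(2015, 3, 1)
# Skip posts newer than SINCE, then stop at the first post older than UNTIL
for post in takewhile(lambda p: p.date > UNTIL, dropwhile(lambda p: p.date > SINCE, posts)):
    L.download_post(post, '#urbanphotography')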
https://instaloader.github.io/codesnippets.html#download-posts-in-a-specific-period
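For Twint, the batch-returning generator could be sketched roughly as follows. This is only an illustration under assumptions: `fetch_tweets` is a hypothetical stand-in for the scraper that yields fake records, and `batched` is a helper name invented here, not something Twint provides.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fetch_tweets(limit: int) -> Iterator[dict]:
    """Hypothetical stand-in for a scraper; yields fake tweets lazily."""
    for i in range(limit):
        yield {"id": i, "text": f"tweet {i}"}


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of up to `size` items without loading the whole stream."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Consume the stream batch by batch; only one batch is in memory at a time.
batch_sizes = [len(batch) for batch in batched(fetch_tweets(25), 10)]
```

Each batch could then be handed to whatever exporter the caller prefers as soon as it arrives, instead of waiting for the full run to finish.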
I'm going to create twint-ng and add you, @o7n and @eh-93, as collaborators with write access so I won't have to approve your changes
If anyone else is interested, just ping me
@pielco11
What is the current status of the work? Are we rewriting twint from scratch or trying to implement fixes starting from version 2.1.21?
From what I see twint-ng is pretty dead.
https://github.com/twintproject/twint-ng
Some twint functionality seems to work fine currently, but favorites are still broken.