This is a followup to my Dockerized version of Twint that I mentioned in #579.
Unfortunately, this won't be a run-of-the-mill environment because Docker, but I'll get into that further down.
Upgraded with pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
Searched the issues with is:issue since and nothing popped out at me.
$ bash <(curl -s https://raw.githubusercontent.com/dmuth/twint-splunk/master/twint) -u dmuth --since 2009-08-01 --until 2009-09-01 --json -o tweets.json | pv -l >/dev/null
$ tail -n1 tweets.json | jq -r .id,.date
2455812675
2009-07-03
The oldest tweet fetched is from July 3, 2009. Expectation was that no tweet would be older than August 1st, 2009.
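The same check can be run over the whole file rather than only the last line. What follows is a minimal sketch, assuming tweets.json holds one JSON object per line with id and date fields (as the jq output above suggests) and that both bounds are meant to be inclusive. The out_of_range helper is a hypothetical name, not part of Twint.

```python
import json
from datetime import date


def out_of_range(lines, since, until):
    """Return (id, date) pairs for tweets dated outside [since, until].

    Assumes one JSON object per line with "id" and "date" keys,
    where "date" is formatted as YYYY-MM-DD.
    """
    bad = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        tweet_date = date.fromisoformat(tweet["date"])
        if not (since <= tweet_date <= until):
            bad.append((tweet["id"], tweet["date"]))
    return bad


# Sample lines shaped like the jq output in this thread.
sample = [
    '{"id": 3063824418, "date": "2009-08-01"}',
    '{"id": 2455812675, "date": "2009-07-03"}',
]
print(out_of_range(sample, date(2009, 8, 1), date(2009, 9, 1)))
# → [(2455812675, '2009-07-03')]
```

To run it against the real output, pass the open file instead of the sample list, e.g. out_of_range(open("tweets.json"), date(2009, 8, 1), date(2009, 9, 1)).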
Docker on OS/X, as described in #579. The command above should work without issue on any machine with Docker installed.
Let me know if you need anything else, thank you!
-- Doug
I fixed what seems to be the error. Basically, I was splitting the whole interval into smaller pieces to prevent Twitter from blocking our requests, since it can see that we are (or could be) querying a large datetime-frame.
In the past, Twitter sometimes stopped well before reaching the Since date, and we suspected that it had blocked our "too large" request. So the workaround was to query Twitter over smaller datetime-frames.
So I removed this option, since it created more trouble than it solved; the easy solution to the described issue is just to run Twint with smaller datetime-frames, which turns out to have the same effect.
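For anyone who wants to do that splitting by hand, here is a rough sketch of cutting a date range into smaller windows and printing the matching --since/--until arguments. The split_range helper and the 10-day window are assumptions for illustration only; this is not the code that was removed from Twint.

```python
from datetime import date, timedelta


def split_range(since, until, days=7):
    """Yield (start, end) pairs that cover since..until in windows of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    step = timedelta(days=days)
    start = since
    while start < until:
        end = min(start + step, until)
        yield start, end
        start = end


# Print the arguments for one run per window.
for start, end in split_range(date(2009, 8, 1), date(2009, 9, 1), days=10):
    print(f"--since {start} --until {end}")
```

Each window ends on the day the next one starts. Whether that boundary day is fetched twice depends on how --until is treated, which is not confirmed here, so deduplicating the combined results by tweet id is the safer choice.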
Now, before pushing, I kindly ask you to upgrade Twint with pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint and let me know your results. In my tests everything went as expected and the issue seems to be resolved.
Hi,
Totally understandable--it's not like Twitter has any obligation to make this easy for us. :-) I confirmed that this works:
$ tail -n1 tweets.json | jq -r .id,.date
3063824418
2009-08-01
$ head -n1 tweets.json | jq -r .id,.date
3676643901
2009-08-31
However, while running that command, I saw this pop up 6 times:
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
Is that anything we should worry about? If not, I'll close this out, but I think adjusting the severity would be a good idea. :-)
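For reference, the text after noData matches the message Python's json module produces when asked to parse an empty string. The snippet below only demonstrates that message. That Twint hit an empty or non-JSON response body at that point is an assumption, not something confirmed in this thread.

```python
import json

message = None
try:
    # Parsing an empty body fails at the very first character.
    json.loads("")
except json.JSONDecodeError as err:
    message = str(err)

print(message)
# → Expecting value: line 1 column 1 (char 0)
```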
Thanks,
-- Doug
I do believe that those errors are strongly related to #567
If you go into url.py and modify the base_urls (in the first lines of code) to replace https with http, and then go into get.py and edit the Response function, replacing ssl=True with ssl=False, you should get fewer error messages; at least that's what I get.
Feel free to provide any feedback/suggestion
PS: I'd discuss those errors in the relevant issue, just to keep everything in the right place.
Hmm, that's really strange, since those errors don't seem to relate to SSL at all. As it stands, turning off SSL is not a great idea so I won't. :-)
Since the reported bug has been fixed, I'm gonna close this out. I _may_ open another bug in the future about that CRITICAL, depending on how often it comes up and whether I see it affecting the data retrieved in any way.
-- Doug