Plots2: set up scraper for Public Lab's twitter account

Created on 30 May 2017 · 17 Comments · Source: publiclab/plots2

background

Twitter is a microblogging platform launched in 2006 (no, no, jk jk, for real now)...

Twitter gives limited access to your timeline events. For instance, here's what the exported archive looks like when viewed in a browser -- it only shows what you tweeted or retweeted, not the activity on each tweet or what other people are saying about you:
screenshot of archive

And here is what analytics.twitter.com shows you in digested "month" overviews. Pretty good, but still doesn't capture who is talking about you, although it does give a count of total mentions:

twitter-2017-01

what we want, what we really really want

all mentions of publiclab across the twitterverse, which would include @publiclab, #publiclab, plain publiclab, or even "public lab" -- in a CSV file.
extract all hashtags from the above set
snapshot of current followers to see who we've lost (how would we do comparisons over time?)
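
A minimal sketch of the first two items, assuming the tweet text has already been collected by some other step. The regular expressions and the function names `mentions_publiclab` and `extract_hashtags` are illustrative choices, not anything that exists in plots2, and the matching rules would likely need tuning against real data:

```python
import re
from collections import Counter

# Matches "@publiclab", "#publiclab", "publiclab" and "public lab",
# case-insensitively, but not when embedded in a longer word.
MENTION_RE = re.compile(r"(?<!\w)[@#]?public\s?lab(?!\w)", re.IGNORECASE)
HASHTAG_RE = re.compile(r"#(\w+)")


def mentions_publiclab(text):
    """Return True if the text mentions Public Lab in any of the forms above."""
    return MENTION_RE.search(text) is not None


def extract_hashtags(texts):
    """Count hashtags (lowercased, without the '#') across an iterable of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts
```

Filtering a batch of tweets with `mentions_publiclab` and passing the survivors to `extract_hashtags` would give the hashtag set described above.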

I can implement it in Python. Implementing it in Ruby should be relatively simple too, although that isn't to say it wouldn't take time. If it would help, I can write one in Python and someone could translate it, but that's probably not worthwhile, since we could just as easily run the Python script; it just wouldn't fit as neatly into the Public Lab codebase. An alternative is that the process is written in Python and produces a parsed data structure (CSV, for example), which Ruby then loads into the database.
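
To make the CSV handoff idea concrete, here is a rough sketch of the Python side. The column names in `FIELDS` and the function `write_tweets_csv` are assumptions made for illustration; the real columns would depend on whatever data structure gets decided on:

```python
import csv

# Hypothetical column set; the actual schema is still to be decided.
FIELDS = ["tweet_id", "created_at", "screen_name", "text"]


def write_tweets_csv(tweets, handle):
    """Write an iterable of tweet dicts to an open text file as CSV.

    Keys missing from a tweet become empty cells; keys not in FIELDS are dropped.
    """
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow({field: tweet.get(field, "") for field in FIELDS})
```

When writing to a real file, opening it with `open(path, "w", newline="")` avoids doubled line endings on some platforms. A fixed header row keeps the Ruby side simple, since it can rely on the column names rather than positions.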

Resources:

https://github.com/sferik/twitter/tree/master/examples
https://robots.thoughtbot.com/ruby-wrapper-for-twitter-search-api (contains example with Active Record and Cron)

Basic steps:

[ ] Decide on a data structure for the SQL database
[ ] Create tables given structure
[ ] Set up an application on Twitter to authenticate oneself and obtain auth keys (might be a slightly different process when it's your own timeline)
[ ] Create a script to authenticate and collect the data
[ ] Create scripts to parse data and put in appropriate tables
[ ] Create scripts to analyze/manipulate/aggregate data (e.g. compare number of followers since last query)
[ ] Create a cron job (or something similar) to run the script
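
As a rough illustration of the table and comparison steps, the sketch below stores follower snapshots in SQLite and diffs two of them. SQLite, the table name `follower_snapshots`, and the function names are all assumptions made for the example; the production database and schema are still open questions, and fetching the follower ids from Twitter is left out entirely:

```python
import sqlite3

# Hypothetical schema: one row per (snapshot date, follower id).
SCHEMA = """
CREATE TABLE IF NOT EXISTS follower_snapshots (
    taken_on TEXT NOT NULL,
    user_id  TEXT NOT NULL,
    PRIMARY KEY (taken_on, user_id)
);
"""


def save_snapshot(conn, taken_on, follower_ids):
    """Record the follower ids seen on a given date (ISO string)."""
    conn.executemany(
        "INSERT OR IGNORE INTO follower_snapshots (taken_on, user_id) VALUES (?, ?)",
        [(taken_on, str(uid)) for uid in follower_ids],
    )
    conn.commit()


def followers_on(conn, taken_on):
    """Return the set of follower ids stored for a given date."""
    rows = conn.execute(
        "SELECT user_id FROM follower_snapshots WHERE taken_on = ?", (taken_on,)
    )
    return {row[0] for row in rows}


def compare_snapshots(conn, earlier, later):
    """Return the ids lost and gained between two snapshot dates."""
    before, after = followers_on(conn, earlier), followers_on(conn, later)
    return {"lost": before - after, "gained": after - before}
```

Keeping every snapshot instead of only the latest one is what makes comparisons over time possible: any two dates can be diffed after the fact.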

Other to-dos:

Logging features
Email notifications of failures
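
One possible shape for these two to-dos, sketched with Python's standard `logging` module. The logger name, the `run_job` wrapper, and the `notify` callback are hypothetical; `notify` stands in for whatever actually sends the email, which is not shown here:

```python
import logging

logger = logging.getLogger("twitter_scraper")  # hypothetical logger name


def run_job(job, notify):
    """Run a zero-argument job, log the outcome, and call notify on failure.

    Returns the job's result, or None if the job raised an exception.
    """
    try:
        result = job()
    except Exception as exc:
        # logger.exception records the message together with the traceback.
        logger.exception("scrape failed")
        notify(f"Twitter scrape failed: {exc!r}")
        return None
    logger.info("scrape finished")
    return result
```

Passing the notifier in as a function keeps the wrapper testable and leaves the choice of email delivery open.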

Bonus to-dos:

Front facing reporting: Could appear here? http://publiclab.org/stats

CC @ebarry

skilfullycurled

🎉1 😄1

Most helpful comment

Liz is right. We can't let these bots dictate what we're not going to work on! They disrupt our faith in institutions. Sow discord and spread misinformation. But to quote former President George W. Bush: _'...There's an old saying in Tennessee...Fool me once, shame on ... shame on you. Fool me... You can't get fooled again!'_. So, I'll be damned if I'm going to let some bot waltz in here and seed the insidious idea that these issues won't get closed if we don't actively work on them.

picard_the_line

skilfullycurled on 8 Oct 2020

❤1 🎉1 😄1

All 17 comments

great! thanks for posting, i added in screenshots above.

ebarry on 30 May 2017

I added a help-wanted label, but if you think this isn't ready for input from folks, you could switch it to a break-me-up one. Could be good to also note exactly what twitter account, i.e. https://twitter.com/publiclab

jywarren on 1 Jun 2017

Thanks @jywarren, awesome idea, didn't know we had that!

There isn't a priority on having this feature fully integrated into the site so I think right now I'm imagining that this could be done in stages, something like the following:

I write something in Python which would create csv's of data for internal usage.
Ruby code is added to put csv's into the database.
Ruby code is added to replace the processing and parsing of the data
Ruby code is added to replace the getting of the data from Twitter.

I think the questions for the dev team that would help define the help wanted or which parts would need to be broken up are:

What are your feelings about having Python code running side by side with the rest of the site?
How do you feel about having some separate Ruby scripts running alongside the site?
Or, would you not want any code (Ruby or Python) interacting with the site unless it was built out in accordance with current style guides (e.g. with models, views, controllers, etc.)?

skilfullycurled on 2 Jun 2017

Hi all! If we are thinking about generating statistics about our tweets and social media, there is already an open-source server application at https://github.com/loklak/loklak_server. It seems to be the most appropriate solution, as we can deploy our own server with it and it enables us to collect and share a large number of tweets. We can look into it further to learn about its usage and other features.

ananyo2012 on 2 Jun 2017

This will be awesome if it works!

I'm so glad you pointed it out. I can't tell you how much data I've missed out on gathering because I was too lazy to go to the computer to set up a script.

And...

They have a Python API!

I'll give it a run on my own machine when I get a chance.

skilfullycurled on 2 Jun 2017

Hi, @skilfullycurled -- just thoughts on these:

Ruby code is added to put csv's into the database.

Do you need to store additional data in the PublicLab.org database? Or is this part of a gradual plan to integrate this effort with the automated processes already on the site?

Ruby code is added to replace the processing and parsing of the data

If you can share a python script or even pseudocode of the kind of processing you're doing, we can help shape a set of issues to accomplish this, either in a separate script or via the API or an expansion of the API.

Ruby code is added to replace the getting of the data from Twitter.

Not sure if this is necessary -- do we need to store Twitter data in our website's database? But happy to brainstorm on this.

Thanks!

jywarren on 2 Jun 2017

Yes, although Ruby is nice, i guess there doesn't seem to be anything strongly motivating us to actually integrate this function with the PublicLab.org codebase -- it could be run as a standalone system at a subdomain or in a folder, and not have to add the complexity of integration, right? I def. appreciate the interest in a public facing tool, though!

loklak does look really great!

jywarren on 2 Jun 2017

I'll not respond to your first comment since your second comment captures it exactly. As long as the dev team is okay with it running as a standalone feature then no, there's no reason to integrate it. That would lower the complexity for everyone for sure.

So I think the plan then is to build out something separate (loklak or otherwise) that can deliver data through a feed so that in the future it can be front facing and others can work with the data if they so choose. If one day it becomes advantageous to integrate it more directly, then we can implement Operation Ruby Spear.

Thanks everyone!

skilfullycurled on 2 Jun 2017

I guess I can remove the help-wanted label for now if you're investigating loklak_server as an option?

jywarren on 13 Jun 2017

Sure. Although for those who stumble upon this thread, as always, help is welcome!

skilfullycurled on 26 Jun 2017

Also found this! https://github.com/chaoss/grimoirelab-perceval#rss

jywarren on 27 Sep 2019

Oop, i meant https://github.com/chaoss/grimoirelab-perceval#twitter

jywarren on 27 Sep 2019

Hi :smile:, this issue has been automatically marked as stale because it has not had recent activity. Don't worry you can continue to work on this and ask @publiclab/reviewers to add "work in progress" label :tada: . Otherwise, it will be closed if no further activity occurs in 5 days -- but you can always re-open it if you like! :100: Thank you for your contributions :raised_hands: :balloon:.

stale[bot] on 7 Oct 2020

She commented "#TwitterAintDead," thereby removing the 'stale' label.

ebarry on 7 Oct 2020

picard_the_line

skilfullycurled on 8 Oct 2020

❤1 🎉1 😄1

Hey Benjamin, what's our working definition of "_active_"? 😃

ebarry on 8 Oct 2020

That's just the point! Again, to quote GWB, [we're] the decider!

It seems like this will take some active discussion, so let's either file another issue here, or put it on the agenda for an open call. We have to stay pro-active on being pro-active.

In the meantime (and I say this with sincerity) what is it that we still want from this? Is this still simply about data? Are there new metrics (Instagram comes to mind) that would give a better picture?

skilfullycurled on 9 Oct 2020
