Binderhub: Track activity of a particular binder repository

Created on 27 Mar 2018  ·  21 Comments  ·  Source: jupyterhub/binderhub

I recently made a small binder that contains some Dask examples. I'm curious about how much traction it has, how long users stay on the page, and how this has changed over time. This affects how much time I plan to invest in improving it.

I recently learned about https://grafana.mybinder.org (this is great by the way). After some inspection I determined that the answer to my question above probably isn't on this dashboard. I'm curious if that is possible though. I (and others I suspect) would find this information useful.

I imagine that there might be some concern about privacy here. My hope/guess is that releasing aggregate metrics of use is unlikely to infringe on anyone's privacy (especially on a public service) but I would not be surprised to learn that I was wrong.

enhancement discussion

All 21 comments

a couple quick thoughts:

  • Long-term, we should make this kind of data a public stream that people can pipe into from a binderhub deploy
  • However that's a more medium-term goal that requires building out new tech.
  • In the shorter-term, I'm hoping to make occasional data dumps of google analytics data available
  • @jzf2101 has also been doing some cool work along these lines and hopefully we'll have something for people to scan through
  • Privacy is definitely a concern, and I'm not sure exactly which edge cases we should be worrying about here. I think aggregating the behavior and then removing repos with a very small number of users might be enough
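
As a rough illustration of the "aggregate, then drop small repos" idea in the last bullet, here is a minimal sketch. The input format (pairs of repo and an opaque user identifier), the function name `aggregate_launches`, and the `min_users` threshold are all assumptions made for illustration; nothing here reflects how mybinder.org actually stores or processes its data.

```python
from collections import defaultdict


def aggregate_launches(events, min_users=5):
    """Count launches per repo and suppress repos with few distinct users.

    `events` is an iterable of (repo, user_id) pairs. Both the shape of the
    input and the threshold are hypothetical.
    """
    launches = defaultdict(int)
    users = defaultdict(set)
    for repo, user_id in events:
        launches[repo] += 1
        users[repo].add(user_id)
    # Only publish the aggregate count, and only for repos that were used
    # by at least `min_users` distinct users.
    return {
        repo: count
        for repo, count in launches.items()
        if len(users[repo]) >= min_users
    }
```

Only per-repo totals leave the function; the user identifiers are used solely to decide whether a repo is large enough to report.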

FWIW I would like to answer this question but I don't have an answer. I only know what repos are currently on binder.

You can get some basic popularity information out of prometheus with a notebook like https://mybinder.org/v2/gh/choldgraf/binder-stats/master?filepath=prometheus_demo.ipynb. You could also make a hacky "as a function of time" plot by varying the query window.
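
For readers who want to try this outside the linked notebook, here is a minimal sketch of querying the Prometheus HTTP API for launches of one repo over a window. The `/api/v1/query` endpoint and the response layout are standard Prometheus; the server URL, the metric name `binderhub_launch_count_total`, and its `repo` label are assumptions that may not match the actual deployment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the address of a publicly reachable Prometheus server.
PROMETHEUS_URL = "https://prometheus.mybinder.org"


def launches_query(repo, window="7d"):
    """Build a PromQL expression for launches of `repo` over `window`.

    The metric name and the `repo` label are assumed, not confirmed.
    """
    return f'sum(increase(binderhub_launch_count_total{{repo="{repo}"}}[{window}]))'


def query_url(promql, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_scalar(response):
    """Extract the first sample value from an instant-query response."""
    results = response["data"]["result"]
    return float(results[0]["value"][1]) if results else 0.0


def fetch_launches(repo, window="7d"):
    """Run the query against the server (requires network access)."""
    with urlopen(query_url(launches_query(repo, window))) as resp:
        return parse_scalar(json.load(resp))
```

Calling `fetch_launches` with several values of `window` (for example `"1d"`, `"7d"`, `"30d"`) gives the hacky "as a function of time" view described above.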

A related issue: public analytics&event stream (needs jupyterhub/binderhub#219)

FWIW we now have this dashboard on grafana:

https://grafana.mybinder.org/dashboard/db/pod-activity?orgId=1&panelId=1&fullscreen&from=1525118548912&to=1525129348912

This is possible because BinderHub now logs the repo URL as a label when it posts its own Prometheus metric. However, this label isn't exposed on the broader Kubernetes Prometheus metrics (like pod_created_at). If you _could_ access the repo name from those metrics, then I think this would satisfy @mrocklin's case here. Just a note.

@minrk do you know if it's possible to add repo to the other prometheus metrics we have? (at least the ones from k8s) or will this move us into the "combinatorial explosion of unique labels" problem we've discussed?

Should we put this in the documentation somewhere?

Which metrics do we want to add repo to?

I think we should be putting the repo url in the pod annotations (or labels?), which I believe we can do in the BinderSpawner. This may or may not show up in the metrics (I think labels will, but annotations won't?).
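
One practical difference between the two options: Kubernetes label values are limited to 63 characters and may only contain alphanumerics, `-`, `_` and `.` (starting and ending with an alphanumeric), so a full repo URL cannot be used as a label value as-is, while annotations do not have that restriction. The sketch below illustrates the distinction only; the helper `repo_pod_metadata` and the keys `binder-repo` and `binder-repo-url` are hypothetical and are not what BinderHub actually sets.

```python
import re

# Kubernetes label value rule: empty, or at most 63 characters that start
# and end with an alphanumeric and contain only alphanumerics, '-', '_', '.'.
_LABEL_VALUE = re.compile(r"(?:[A-Za-z0-9](?:[A-Za-z0-9._-]{0,61}[A-Za-z0-9])?)?")


def is_valid_label_value(value):
    """Return True if `value` is acceptable as a Kubernetes label value."""
    return _LABEL_VALUE.fullmatch(value) is not None


def repo_pod_metadata(repo_url):
    """Hypothetical helper: full URL as annotation, sanitized slug as label."""
    slug = re.sub(r"^https?://", "", repo_url)
    # Replace every run of disallowed characters with a single dash, then
    # truncate and trim so the result starts and ends with an alphanumeric.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", slug)[:63].strip("-._")
    return {
        "labels": {"binder-repo": slug},  # hypothetical key
        "annotations": {"binder-repo-url": repo_url},  # hypothetical key
    }
```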

From a naive user perspective here, the question I most want to answer is "How is the adoption of my particular binder doing over the last few weeks/months?"

It looks like the current dashboards are more focused on short time frames, which is probably of more relevance to people maintaining the deployment.

That's true. And our Google Analytics should have this type of info, but we can't make that public except by manual dumps.

Any thoughts on allowing binder authors to supply a Google Analytics token as part of the configuration?

that's an interesting idea!

We should keep an eye on https://github.com/jupyterhub/mybinder.org-deploy/issues/500. People can already use custom templates with custom JS tracking inside their binders if they want to :-/ (I've never tried but I can't think of something that would stop me.)

Maybe we should add to our privacy statement that once you enter a binder "all bets are off" in terms of what will happen to you? It seems hard to enforce anything on the actual binders.

One thing that we don't have in this process, which I would find useful, is the total time users spend within a binder, and perhaps its distribution. Is there a way we could collect this while stripping user info? Or even active window data? Would that be at all possible?

I believe that we could get this information out of the prometheus data if we found a way to append the repo/org attributes to all prometheus metrics, instead of just to the binderhub launch metrics

@choldgraf has there been any forward movement on this sort of tracking? I saw #97, but was not able to find a resolution.

yes! thanks for mentioning it...check out https://archive.analytics.mybinder.org/ for a daily list of launches for all of the repositories on mybinder.org, does that help?

And, somewhat self-servingly, here is a binder using Dask to analyze that data: http://examples.dask.org/applications/json-data-on-the-web.html (binder link at the top)

(Disclaimer, Dask is overkill for this, but some parts of the notebook might be helpful anyway)
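
For a single day of data, plain Python is enough. The following is a minimal sketch that counts launches per repository from one day's worth of events, assuming the archive serves one JSON object per line and that each event carries `provider` and `spec` fields, with `spec` ending in a ref such as `org/repo/master`. Those field names and the spec layout are assumptions; check them against an actual file from the archive before relying on this.

```python
import json
from collections import Counter


def launches_per_repo(jsonl_text):
    """Count launches per (provider, repo) from JSON-lines text.

    Assumes each line is a JSON object with `provider` and `spec` keys,
    where `spec` ends in "/<ref>". These names are assumed, not confirmed.
    """
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Drop the trailing ref so launches of different branches or
        # commits of the same repo are counted together.
        repo = event["spec"].rsplit("/", 1)[0]
        counts[(event["provider"], repo)] += 1
    return counts
```

Running this over each daily file and collecting the count for one repo gives the "adoption over the last few weeks/months" view asked for earlier in the thread.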

I use https://github.com/betatim/binderlyzer to analyse our events archive. If you come up with interesting notebooks feel free to link to them from an issue or create a PR on that repo.

Thank you all! these were exactly the resources I was looking for.

FYI I opened up https://github.com/jupyterhub/binder/pull/160 to try making this information more discoverable. Not sure if it should close this issue or not (since this issue is about binderhub, and that documentation change is only about mybinder.org), I'll leave it up to @mrocklin to decide if it's enough :-)

Hi there 👋!

Can we move this discussion to http://discourse.jupyter.org/ (where most of the people who hang out here also hang out, as well as more people)? We are trying to streamline the different discussion places and want to use the issues on GitHub repos for technical discussions on how to change the contents of the repo. More general discussions should go to Discourse so that they are easier to find, better indexed by Google and generally more accessible than being hidden in the bowels of a GitHub repository ;)

It seems like this issue is more about how to do things in general, or where to find something, than about changing this repo. If we want to discuss adjusting the docs we can open a new issue. So closing for now.
