Homebrew-cask: Analytics for casks

Created on 27 May 2018  ·  16 Comments  ·  Source: Homebrew/homebrew-cask

Enabling analytics for Casks was mentioned in https://github.com/Homebrew/homebrew-cask/pull/47609

Labels: core, ready to implement

All 16 comments

Mind expanding a bit on your comment?

I'm just not sure about collecting analytics on all casks.

Why not? And which ones are you thinking about?

When this discussion came up originally, there was a strong (maybe even unanimous) vote towards no.

I think in that time Homebrew has handled analytics pretty well and I can see its usefulness in keeping things maintainable.

We should certainly consider how best to notify users about the change - many people do not read Brew change logs.

@adidalal Part of why I now think analytics are a good idea is I’m pretty sure no one knew our stance on the matter. HB and HBC are so tightly integrated now, I don’t think it goes through anyone’s head that we had this difference of opinion. Furthermore, even if some people knew and cared, those are the ones that turned off analytics in HB. Since we’d respect that same flag, I don’t see the need to warn about the change.

Yes, agreed. I’m wary of adding any additional command line output lest it break scripts.

Mind expanding a bit on your comment?

Sorry, wanted to give it some more thought first as I hadn't really given it serious consideration. I wasn't expecting it to become a topic of discussion until it was mentioned in https://github.com/Homebrew/homebrew-cask/pull/47609#issuecomment-392267241

https://github.com/Homebrew/homebrew-cask/issues/4323#issuecomment-42750981
We have several casks that I would consider to be similar enough to popcorn time that they could have the same legal issues.

This would also extend to third party taps as we have refused popcorn time in the official taps.

We also accept license agreements for some casks, keeping a record of when we do this strikes me as a bad idea.

I can see a few cases where it would be useful, e.g. I would like to know numbers for mono-mdk installs and be able to compare it to the mono formula. Also removing casks from versions that still work but are unused.

My main thought is while analytics could be useful in some cases, what would we do with it that would justify collecting this information? Would it cause any real change in how the taps are maintained?

We also accept license agreements for some casks

That’s something I’d gladly see changed.

The main benefits of analytics I see are relative to removing casks:

  • Sometimes we get updates to casks that bump them two major versions, indicating it’s not used by many people. Analytics might confirm that.
  • When a cask breaks and the fix is non-trivial, a quick look at the statistics decides if we should fix it or remove it.
  • With versioned casks, we can get rid of a bunch of rules and replace them with “while statistics say it’s used more than X, we keep it, else we remove it”.
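The threshold rule in the last bullet could look roughly like the sketch below. The cask names and install counts are invented for illustration, and `partition_by_usage` is a hypothetical helper, not something that exists in Homebrew.

```ruby
# Hypothetical 365-day install counts; these numbers are made up for illustration.
INSTALLS_365D = {
  "example-app-1" => 12,
  "example-app-2" => 480,
  "example-app-3" => 9_500,
}.freeze

# Split casks into those to keep and those to consider for removal,
# given a minimum install count (the "X" in the rule above).
def partition_by_usage(install_counts, threshold)
  keep, remove = install_counts.partition { |_cask, count| count > threshold }
  { keep: keep.map(&:first), remove: remove.map(&:first) }
end

result = partition_by_usage(INSTALLS_365D, 100)
puts "keep:   #{result[:keep].join(", ")}"
puts "remove: #{result[:remove].join(", ")}"
```

The threshold is kept as a plain argument because the right value of X would presumably differ between versioned casks and the main repository.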

As for the Popcorn Time problem, that may not be an issue. Statistical data should be kept private except for the actual total numbers, which alone shouldn’t be enough to pinpoint an individual. But we can always keep all numbers private. Picking and choosing which casks are included in statistics may be viewed as a kind of an admission of being aware of the dubious legality of the software. For what it’s worth, if we do have any popular piracy tool (torrent clients don’t count, as they have plenty of legitimate uses) in the repos, I’m either not aware what those are, or don’t remember.

I can see a few cases where it would be useful, e.g. I would like to know numbers for mono-mdk installs and be able to compare it to the mono formula. Also removing casks from versions that still work but are unused.
Picking and choosing which casks are included in statistics may be viewed as a kind of an admission of being aware of the dubious legality of the software.

Agreed on all this. I'm also a strong 👍 on having analytics on casks; it's been incredibly useful to Homebrew and it would be for Homebrew Cask, too. We've also stripped back the data we track, delete data after 1.5 years and have made much of the information public. I've had individuals from both Google and Microsoft contact me privately wanting this data for their open source projects, so it's nice I can now point them to the public resources. It feels like Cask data would be similarly useful to other upstreams.

We also accept license agreements for some casks

That’s something I’d gladly see changed.

I'm not sure how cases like this would be handled?

https://github.com/Homebrew/homebrew-cask/blob/111a248f3c27e6faf139970c90f3fda572a95f43/Casks/java.rb#L7

Edit: Adding a screenshot of the Oracle website:
[Screenshot, 2018-05-29 at 15:56:34]

Statistical data should be kept private except for the actual total numbers, which alone shouldn’t be enough to pinpoint an individual. But we can always keep all numbers private.

Private != not subject to third-party access e.g. some form of legal demand.

We also have several apps like lantern, brook that are intended for censorship evasion, e.g. the GFW and various other places.

Picking and choosing which casks are included in statistics may be viewed as a kind of an admission of being aware of the dubious legality of the software.

Yes, completely agree that blacklisting some casks is a bad idea.

For what it’s worth, if we do have any popular piracy tool (torrent clients don’t count, as they have plenty of legitimate uses) in the repos, I’m either not aware what those are, or don’t remember.

There are only a handful, sonarr is one example.

The main benefits of analytics I see are relative to removing casks: ...

We already rather aggressively remove casks, analytics would allow us to remove even more but I don't see that as enough justification for collecting this information.

Basically I'm not convinced (at this stage anyway) that opening this particular can of worms is worth it.

We already rather aggressively remove casks, analytics would allow us to remove even more but I don't see that as enough justification for collecting this information.

Basically I'm not convinced (at this stage anyway) that opening this particular can of worms is worth it.

I think it's another thing trying to bring the projects more in line. Now that they are both in the same org and have their core code in the same repo I think it's a reasonable assumption (that will have no backlash) that we send similarly anonymous analytics data for Homebrew Cask as Homebrew itself.

Private != not subject to third-party access e.g. some form of legal demand.

We also have several apps like lantern, brook that are intended for censorship evasion, e.g. the GFW and various other places.

As the org administrator and being in the EU: I will not comply with this demand and there's no basis for anyone seeking it. If they did seek it they'd need to provide the user's GUID and it'd need to be within the window in which we automatically delete the data anyway. Note that Homebrew also includes censorship evasion tools and this hasn't been a cause for concern with our analytics.

I'm still not really convinced on cask analytics but I'm the only dissenting opinion so I guess we should move on and talk about implementation.

Slight tangent, but FWIW, as a contributor, I wish there were more public analytics available. For instance, it would help if brew info or brew cask info showed the 30/365-day install counts. I would find this information a valuable signal for which casks I should spend time worrying about, and which casks I could recommend for removal.

I'm still not really convinced on cask analytics but I'm the only dissenting opinion so I guess we should move on and talk about implementation.

👍 It would be something like Utils::Analytics.report_event("install_cask", cask_name)
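As a rough illustration of that call, here is a self-contained sketch. Only the `report_event(category, action)` call shape comes from this thread. The `AnalyticsSketch` module, the `after_cask_install` hook and the in-memory event list are hypothetical stand-ins, and treating any non-empty `HOMEBREW_NO_ANALYTICS` value as an opt-out is an assumption about how the shared flag would be honoured.

```ruby
# Stand-in for Homebrew's Utils::Analytics, so the sketch runs on its own.
# Only the report_event(category, action) call shape comes from the thread;
# everything else here is hypothetical.
module AnalyticsSketch
  EVENTS = []

  module_function

  # Assumption: a non-empty HOMEBREW_NO_ANALYTICS means the user opted out.
  def disabled?(env = ENV)
    !env["HOMEBREW_NO_ANALYTICS"].to_s.empty?
  end

  # Record the event locally instead of sending it anywhere.
  def report_event(category, action, env: ENV)
    return if disabled?(env)

    EVENTS << { category: category, action: action }
  end
end

# Hypothetical hook that a cask installer could call after a successful install.
def after_cask_install(cask_name, env: ENV)
  AnalyticsSketch.report_event("install_cask", cask_name, env: env)
end

after_cask_install("example-cask", env: {})
after_cask_install("other-cask", env: { "HOMEBREW_NO_ANALYTICS" => "1" })
p AnalyticsSketch::EVENTS
```

Passing the environment in explicitly keeps the opt-out check easy to exercise without touching the real process environment.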

I have just been reminded this issue exists (and that I have previously commented in it!!).

@commitay

Basically I'm not convinced (at this stage anyway) that opening this particular can of worms is worth it.

I'm still not really convinced on cask analytics but I'm the only dissenting opinion so I guess we should move on and talk about implementation.

The reason I am very keen to get them:
working with automated cask scanning, and the near-future implementation of automated cask update submissions, I would like to know which casks to focus on based on which casks people are actually using.

These two links,
https://formulae.brew.sh/analytics/install/30d/
https://formulae.brew.sh/api/analytics/install/homebrew-core/30d.json
are really all I am interested in knowing for casks.

Writing (and maintaining) the automated updates for the top 50 casks (if that accounts for 80%+ of downloads) gets me a lot more motivated than just trying to guess or the daunting task of writing the automated updates for ~4000 casks.
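Whether a small set of casks really covers 80%+ of installs is easy to check once counts exist. The sketch below assumes a simple name-to-count hash; the counts are invented, and `casks_needed_for_share` is a hypothetical helper rather than anything provided by Homebrew or the formulae.brew.sh API.

```ruby
# Return how many of the most-installed casks are needed to cover
# at least `target_share` (0.0..1.0) of all installs.
def casks_needed_for_share(install_counts, target_share)
  total = install_counts.values.sum
  return 0 if total.zero?

  covered = 0
  install_counts.values.sort.reverse.each_with_index do |count, index|
    covered += count
    return index + 1 if covered >= target_share * total
  end
  install_counts.size
end

# Hypothetical 30-day install counts; these numbers are made up for illustration.
counts = { "cask-a" => 7_000, "cask-b" => 1_500, "cask-c" => 1_000, "cask-d" => 500 }
puts casks_needed_for_share(counts, 0.8)
```

If the answer for the real data turned out to be close to 50, that would support focusing the automated updates on that short list first.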

Since this is now tagged as "ready to implement" what can I do to help it be implemented?
I am happy to review the way HB does the analytics and attempt to replicate it for HBC on a fork. Would that be helpful?

I am happy to review the way HB does the analytics and attempt to replicate it for HBC on a fork. Would that be helpful?

There has not been any progress on this so far, so feel free to start working on this.

Check out the usage of Utils::Analytics in Library/Homebrew/brew.rb and Library/Homebrew/formula_installer.rb to see how we're logging install, install_on_request and BuildError events. I'd suggest adding a new cask_install event.

PR submitted, feedback requested! https://github.com/Homebrew/brew/pull/4620
