Rubberduck: Exploring Telemetry

Created on 23 Aug 2019 · 14 Comments · Source: rubberduck-vba/Rubberduck

We have many features, some more discoverable than others. We have memory pressure and performance issues, and only a vague, intuitive idea of what's in our users' VBA projects, based on our own individual experiences. We do have logging, and it does help (a lot) with debugging and diagnosis, but statistically a bug report or log file is nothing but an anecdote.

If Rubberduck had an opt-in setting to enable transparent telemetry (there's no way this is getting implemented without making very explicit what's being sent, where, when, and how), we could collect usage data, aggregate it, and craft a lovely Power BI dashboard and monthly reports that could shed a lot of light on many, many things.

Some ideas, for usage data:

  • Distribution of OS and host application versions
  • What commands are used, and from where (which menu, or a hotkey?)
  • What inspections fire results, what results are ignored
  • What the user settings are; hotkey configs, logging level, UI language, inspection severities, identifier whitelist, indenter settings, todo markers, etc.
  • How is the unit testing framework being used? Are fakes used? Mocks?

Other ideas, for various metrics:

  • How long does it take to fully parse/process a project
  • How much code breaks SLL prediction mode
  • How many modules (of what kind) are in a VBA project
  • How many lines of code are in a module; cyclomatic complexity, nesting, etc.
  • What type libraries are referenced, what non-usercode members are invoked

The storage format probably requires a number of tables. How would that be best organized?
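One possible organization is a single `report` table (one row per submitted report) with child tables keyed by `report_id`, one per kind of data listed above. The following is a minimal SQLite sketch written in Python; every table and column name is hypothetical, and counts are assumed to be aggregated on the client before sending.

```python
import sqlite3

# Hypothetical schema: one row per submitted report, child tables keyed by report_id.
SCHEMA = """
CREATE TABLE report (
    report_id       TEXT PRIMARY KEY,  -- random per-report ID, not tied to a machine
    submitted_month TEXT NOT NULL,     -- 'YYYY-MM' only
    os_version      TEXT NOT NULL,
    host_app        TEXT NOT NULL,     -- e.g. 'Excel'
    host_version    TEXT NOT NULL
);
CREATE TABLE command_usage (
    report_id TEXT NOT NULL REFERENCES report(report_id),
    command   TEXT NOT NULL,
    source    TEXT NOT NULL CHECK (source IN ('menu', 'context_menu', 'hotkey', 'toolbar')),
    use_count INTEGER NOT NULL
);
CREATE TABLE inspection_result (
    report_id     TEXT NOT NULL REFERENCES report(report_id),
    inspection    TEXT NOT NULL,
    result_count  INTEGER NOT NULL,
    ignored_count INTEGER NOT NULL
);
CREATE TABLE module_metric (
    report_id      TEXT NOT NULL REFERENCES report(report_id),
    module_kind    TEXT NOT NULL,      -- e.g. 'standard', 'class', 'form', 'document'
    module_count   INTEGER NOT NULL,
    total_lines    INTEGER NOT NULL,
    max_cyclomatic INTEGER NOT NULL
);
CREATE TABLE parse_metric (
    report_id            TEXT NOT NULL REFERENCES report(report_id),
    parse_ms             INTEGER NOT NULL,
    sll_fallback_modules INTEGER NOT NULL  -- modules that needed full LL prediction
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the illustrative telemetry tables and return the open connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_database()
    conn.execute("INSERT INTO report VALUES ('r1', '2019-08', 'Windows 10', 'Excel', '16.0')")
    conn.executemany(
        "INSERT INTO command_usage VALUES (?, ?, ?, ?)",
        [("r1", "RunAllTests", "hotkey", 12), ("r1", "RunAllTests", "menu", 3)],
    )
    # Aggregate across all reports: how is each command invoked?
    query = ("SELECT command, source, SUM(use_count) FROM command_usage "
             "GROUP BY command, source ORDER BY source")
    for row in conn.execute(query):
        print(row)
```

Keeping each metric family in its own table means new kinds of data can be added later without altering existing tables, at the cost of a join per query.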

Anonymity concerns:

No PII or otherwise sensitive information shall be collected; usercode identifier names would only be collected with explicit and specific consent; it should be impossible to look at any given telemetry record and be able to say with 100% certainty "hey that's my record!".
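One technique that supports this goal is to coarsen every value on the client before it is stored: exact counts become ranges, timestamps become months, and each report gets a fresh random ID. The Python sketch below is illustrative only; the raw field names are hypothetical. Coarsening reduces, but does not by itself guarantee, unidentifiability: a rare combination of values can still be unique, so the server would also need to check that each combination appears in enough records (k-anonymity) before publishing it.

```python
import secrets


def bucket(value: int) -> str:
    """Coarsen a non-negative count into a power-of-two range, e.g. 300 -> '256-511'."""
    if value <= 0:
        return "0"
    low = 1 << (value.bit_length() - 1)
    return f"{low}-{2 * low - 1}"


def anonymize(raw: dict) -> dict:
    """Build a report from hypothetical raw client data, keeping only coarse values."""
    return {
        # Fresh random ID per report, so reports cannot be linked to each other or to a machine.
        "report_id": secrets.token_hex(8),
        # Keep only year and month of an ISO timestamp: '2019-08-23T10:15' -> '2019-08'.
        "month": raw["timestamp"][:7],
        "host_app": raw["host_app"],
        "module_count": bucket(raw["module_count"]),
        "lines_of_code": bucket(raw["lines_of_code"]),
    }


if __name__ == "__main__":
    print(anonymize({"timestamp": "2019-08-23T10:15", "host_app": "Excel",
                     "module_count": 37, "lines_of_code": 5120}))
```

A per-report random ID is a deliberate trade-off: it prevents tracking a user over time, but it also means trends can only be measured across the population, not per installation.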

Consuming the data:

The entire database shall be queryable with a public REST API; monthly reports could be emailed to subscribers.
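Both the API and the monthly email would mostly serve aggregates rather than raw rows. As a sketch, assuming reports already carry a `month` field and host information (hypothetical names), a month-by-month summary could be computed and serialized to JSON like this:

```python
import json
from collections import Counter, defaultdict


def monthly_summary(reports):
    """Count reports per host application and version, grouped by month."""
    by_month = defaultdict(Counter)
    for report in reports:
        key = f"{report['host_app']} {report['host_version']}"
        by_month[report["month"]][key] += 1
    # 'YYYY-MM' strings sort chronologically, so the output order is stable.
    return {month: dict(counts) for month, counts in sorted(by_month.items())}


if __name__ == "__main__":
    sample = [
        {"month": "2019-08", "host_app": "Excel", "host_version": "16.0"},
        {"month": "2019-08", "host_app": "Access", "host_version": "16.0"},
        {"month": "2019-09", "host_app": "Excel", "host_version": "16.0"},
        {"month": "2019-08", "host_app": "Excel", "host_version": "16.0"},
    ]
    # The same JSON could be returned by a read-only API endpoint or rendered into an email.
    print(json.dumps(monthly_summary(sample), indent=2))
```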


Thoughts? Ideas? Concerns? Let's discuss this inside out.

discussion enhancement meta

All 14 comments

I would have no problem with this at all. If it's going to help you guys in any way at all, it's worth doing.

Type libraries sound OK, just not the actual library names, as some may be commercial libraries that could identify a class of user.
I'd be happy to test it, as long as I can see the collected data BEFORE transmitting it.

That's critical - the collected data should be visible on the client side at any time, in a nice human-readable format. IMO usercode, references, project and component names MUST be excluded, it shouldn't be possible to give consent for any of that.
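A client-side preview could render the pending payload as indented JSON and refuse to produce it at all if any excluded field sneaks in. This Python sketch assumes the payload is a plain dict/list structure; the `FORBIDDEN_KEYS` set and the function names are hypothetical, not part of Rubberduck.

```python
import json

# Hypothetical keys that must never leave the machine: user code, references,
# project and component names.
FORBIDDEN_KEYS = {"code", "references", "project_name", "component_name", "identifier"}


def find_forbidden(payload, path="$"):
    """Return the JSON paths of any forbidden keys, searching nested dicts and lists."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                hits.append(child)
            hits.extend(find_forbidden(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_forbidden(item, f"{path}[{index}]"))
    return hits


def preview(payload) -> str:
    """Render the pending payload as sorted, indented JSON for the user to review."""
    problems = find_forbidden(payload)
    if problems:
        raise ValueError(f"payload contains excluded data: {problems}")
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(preview({"host_app": "Excel", "modules": 3}))
```

Running the same check just before transmission means the text the user reviewed and the bytes actually sent come from one function.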

Thinking about consent - we could add a page to the installer, giving a synopsis of what would be collected, a description of where the on/off switch is in the main add-in, and a link for further details, with options:

  • Disable completely
  • Not right now (pre-selected)
  • Yes, that's fine.

For the top option, the installer could skip installing the telemetry assembly entirely, which should satisfy corporates with a risk-averse posture.
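The three installer options map onto two independent decisions: whether the Rubberduck.Telemetry assembly is installed, and the initial value of the setting. A minimal Python sketch of that mapping follows; `ConsentChoice` and `install_plan` are hypothetical names, not taken from the Rubberduck code base or its installer.

```python
from enum import Enum


class ConsentChoice(Enum):
    DISABLE_COMPLETELY = "disable"  # telemetry assembly is not installed at all
    NOT_RIGHT_NOW = "later"         # assembly installed, setting off (pre-selected)
    YES = "yes"                     # assembly installed, setting on


DEFAULT_CHOICE = ConsentChoice.NOT_RIGHT_NOW


def install_plan(choice: ConsentChoice) -> dict:
    """Map the installer page choice to what gets installed and the initial setting."""
    return {
        "install_telemetry_assembly": choice is not ConsentChoice.DISABLE_COMPLETELY,
        "telemetry_enabled": choice is ConsentChoice.YES,
    }


if __name__ == "__main__":
    for choice in ConsentChoice:
        print(choice.name, install_plan(choice))
```

Only the "Yes" option turns telemetry on; the pre-selected default installs the assembly but sends nothing.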

@mansellan I like that!

Another idea: "Send a frown 🙁" and "Send a smile 😃" user feedback features, like Microsoft does with e.g. Excel telemetry?

  • Not right now (pre-selected)

What is the difference between that and "Disabled"? Do we prompt again a week later, or something?

@Hosch250 that would be an installer prompt, so "disable completely" could not even install the Rubberduck.Telemetry assembly, while "not right now" would install it, but leave the setting disabled.

That'd be a pain if someone toggled it to Enabled after installing with it Disabled. Alternatively, would we remove the DLL if they installed with it Enabled, then toggled it to Disabled?

I think that if RD is installed under the "Disable Completely" option, the Telemetry page in the settings dialog should still be visible, but with wording like:

"Telemetry is not currently installed. If you wish to enable telemetry, please go to Control Panel, Programs and Features, then run the Rubberduck installer using Modify".

This:

  1. Provides a route to enable later, but
  2. Gives reassurance to the corporate IT reviewer that telemetry is an install-only option (which they can and will lock out)
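The proposed settings-page behavior reduces to one branch on whether the telemetry assembly is present. In this Python sketch, `assembly_installed` is assumed to be detected elsewhere (for example by checking for the Rubberduck.Telemetry DLL next to the add-in), and `telemetry_page` is a hypothetical name; the message text is the wording proposed above.

```python
# Wording taken from the proposal above.
INSTALL_HINT = (
    "Telemetry is not currently installed. If you wish to enable telemetry, "
    "please go to Control Panel, Programs and Features, then run the "
    "Rubberduck installer using Modify."
)


def telemetry_page(assembly_installed: bool, enabled: bool) -> dict:
    """Decide what a (hypothetical) Telemetry settings page shows."""
    if not assembly_installed:
        # Installed with "Disable completely": no toggle, only the route to opt in later.
        return {"toggle_visible": False, "message": INSTALL_HINT}
    # Assembly present: show the normal on/off switch reflecting the current setting.
    return {"toggle_visible": True, "toggle_checked": enabled, "message": None}


if __name__ == "__main__":
    print(telemetry_page(assembly_installed=False, enabled=False))
```

Because the toggle never appears without the assembly, the only way to enable telemetry on a locked-down machine is through the installer, which IT can control.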

Just installed Telerik Fiddler, and noticed this in the license agreement:

On startup, the Software anonymously checks for new versions; you may disable this feature if you prefer. You may opt-in to submitting anonymous data about your system configuration and use of the Software to help improve future versions of the Software. If you opt-in, Telerik may collect data related to: certain features and extensions of the Software, identifying trends and bugs, activation information, usage statistics and may track other data related to your use of the Software as further described in the most current version of Telerik’s Privacy Policy (located at: http://www.telerik.com/company/privacy-policy). You may be asked, from time to time, to respond to short survey questions presented within the Software’s user environment. Telerik may use your responses to these questions to serve you with targeted advertising content, to improve the Software, and/or for other purposes as described within the Privacy Policy. By your responding to such questions, opting-in to data collection, and/or acceptance of these terms and/or use of the Software, you authorize the collection, use and disclosure of all responses and data for the purposes provided for herein and/or in the Privacy Policy.

And this prompt on first startup:

Help Improve Progress Telerik Fiddler?

I like this approach... we'll need an explicit "privacy policy" legalese document though.

Maybe you can reach out to the Software Freedom Law Center. They offer pro-bono services for FLOSS projects. Not sure what requirements they have to determine who they’re willing to work with.

http://www.softwarefreedom.org/

@mansellan

IMO usercode, references, project and component names MUST be excluded, it shouldn't be possible to give consent for any of that.

Would it be possible to differentiate between elements and libraries from "standard VBA stuff" -- such as Excel, Access, ADO, DAO, WIA, MSHTML, Regex -- and custom user projects or referenced libraries? Maybe a list of the standard ones, and any non-standard library or element (element from a non-standard library) should not be included?

Hmm, hadn't considered that... I can't see the harm in having a library whitelist. Another option could be to hash all referenced libraries and send just the hashes, which we could then match up to hashes of known libraries. Either way, no private info is sent.
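Both options can be sketched in a few lines of Python. The allow-list contents and all function names here are hypothetical; a real list would be curated by the project. One caveat on the hashing option: an unsalted hash of a library name can be reversed by anyone who guesses the name and hashes it, so it hides obscure in-house libraries but not well-known commercial ones.

```python
import hashlib

# Hypothetical allow-list of widely used libraries, keyed by reference name.
STANDARD_LIBRARIES = {
    "VBA", "Excel", "Access", "Office", "stdole", "ADODB", "DAO",
    "WIA", "MSHTML", "VBScript_RegExp_55",
}


def filter_references(references):
    """Option 1: report allow-listed libraries by name and only count the rest."""
    known = sorted(r for r in references if r in STANDARD_LIBRARIES)
    return {"standard": known, "other_count": len(references) - len(known)}


def hash_reference(name: str) -> str:
    """Option 2: send a SHA-256 digest of the normalized library name instead of the name."""
    return hashlib.sha256(name.strip().lower().encode("utf-8")).hexdigest()


def match_known(hashes, known_names):
    """Server side: resolve only the digests that match a list of known library names."""
    lookup = {hash_reference(name): name for name in known_names}
    return [lookup[h] for h in hashes if h in lookup]


if __name__ == "__main__":
    refs = ["Excel", "ADODB", "AcmeCorpLib"]
    print(filter_references(refs))
    print(match_known([hash_reference(r) for r in refs], ["Excel", "ADODB", "DAO"]))
```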
