Odo: Track telemetry implementation for odo

Created on 25 Feb 2021 · 46Comments · Source: openshift/odo

/kind feature
/priority high

Acceptance Criteria

For first pass -

[x] 1. Allow user to enable/disable telemetry (#4456)
[x] 2. Prompt users to set their preference for telemetry (#4482)
[x] 3. Collect information such as - (#4482)
- - [x] 3.1. OS information
- - [ ] 3.2. ~RAM, CPU size, number of cores~
- - [x] 3.3. Odo version
- - [x] 3.4. Errors, success/failure, time taken for a command to complete
- - [x] 3.5 Component type (java, maven, quarkus, python, etc) (#4662)
- - [ ] 3.6 Cluster type
- - [ ] 3.7 Cluster ID
[ ] 4. Sanitize the collected data to free it of any user specific data(#4482, #4586)
[x] 5. Send the data to Segment(#4482)
[x] 6. POC - odo team uses telemetry, sends data to segment, examines how it works before releasing it.
[x] 7. Document the usage data. Reference - https://github.com/redhat-developer/vscode-openshift-tools/blob/master/USAGE_DATA.md(#4482)
[x] 8. Stop wrong data(i.e. test and CI data) from being sent to Segment.(#4580)
[x] 9. Make segment upload asynchronous, hence making it faster. (#4565)
[ ] 10. Add documentation about odo telemetry command in dev docs.

For later -
Collect information such as -

User preferences
Proxy enabled
Cluster type: OCP or K8s

Why is this needed?

https://github.com/openshift/odo/issues/1236

kinfeature point3 prioritHigh

Source

valaparthvi

Most helpful comment

This looks like a good start.

Just one note. Don't forget about correctly handling anonymousId (using existing or generating new if needed) as described in Telemetry Guidence document.
You can check how CRC is doing it https://github.com/code-ready/crc/blob/194965b96b3c2f8ed534da1b8bfaeb2e43790612/pkg/crc/segment/segment.go#L117-L135

kadel on 25 Feb 2021

🚀1 ❤1

All 46 comments

This looks like a good start.

kadel on 25 Feb 2021

🚀1 ❤1

Ditto, looks like good coverage. It's implied but the timing/error is really logging every command run with success/failure and time.

I don't recall if odo failure messages sometimes include user info like id or paths. If not then you can just send those (or you can send reason codes like "ran out of space" or "network error") and avoid having to sanitize.

deboer-tim on 25 Feb 2021

👍1

[ ] User preferences(?)

[ ] Proxy enabled(?)

I would say that we don't need this at the beginning. Let's keep the initial telemetry simple and small. We can always add more later.

kadel on 1 Mar 2021

Of course. I added them as a placeholder so that I don't forget to discuss them.

valaparthvi on 1 Mar 2021

alias AC=Acceptance Criteria

Scope of this issue for Sprint 198 -

[x] Get https://github.com/openshift/odo/pull/4456 merged so that AC#1 can be marked as done.
[x] Work on getting AC#3.4,4,5 this sprint, if time permits then on the rest of AC#3.
- [x] AC 3.4
- [x] AC 3.3
- [x] AC 4
- [x] AC 5
[x] For AC#5 I have segment code in place(see PR4482) but I am still struggling to send all data to segment because of the placement of function that uploads data to segment segmentClient.Upload is incorrect, it will be fixed once I figure where to hook the function which I think is somewhere in LogErrorAndExit. Currently, it only sends commands that ran successfully or did not break short because of some error.
[x] Add test cases.
[x] AC 2 - Add prompt
[x] AC 7 - Add usage data
[x] Once the above is done, I will be able to fix any new errors that pop up.

valaparthvi on 4 Mar 2021

/reopen
only the first item was done

kadel on 8 Mar 2021

🚀1

@kadel: Reopened this issue.

In response to this:

/reopen
only the first item was done

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot on 8 Mar 2021

Scope of this issue for Sprint 199 -

[x] Merge #4482
[ ] Add Cluster type information (k8s or OpenShift)
[x] Sanitize the data coming into Segment
[ ] Figure what devfile are users using - java, nodejs, quarkus, maven?
[ ] Analyze what more data needs to go in and add them.
[ ] Make segment upload asynchronous.

valaparthvi on 25 Mar 2021

Add Cluster type information (k8s or OpenShift)

I'm interested in this information since I want to understand if how wilde OCP 3.11 is used.

Other than that, I'm also interested in knowing if users use Service Catalog. I'm not sure how/if this information can be fetched. One way could be by looking at the odo service create commands being executed. Other way could be to see if the cluster in question has API objects mentioned here viz. ClusterServiceClass and ClusterServiceBroker.

WDYT @kadel @deboer-tim ?

dharmit on 26 Mar 2021

Add Cluster type information (k8s or OpenShift)

@dharmit @girishramnani Can I get a pointer on this one?

valaparthvi on 26 Mar 2021

Add Cluster type information (k8s or OpenShift)

I'm interested in this information since I want to understand if how wilde OCP 3.11 is used.

Other than that, I'm also interested in knowing if users use Service Catalog. I'm not sure how/if this information can be fetched. One way could be by looking at the odo service create commands being executed. Other way could be to see if the cluster in question has API objects mentioned here viz. ClusterServiceClass and ClusterServiceBroker.

WDYT @kadel @deboer-tim ?

I've said it before, but once we have basic telemetry for each command we should go ahead and 'ship it'. There are some obvious other things we can add, but this is incremental and we'll think of others later/as we review telemetry. The top things I think I'd want to know are:

The kind of project used (i.e. name of parent devfile or s2i).
The kind of cluster they're connecting to (i.e. Kind vs Minikube vs CRC vs OpenShift).
What kind of service they're creating or binding to.

deboer-tim on 26 Mar 2021

👍1

Consenting to Telemetry makes odo significantly slower. Currently, 2 messages are sent to Segment, 1) Identify - that contains system information and, 2) Track - that contains information about the executed command.

To mend this @kadel and I discussed a few solutions, which I am listing here -
1) Tomas suggested that the reason this is slow is that it takes time to open the client and close it, the amount of data sent does not affect the speed much. So we can send some data before the command runs, i.e. once the command is validated, we send the Track message and then let odo do its thing. Since sending the message is asynchronous, it will not affect odo's execution, and there won't be any delay that we see at the end. This also means that we will not send information about the success of a command. But in case the command fails, we will send a message about it, in which case, there will be some delay at the end.

2) Run a daemon process that sends the data to segment even after the actual odo execution has finished.

I think the first solution seems to the optimum one. I am going to test it and see if it helps.

There are 2 things that we will need to change regardless of the solution we go with -
1) Set BatchSize to 1 while instantiating a new segment Client so that it sends a message as soon it is queued.
2) It does not make sense to send Identify message with every command since this information is included in the Track message. So we will only send the Identify message when a new anonymousID is created.

Tomas, I feel like I am missing one other thing that we discussed, but I hope not.

valaparthvi on 30 Mar 2021

@kadel The first implementation mentioned here https://github.com/openshift/odo/issues/4462#issuecomment-810152058 when implemented with _BatchSize=1 and Interval=1Millisecond_ seems to be working fine with long running commands such odo create or odo push, but it still has the same delay with fast commands such as odo preference view or odo config view. IMO it's perhaps still better than what it is right now.
Perhaps we can skip uploading the data for a command that does _view_ or _list_ operations or do you think that data is important?

valaparthvi on 30 Mar 2021

2. Run a daemon process that sends the data to segment even after the actual odo execution has finished.

Thinking out loud - can we fork a new process that sends the information after odo is done executing its stuff?

dharmit on 30 Mar 2021

Thinking out loud - can we fork a new process that sends the information after odo is done executing its stuff?

Yeah, that is what Tomas suggested.

valaparthvi on 30 Mar 2021

Thinking out loud - can we fork a new process that sends the information after odo is done executing its stuff?

Yeah, that is what Tomas suggested.

Ah, OK. My bad. When I saw the word "daemon", I assumed something like a client-server architecture.

dharmit on 30 Mar 2021

Other teams spawn background threads for telemetry so that it doesn't block the foreground action. I agree that's not really feasible for cli tools because you don't want to be done the operation and then not get control back because it is still waiting for the message to be sent.

Spawning a secondary process to send each message might work, but that sounds pretty heavyweight. Is it hard to cache the messages to disk? I was wondering if quicker commands could save the event and then we flush these together when a longer command like push() is used. I'm not sure if this is possible or whether it would affect the timestamps, but we don't need every message immediately.

deboer-tim on 30 Mar 2021

I just realized something.
We need to carefully consider how we structure events.

Currently, we use the odo command name (like odo create or odo push) as an event/action identifier. If I understand how Woopra structures data correctly, each Action can have its own schema. In my mind, I see Actions as database tables. And each time you send and event with the same Action identifier you create a new row in that table. If we have a separate Action type for each it means that we can track different properties for each command. For example for odo create we can have a property called devfileName or odo url create can have urlType: <route/ingress> property.

Tomas suggested that the reason this is slow is that it takes time to open the client and close it, the amount of data sent does not affect the speed much. So we can send some data before the command runs, i.e. once the command is validated, we send the Track message and then let odo do its thing. Since sending the message is asynchronous, it will not affect odo's execution, and there won't be any delay that we see at the end. This also means that we will not send information about the success of a command. But in case the command fails, we will send a message about it, in which case, there will be some delay at the end.

The approach that we discussed with @valaparthvi requires using one event/action type for all commands. The idea was that odo will send cmd.run event right before it does the action. And if the command failed it will send an additional event cmd.error. This will allow us to track failed commands.
But now I'm afraid that this will make it harder to track separate properties for each command, as they all will be of the same type.

All this is based on my still quite limited Woopra understanding. It might be good if someone with more Woopra experience can help us properly structure the data. Correct structure is going to be crucial to make data useful.

kadel on 31 Mar 2021

So, what if we let all the other actions be as they are, but upload an action called cmd.error in case of an error?

valaparthvi on 1 Apr 2021

Stop wrong data(i.e. test and CI data) from being sent to Segment.

4580 attempts at fixing this.

valaparthvi on 1 Apr 2021

Scope of this issue for Sprint 199 -

* [x]  Merge #4482

* [ ]  Add Cluster type information (k8s or OpenShift)

* [ ]  Sanitize the data coming into Segment

* [ ]  Figure what devfile are users using - java, nodejs, quarkus, maven?

* [ ]  Analyze what more data needs to go in and add them.

* [ ]  Make segment upload asynchronous.

Rescoping the issue this sprint -

[x] Merge #4482
[ ] Make segment upload asynchronous and faster
[x] Sanitize the data coming into Segment (this is more of an ongoing thing)

valaparthvi on 6 Apr 2021

Requoting from https://github.com/openshift/odo/pull/4565#issuecomment-814029219

On second thought, sending 2 different events won't have much of an impact.
In most cases, we will only send one message to Segment, and if BatchSize is 1, it will take the same time as it is taking now. I am going to revert the changes now. I think the one option we have right now is doing something with the process.

Having BatchSize set to 1 will help when we have to send 2 messages, for e.g. Identify message and Track message. At the moment, I don't see much use in having a separate event for failure.

valaparthvi on 6 Apr 2021

/reopen

valaparthvi on 6 Apr 2021

We have decided to give a shot to the second idea mentioned here https://github.com/openshift/odo/issues/4462#issuecomment-810152058, i.e. Creating a background process that will send the data to Segment after odo finishes execution.

I had a discussion regarding this with @kadel and based on his suggestions, this is how I am planning to approach this -

Create a new odo command for telemetry that will accept the command and a few other things as an argument. It would look something like this - odo telemetry --action "odo preference view" --duration 5 --errorString "failed to find preference.yaml file" --errorType "errors.NotFound". The Run method will initiate the segmentClient, call its upload method, and close the connection. _This command will be hidden so that people cannot call it._
Initiate a new process that will run the odo telemetry command. This process is only started if the user consented to telemetry. Before the main odo command exits, errorString, errorType, and duration will be calculated, then the telemetry process will be run with all the necessary args, and the main odo command will exit.

Tomas, let me know if something is amiss.

valaparthvi on 9 Apr 2021

Create a new odo command for telemetry that will accept the command and a few other things as an argument. It would look something like this - odo telemetry --action "odo preference view" --duration 5 --errorString "failed to find preference.yaml file" --errorType "errors.NotFound". The Run method will initiate the segmentClient, call its upload method, and close the connection. _This command will be hidden so that people cannot call it._

I would maybe consider passing data to stdin as json instead of using flags. This makes future extensions easier.

We can just marshall the analytics.Track that gets currently passed to c.SegmentClient.Enqueue to JSON and instead of calling Enqueue we would execute odo telemetry and send this JSON to its stdin

{
 "userId": "829bf3e1-89b6-40ee-b096-e196e919bde2",
 "event": "odo list",
 "timestamp": "0001-01-01T00:00:00Z",
 "properties": {
 "duration(ms)": 18,
 "error": "Get \"https://kubernetes.docker.internal:6443/apis/apps.openshift.io/v1?timeout=32s\": dial tcp 127.0.0.1:6443: connect: connection refused",
 "error-type": "*net.OpError",
 "success": false,
 "tty": true,
 "version": "odo v2.0.8 (791738b24)"
 }

odo telemetry command will just marshall received JSON to analytics.Track struct and pass it to Enequeue.

This approach will leave a lot of room for future extensions, like for example passing multiple Track structs to a single odo telemetry instance.

kadel on 9 Apr 2021

👍1

Is the odo telemetry command supposed to be executed by a user? If no, does it make sense to expose it as a command?

dharmit on 12 Apr 2021

Is the odo telemetry command supposed to be executed by a user? If no, does it make sense to expose it as a command?

It's not, that is why it is going to be a hidden command.

valaparthvi on 12 Apr 2021

Then why are we adding it as a separate command and not as a function that gets called as and when needed. If it's needed for most (all?) commands then we can even add it to the Runnable interface - https://github.com/openshift/odo/issues/4313.

dharmit on 12 Apr 2021

Then why are we adding it as a separate command and not as a function that gets called as and when needed. If it's needed for most (all?) commands then we can even add it to the Runnable interface - #4313.

I don't understand what you mean.
We need to send telemetry in a separate background process. As far as I know, you can't just call function and make it a separate process.

kadel on 12 Apr 2021

Scope of the issue for Sprint 200 -

[x] 1. Make telemetry asynchronous (a.k.a don't wait for telemetry to complete.) (#4565)
[ ] 2. Figure what devfile are users using - java, nodejs, quarkus, maven? (#4662)
[x] 3. Get #4586 merged.

valaparthvi on 12 Apr 2021

I don't understand what you mean.

My thought was to fork/create a new process for the telemetry stuff in such a way that the logic/code that needs to be executed is run totally independent and in the background. By "independent" I mean that it's not dependent on odo binary in any way except that at the end of running an odo command (odo push, odo url create, etc.) odo would trigger this code to run in a separate process on the OS. The odo command that triggered this would not wait for telemetry logic to execute. Nor would it care if it succeeds or fails. By "background" I mean that the logic/code executed in this way would not interrupt the user in any way irrespective of success/failure in executing it.

This also means that we have to do our part in making sure that it's reliable enough because once it's out there, we don't have a way to debug if telemetry is running OK on the users' systems. Note that this needs to be tested on all the platforms - Linux, Windows & macOS.

Adding a method to the Runnable interface was the implementation level detail, and it can be discussed.

The thought process is along the lines of a recent blog I read about "Closing web browser windows doesn't close connections". I don't know if it's technically possible to do this for our use case.

Also, if we really go down this route, it is of utmost importance to mention this clearly in our docs and discuss the implementation detail there.

dharmit on 14 Apr 2021

Adding a method to the Runnable interface was the implementation level detail, and it can be discussed.

I don't understand, how is this going to help us with the problem at hand?

valaparthvi on 15 Apr 2021

I don't understand, how is this going to help us with the problem at hand?

It's not directly related. It could be one way to implement things. But there can be another way.

Anyway, I had a chat with @kadel and it seems like I was interpreted as being trying to force to use the approach that I thought of.

As I said in my previous comment - I don't know if the idea I shared was technically possible for our use case.

I reiterate that I don't mean that the idea I had is the one we should go with. I shared it, people liked it on the cabal, I was asked to elaborate it on GH and so I did. But it's fine by me if adding odo telemetry is the best way right now and we don't want to spend cycles on the idea I shared.

The way I see it is if I suggested an approach, I should also be able to show how it can be done. But I already said I don't know.

@valaparthvi @kadel I'm sorry I came across as being hell-bent on following the approach I suggested. That was never the intention.

dharmit on 15 Apr 2021

Making telemetry asynchronous to odo tldr;

The data collection and the uploading process start right before the command logic is run. The library itself sends data asynchronously, so the program can continue its normal execution, while the library does its thing. But, when the program is done running the command logic first, and the library is still in the process of sending the data, the program waits for the other process to finishes and we notice a significant delay before the command exits. To fix this delay, we decided to run a process that would upload the data right after the command exits and this would happen in the background so that it does not interrupt the user.

We discovered os.StartProcess().Release() that will help achieve just that for Unix and something similar for windows. But as per my understanding, it also requires that the first argument to os.StartProcess() be a program. To fix this, we thought of adding a new hidden command/flag telemetry to odo that would take data as an argument and upload it, but this would also mean there is a chance that the user might discover this command and try to use it, which we want to avoid.

So we are looking for alternatives to trigger a process or a function call that would upload the data after the main command/program finishes execution.

odo telemetry is our backup solution and I am going to work on it for now.

valaparthvi on 15 Apr 2021

/reopen

@valaparthvi maybe, Fixes part of #issue number would not automatically close the issue.

dharmit on 16 Apr 2021

@dharmit: Reopened this issue.

In response to this:

/reopen

@valaparthvi maybe, Fixes part of #issue number would not automatically close the issue.

openshift-ci-robot on 16 Apr 2021

@dharmit I did add partly to my Fixes #issue part of the PR, but it doesn't work, it almost always closes the issue. I guess I'll try with maybe next time.

And about your comment https://github.com/openshift/odo/issues/4462#issuecomment-820351517, of course, I didn't think you were being hell-bent or forceful, I'm sorry if I implied that in some way.
I understand what you are suggesting, but the only way to fork a process or create a new one is to use a binary/executable program. You don't want it to be dependent on odo binary, in which case, we can create a separate library to upload data to segment and use it's binary instead, but that seems like a bit of a stretch.

I have one more question though - what do you think would be better, command or flag? The problem with a hidden command is, it still shows in the usage doc when odo --help is called for some reason. Since it is a hidden command, I wouldn't expect it to show in the usage doc, maybe it's something that we will need to change in the rootUsage here.
cc: @kadel

valaparthvi on 16 Apr 2021

There is a suggestion that came up on the channel where I had posted our async problem, which said we could run a long-running process that listens to a socket, instead of a background process. I am not sure how I would go about it, but I figured it might spark an idea for someone here.

Excerpt -

If it wasn't Go you might want to fork(), but since it is go I think your options are limited.
I wouldn't worry too much about users stumbling across the option you use to activate the background process. They can read the code after all. 🙂
More seriously, if you don't want it to be as easily discoverable as myapp --help you could add a top level environment variable that switches the behavior - eg MYAPP_BACKGROUND_MODE_ENABLED=for-internal-use-only
and set the env var only when you're launching the process
also, you could consider something like a long running process listening on a socket instead of launching background processes
I don't fully know your use case tho, so that may be more work that your users need to do (I think your tool is meant to be run directly by humans IIUC)
but maybe you could have them opt into it, like ssh-agent style or systemd socket activation? just some wild ideas

one thing you could consider is to create a separate app that reads the telemetry data from a file, and sends it to the server. The cli app would launch that new app at startup, and then write all that data to the file, instead of sending it directly to the background process.

valaparthvi on 16 Apr 2021

I have one more question though - what do you think would be better, command or flag?

No strong opinion myself. Fine by whatever works best for you folks @kadel @valaparthvi.

dharmit on 16 Apr 2021

Add Cluster type information (k8s or OpenShift)

@dharmit @girishramnani Can I get a pointer on this one?

I don't know the exact way to figure it out but OpenShift supports project resource while vanilla k8s doesn't. We already have a function in odo code to check for project resource.

dharmit on 22 Apr 2021

👍1

I have one more question though - what do you think would be better, command or flag? The problem with a hidden command is, it still shows in the usage doc when odo --help is called for some reason. Since it is a hidden command, I wouldn't expect it to show in the usage doc, maybe it's something that we will need to change in the rootUsage here.
cc: @kadel

definitely command. We need to find a way how to hide it from --help

kadel on 23 Apr 2021

It's done. And fixing cobra command annotation hid it from --help.

valaparthvi on 23 Apr 2021

Scope of the issue for sprint 201 -

[x] Merge #4662
[ ] Add Cluster type to collected data
[ ] Add Cluster ID to collected data
[ ] Add documentation for odo telemetry

valaparthvi on 4 May 2021

Rescoping this issue for sprint 201-

[x] Merge #4662

valaparthvi on 14 May 2021

Reopening. #4662 only fixed a part of this issue.

valaparthvi on 20 May 2021

Scope of the issue for sprint 202 -

[ ] Add Cluster type info with odo push and odo project create/set - Use IsProjectSupported to decide between ocp and vanilla k8s clusters.
[ ] Add ComponentType info with odo push, we already have this added for odo create.
[ ] Add documentation for telemetry. Add usage data information to ReadMe. Add information that we are collecting with each command, and about odo telemetry command.

valaparthvi on 27 May 2021

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Clean up error wrapping

maysunfaisal · 8Comments

odo push all the target directory content and not the binary file

cmoulliard · 8Comments

Odo fails when devfile component container name is really long

maysunfaisal · 9Comments

Document use of "odo url create" for devfile in CLI and docs

dharmit · 8Comments

component is left in broken state after redeployment of the pod

kadel · 7Comments