/kind feature
/priority high
For first pass -
odo telemetry command in dev docs.For later -
Collect information such as -
This looks like a good start.
Just one note. Don't forget about correctly handling anonymousId (using existing or generating new if needed) as described in Telemetry Guidence document.
You can check how CRC is doing it https://github.com/code-ready/crc/blob/194965b96b3c2f8ed534da1b8bfaeb2e43790612/pkg/crc/segment/segment.go#L117-L135
Ditto, looks like good coverage. It's implied but the timing/error is really logging every command run with success/failure and time.
I don't recall if odo failure messages sometimes include user info like id or paths. If not then you can just send those (or you can send reason codes like "ran out of space" or "network error") and avoid having to sanitize.
- [ ] User preferences(?)
- [ ] Proxy enabled(?)
I would say that we don't need this at the beginning. Let's keep the initial telemetry simple and small. We can always add more later.
Of course. I added them as a placeholder so that I don't forget to discuss them.
alias AC=Acceptance Criteria
Scope of this issue for Sprint 198 -
segmentClient.Upload is incorrect, it will be fixed once I figure where to hook the function which I think is somewhere in LogErrorAndExit. Currently, it only sends commands that ran successfully or did not break short because of some error./reopen
only the first item was done
@kadel: Reopened this issue.
In response to this:
/reopen
only the first item was done
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Scope of this issue for Sprint 199 -
Add Cluster type information (k8s or OpenShift)
I'm interested in this information since I want to understand if how wilde OCP 3.11 is used.
Other than that, I'm also interested in knowing if users use Service Catalog. I'm not sure how/if this information can be fetched. One way could be by looking at the odo service create commands being executed. Other way could be to see if the cluster in question has API objects mentioned here viz. ClusterServiceClass and ClusterServiceBroker.
WDYT @kadel @deboer-tim ?
Add Cluster type information (k8s or OpenShift)
@dharmit @girishramnani Can I get a pointer on this one?
Add Cluster type information (k8s or OpenShift)
I'm interested in this information since I want to understand if how wilde OCP 3.11 is used.
Other than that, I'm also interested in knowing if users use Service Catalog. I'm not sure how/if this information can be fetched. One way could be by looking at the
odo service createcommands being executed. Other way could be to see if the cluster in question has API objects mentioned here viz.ClusterServiceClassandClusterServiceBroker.WDYT @kadel @deboer-tim ?
I've said it before, but once we have basic telemetry for each command we should go ahead and 'ship it'. There are some obvious other things we can add, but this is incremental and we'll think of others later/as we review telemetry. The top things I think I'd want to know are:
Consenting to Telemetry makes odo significantly slower. Currently, 2 messages are sent to Segment, 1) Identify - that contains system information and, 2) Track - that contains information about the executed command.
To mend this @kadel and I discussed a few solutions, which I am listing here -
1) Tomas suggested that the reason this is slow is that it takes time to open the client and close it, the amount of data sent does not affect the speed much. So we can send some data before the command runs, i.e. once the command is validated, we send the Track message and then let odo do its thing. Since sending the message is asynchronous, it will not affect odo's execution, and there won't be any delay that we see at the end. This also means that we will not send information about the success of a command. But in case the command fails, we will send a message about it, in which case, there will be some delay at the end.
2) Run a daemon process that sends the data to segment even after the actual odo execution has finished.
I think the first solution seems to the optimum one. I am going to test it and see if it helps.
There are 2 things that we will need to change regardless of the solution we go with -
1) Set BatchSize to 1 while instantiating a new segment Client so that it sends a message as soon it is queued.
2) It does not make sense to send Identify message with every command since this information is included in the Track message. So we will only send the Identify message when a new anonymousID is created.
Tomas, I feel like I am missing one other thing that we discussed, but I hope not.
@kadel The first implementation mentioned here https://github.com/openshift/odo/issues/4462#issuecomment-810152058 when implemented with _BatchSize=1 and Interval=1Millisecond_ seems to be working fine with long running commands such odo create or odo push, but it still has the same delay with fast commands such as odo preference view or odo config view. IMO it's perhaps still better than what it is right now.
Perhaps we can skip uploading the data for a command that does _view_ or _list_ operations or do you think that data is important?
2. Run a daemon process that sends the data to segment even after the actual odo execution has finished.
Thinking out loud - can we fork a new process that sends the information after odo is done executing its stuff?
Thinking out loud - can we fork a new process that sends the information after odo is done executing its stuff?
Yeah, that is what Tomas suggested.
Thinking out loud - can we fork a new process that sends the information after odo is done executing its stuff?
Yeah, that is what Tomas suggested.
Ah, OK. My bad. When I saw the word "daemon", I assumed something like a client-server architecture.
Other teams spawn background threads for telemetry so that it doesn't block the foreground action. I agree that's not really feasible for cli tools because you don't want to be done the operation and then not get control back because it is still waiting for the message to be sent.
Spawning a secondary process to send each message might work, but that sounds pretty heavyweight. Is it hard to cache the messages to disk? I was wondering if quicker commands could save the event and then we flush these together when a longer command like push() is used. I'm not sure if this is possible or whether it would affect the timestamps, but we don't need every message immediately.
I just realized something.
We need to carefully consider how we structure events.
Currently, we use the odo command name (like odo create or odo push) as an event/action identifier. If I understand how Woopra structures data correctly, each Action can have its own schema. In my mind, I see Actions as database tables. And each time you send and event with the same Action identifier you create a new row in that table. If we have a separate Action type for each it means that we can track different properties for each command. For example for odo create we can have a property called devfileName or odo url create can have urlType: <route/ingress> property.
- Tomas suggested that the reason this is slow is that it takes time to open the client and close it, the amount of data sent does not affect the speed much. So we can send some data before the command runs, i.e. once the command is validated, we send the
Trackmessage and then let odo do its thing. Since sending the message is asynchronous, it will not affect odo's execution, and there won't be any delay that we see at the end. This also means that we will not send information about the success of a command. But in case the command fails, we will send a message about it, in which case, there will be some delay at the end.
The approach that we discussed with @valaparthvi requires using one event/action type for all commands. The idea was that odo will send cmd.run event right before it does the action. And if the command failed it will send an additional event cmd.error. This will allow us to track failed commands.
But now I'm afraid that this will make it harder to track separate properties for each command, as they all will be of the same type.
All this is based on my still quite limited Woopra understanding. It might be good if someone with more Woopra experience can help us properly structure the data. Correct structure is going to be crucial to make data useful.
So, what if we let all the other actions be as they are, but upload an action called cmd.error in case of an error?
- Stop wrong data(i.e. test and CI data) from being sent to Segment.
Scope of this issue for Sprint 199 -
* [x] Merge #4482 * [ ] Add Cluster type information (k8s or OpenShift) * [ ] Sanitize the data coming into Segment * [ ] Figure what devfile are users using - java, nodejs, quarkus, maven? * [ ] Analyze what more data needs to go in and add them. * [ ] Make segment upload asynchronous.
Rescoping the issue this sprint -
Requoting from https://github.com/openshift/odo/pull/4565#issuecomment-814029219
On second thought, sending 2 different events won't have much of an impact.
In most cases, we will only send one message to Segment, and if BatchSize is 1, it will take the same time as it is taking now. I am going to revert the changes now. I think the one option we have right now is doing something with the process.Having BatchSize set to 1 will help when we have to send 2 messages, for e.g. Identify message and Track message. At the moment, I don't see much use in having a separate event for failure.
/reopen
We have decided to give a shot to the second idea mentioned here https://github.com/openshift/odo/issues/4462#issuecomment-810152058, i.e. Creating a background process that will send the data to Segment after odo finishes execution.
I had a discussion regarding this with @kadel and based on his suggestions, this is how I am planning to approach this -
telemetry that will accept the command and a few other things as an argument. It would look something like this - odo telemetry --action "odo preference view" --duration 5 --errorString "failed to find preference.yaml file" --errorType "errors.NotFound". The Run method will initiate the segmentClient, call its upload method, and close the connection. _This command will be hidden so that people cannot call it._odo telemetry command. This process is only started if the user consented to telemetry. Before the main odo command exits, errorString, errorType, and duration will be calculated, then the telemetry process will be run with all the necessary args, and the main odo command will exit.Tomas, let me know if something is amiss.
- Create a new odo command for
telemetrythat will accept the command and a few other things as an argument. It would look something like this -odo telemetry --action "odo preference view" --duration 5 --errorString "failed to find preference.yaml file" --errorType "errors.NotFound". TheRunmethod will initiate the segmentClient, call its upload method, and close the connection. _This command will be hidden so that people cannot call it._
I would maybe consider passing data to stdin as json instead of using flags. This makes future extensions easier.
We can just marshall the analytics.Track that gets currently passed to c.SegmentClient.Enqueue to JSON and instead of calling Enqueue we would execute odo telemetry and send this JSON to its stdin
{
"userId": "829bf3e1-89b6-40ee-b096-e196e919bde2",
"event": "odo list",
"timestamp": "0001-01-01T00:00:00Z",
"properties": {
"duration(ms)": 18,
"error": "Get \"https://kubernetes.docker.internal:6443/apis/apps.openshift.io/v1?timeout=32s\": dial tcp 127.0.0.1:6443: connect: connection refused",
"error-type": "*net.OpError",
"success": false,
"tty": true,
"version": "odo v2.0.8 (791738b24)"
}
odo telemetry command will just marshall received JSON to analytics.Track struct and pass it to Enequeue.
This approach will leave a lot of room for future extensions, like for example passing multiple Track structs to a single odo telemetry instance.
Is the odo telemetry command supposed to be executed by a user? If no, does it make sense to expose it as a command?
Is the odo telemetry command supposed to be executed by a user? If no, does it make sense to expose it as a command?
It's not, that is why it is going to be a hidden command.
Then why are we adding it as a separate command and not as a function that gets called as and when needed. If it's needed for most (all?) commands then we can even add it to the Runnable interface - https://github.com/openshift/odo/issues/4313.
Then why are we adding it as a separate command and not as a function that gets called as and when needed. If it's needed for most (all?) commands then we can even add it to the
Runnableinterface - #4313.
I don't understand what you mean.
We need to send telemetry in a separate background process. As far as I know, you can't just call function and make it a separate process.
Scope of the issue for Sprint 200 -
I don't understand what you mean.
My thought was to fork/create a new process for the telemetry stuff in such a way that the logic/code that needs to be executed is run totally independent and in the background. By "independent" I mean that it's not dependent on odo binary in any way except that at the end of running an odo command (odo push, odo url create, etc.) odo would trigger this code to run in a separate process on the OS. The odo command that triggered this would not wait for telemetry logic to execute. Nor would it care if it succeeds or fails. By "background" I mean that the logic/code executed in this way would not interrupt the user in any way irrespective of success/failure in executing it.
This also means that we have to do our part in making sure that it's reliable enough because once it's out there, we don't have a way to debug if telemetry is running OK on the users' systems. Note that this needs to be tested on all the platforms - Linux, Windows & macOS.
Adding a method to the Runnable interface was the implementation level detail, and it can be discussed.
The thought process is along the lines of a recent blog I read about "Closing web browser windows doesn't close connections". I don't know if it's technically possible to do this for our use case.
Also, if we really go down this route, it is of utmost importance to mention this clearly in our docs and discuss the implementation detail there.
Adding a method to the Runnable interface was the implementation level detail, and it can be discussed.
I don't understand, how is this going to help us with the problem at hand?
I don't understand, how is this going to help us with the problem at hand?
It's not directly related. It could be one way to implement things. But there can be another way.
Anyway, I had a chat with @kadel and it seems like I was interpreted as being trying to force to use the approach that I thought of.
As I said in my previous comment - I don't know if the idea I shared was technically possible for our use case.
I reiterate that I don't mean that the idea I had is the one we should go with. I shared it, people liked it on the cabal, I was asked to elaborate it on GH and so I did. But it's fine by me if adding odo telemetry is the best way right now and we don't want to spend cycles on the idea I shared.
The way I see it is if I suggested an approach, I should also be able to show how it can be done. But I already said I don't know.
@valaparthvi @kadel I'm sorry I came across as being hell-bent on following the approach I suggested. That was never the intention.
Making telemetry asynchronous to odo tldr;
The data collection and the uploading process start right before the command logic is run. The library itself sends data asynchronously, so the program can continue its normal execution, while the library does its thing. But, when the program is done running the command logic first, and the library is still in the process of sending the data, the program waits for the other process to finishes and we notice a significant delay before the command exits. To fix this delay, we decided to run a process that would upload the data right after the command exits and this would happen in the background so that it does not interrupt the user.
We discovered os.StartProcess().Release() that will help achieve just that for Unix and something similar for windows. But as per my understanding, it also requires that the first argument to os.StartProcess() be a program. To fix this, we thought of adding a new hidden command/flag telemetry to odo that would take data as an argument and upload it, but this would also mean there is a chance that the user might discover this command and try to use it, which we want to avoid.
So we are looking for alternatives to trigger a process or a function call that would upload the data after the main command/program finishes execution.
odo telemetry is our backup solution and I am going to work on it for now.
/reopen
@valaparthvi maybe, Fixes part of #issue number would not automatically close the issue.
@dharmit: Reopened this issue.
In response to this:
/reopen
@valaparthvi maybe,
Fixes part of #issue numberwould not automatically close the issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@dharmit I did add partly to my Fixes #issue part of the PR, but it doesn't work, it almost always closes the issue. I guess I'll try with maybe next time.
And about your comment https://github.com/openshift/odo/issues/4462#issuecomment-820351517, of course, I didn't think you were being hell-bent or forceful, I'm sorry if I implied that in some way.
I understand what you are suggesting, but the only way to fork a process or create a new one is to use a binary/executable program. You don't want it to be dependent on odo binary, in which case, we can create a separate library to upload data to segment and use it's binary instead, but that seems like a bit of a stretch.
I have one more question though - what do you think would be better, command or flag? The problem with a hidden command is, it still shows in the usage doc when odo --help is called for some reason. Since it is a hidden command, I wouldn't expect it to show in the usage doc, maybe it's something that we will need to change in the rootUsage here.
cc: @kadel
There is a suggestion that came up on the channel where I had posted our async problem, which said we could run a long-running process that listens to a socket, instead of a background process. I am not sure how I would go about it, but I figured it might spark an idea for someone here.
Excerpt -
If it wasn't Go you might want to fork(), but since it is go I think your options are limited.
I wouldn't worry too much about users stumbling across the option you use to activate the background process. They can read the code after all. 馃檪
More seriously, if you don't want it to be as easily discoverable as myapp --help you could add a top level environment variable that switches the behavior - eg MYAPP_BACKGROUND_MODE_ENABLED=for-internal-use-only
and set the env var only when you're launching the process
also, you could consider something like a long running process listening on a socket instead of launching background processes
I don't fully know your use case tho, so that may be more work that your users need to do (I think your tool is meant to be run directly by humans IIUC)
but maybe you could have them opt into it, like ssh-agent style or systemd socket activation? just some wild ideas
one thing you could consider is to create a separate app that reads the telemetry data from a file, and sends it to the server. The cli app would launch that new app at startup, and then write all that data to the file, instead of sending it directly to the background process.
I have one more question though - what do you think would be better, command or flag?
No strong opinion myself. Fine by whatever works best for you folks @kadel @valaparthvi.
Add Cluster type information (k8s or OpenShift)
@dharmit @girishramnani Can I get a pointer on this one?
I don't know the exact way to figure it out but OpenShift supports project resource while vanilla k8s doesn't. We already have a function in odo code to check for project resource.
I have one more question though - what do you think would be better, command or flag? The problem with a hidden command is, it still shows in the usage doc when
odo --helpis called for some reason. Since it is a hidden command, I wouldn't expect it to show in the usage doc, maybe it's something that we will need to change in therootUsagehere.
cc: @kadel
definitely command. We need to find a way how to hide it from --help
It's done. And fixing cobra command annotation hid it from --help.
Scope of the issue for sprint 201 -
Rescoping this issue for sprint 201-
Reopening. #4662 only fixed a part of this issue.
Scope of the issue for sprint 202 -
odo push and odo project create/set - Use IsProjectSupported to decide between ocp and vanilla k8s clusters.odo push, we already have this added for odo create.odo telemetry command.
Most helpful comment
This looks like a good start.
Just one note. Don't forget about correctly handling
anonymousId(using existing or generating new if needed) as described in Telemetry Guidence document.You can check how CRC is doing it https://github.com/code-ready/crc/blob/194965b96b3c2f8ed534da1b8bfaeb2e43790612/pkg/crc/segment/segment.go#L117-L135