Dump/tap is an area that we haven't gone into yet in Envoy but potentially has a huge amount of value. Opening this issue to gather a feature set that would be generally interesting. Some things to think about:
What else?
This is a cool idea. For both file and disk dumping/tapping I'd suggest that an important feature is encryption with credentials obtained by the API.
One useful application of this is trace+replay for performance work. The idea is you record in a format that can later be used to recreate requests, allowing representative real world workloads to be captured and analyzed offline.
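To make the trace+replay idea a bit more concrete, here is a minimal Python sketch of a record format that keeps enough information to recreate requests later. Everything in it is an assumption for illustration: `RecordedRequest`, its fields, the JSON-lines layout, and the `send` callback are hypothetical and are not an Envoy format or API.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional, TextIO


@dataclass
class RecordedRequest:
    # Hypothetical record layout, not an Envoy format.
    offset_ms: int            # milliseconds since the start of the capture
    method: str
    path: str
    headers: Dict[str, str]
    body: str


def write_trace(requests: Iterable[RecordedRequest], out: TextIO) -> None:
    """Write one JSON object per line so traces can be streamed and appended."""
    for req in requests:
        out.write(json.dumps(asdict(req)) + "\n")


def read_trace(inp: TextIO) -> Iterator[RecordedRequest]:
    """Read back the records written by write_trace, skipping blank lines."""
    for line in inp:
        if line.strip():
            yield RecordedRequest(**json.loads(line))


def replay(
    trace: Iterable[RecordedRequest],
    send: Callable[[RecordedRequest], None],
    sleep: Optional[Callable[[float], None]] = None,
) -> int:
    """Send requests in recorded order and return how many were sent.

    If sleep is given (for example time.sleep), the recorded spacing
    between requests is preserved; otherwise requests are sent back to back.
    """
    last_offset = 0
    count = 0
    for req in trace:
        if sleep is not None:
            sleep(max(0, req.offset_ms - last_offset) / 1000.0)
        last_offset = req.offset_ms
        send(req)
        count += 1
    return count
```

Passing `time.sleep` as `sleep` would approximate the original pacing, while omitting it replays as fast as possible, which may be the more useful mode for load testing.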
I'd love to see dev-time introspection for client engineers on par with what ngrok used to offer. We currently use Charles for that (which is good for debugging uncooperative endpoints), but it would be nice if it were an integral part of the backend.
If dumping can be easily segregated by flows (HTTP2 or TCP), that would be a huge win. This doesn't have to exist inside Envoy's dump logic itself, but tooling around it can make it significantly more usable.
I regularly use https://github.com/simsong/tcpflow in some tricky situations, for an example of a de-multiplexer.
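As a rough illustration of what external de-multiplexing tooling could look like, the Python sketch below groups interleaved tapped chunks into one byte stream per flow. It assumes the dump tags every chunk with a connection ID, a stream ID, and a direction; the `Chunk` type and its fields are hypothetical and do not reflect any existing Envoy output.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Chunk(NamedTuple):
    # Hypothetical tagged chunk, assumed to be emitted by the dump.
    connection_id: int
    stream_id: int     # assumed to be 0 for plain TCP, the HTTP/2 stream ID otherwise
    direction: str     # "rx" or "tx"
    data: bytes


FlowKey = Tuple[int, int, str]


def demux(chunks: Iterable[Chunk]) -> Dict[FlowKey, bytes]:
    """Reassemble interleaved chunks into one contiguous byte string per flow.

    Chunks are assumed to arrive in order within each flow.
    """
    flows: Dict[FlowKey, bytearray] = defaultdict(bytearray)
    for chunk in chunks:
        key = (chunk.connection_id, chunk.stream_id, chunk.direction)
        flows[key] += chunk.data
    return {key: bytes(buf) for key, buf in flows.items()}
```

Keying on direction as well keeps request and response bytes apart, similar to how tcpflow writes one file per direction of a connection.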
Also, dumping internal telemetry along with the data (e.g. stats recorded, status of circuit breakers, retries, etc.) in a framed format would be an interesting addition.
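One way to picture a framed format that carries telemetry next to the data is a simple type-length-value stream. The Python sketch below is only an assumption about how such framing might work: the frame type constants, the 5-byte header layout, and the JSON telemetry payload are all hypothetical and not something Envoy defines.

```python
import io
import json
import struct
from typing import BinaryIO, Iterator, Tuple

# Hypothetical frame types for this sketch.
FRAME_DATA = 0
FRAME_TELEMETRY = 1

# 1-byte frame type followed by a 4-byte big-endian payload length.
_HEADER = struct.Struct("!BI")


def write_frame(out: BinaryIO, frame_type: int, payload: bytes) -> None:
    """Write one frame: header, then the raw payload."""
    out.write(_HEADER.pack(frame_type, len(payload)))
    out.write(payload)


def write_telemetry(out: BinaryIO, snapshot: dict) -> None:
    """Encode a telemetry snapshot (stats, breaker state, ...) as a JSON frame."""
    payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    write_frame(out, FRAME_TELEMETRY, payload)


def read_frames(inp: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (frame_type, payload) pairs until end of input."""
    while True:
        header = inp.read(_HEADER.size)
        if not header:
            return
        if len(header) < _HEADER.size:
            raise ValueError("truncated frame header")
        frame_type, length = _HEADER.unpack(header)
        payload = inp.read(length)
        if len(payload) < length:
            raise ValueError("truncated frame payload")
        yield frame_type, payload
```

Because each frame states its own type and length, a reader that only cares about traffic could skip telemetry frames without parsing them.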
This would be a great feature. While we don't have specific tooling today, it would be useful to have this tap capability in place to selectively route to security monitoring tools. We wouldn't want full contents, but being able to select specific headers would give us great visibility.
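Selecting specific headers rather than full contents could be as simple as an allowlist applied before anything leaves the proxy. The Python sketch below shows the idea; `select_headers` and the example header names are hypothetical, and a real tap would presumably express this in configuration.

```python
from typing import Iterable, List, Tuple


def select_headers(
    headers: Iterable[Tuple[str, str]],
    allowed: Iterable[str],
) -> List[Tuple[str, str]]:
    """Keep only allowlisted headers, matching names case-insensitively.

    Names are lower-cased in the output; repeated headers and their
    original order are preserved. Bodies are never passed in at all.
    """
    allow = {name.lower() for name in allowed}
    return [
        (name.lower(), value)
        for name, value in headers
        if name.lower() in allow
    ]
```

An allowlist fails closed: a sensitive header that nobody thought about (a new auth token, say) is dropped by default instead of being forwarded to the monitoring tool.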
Very clear and unsurprising handling of back-pressure is important for features like this. If mirroring of traffic is not possible or violates data-plane QoS, it should be very obvious that Envoy has fallen back to sampling or dropping packets on the floor.
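A minimal Python sketch of this "never block the data plane, but make drops obvious" behavior is a bounded buffer with explicit counters. The `TapBuffer` class and its field names are hypothetical; the point is only that the tap path refuses work when full and records that it did so.

```python
from collections import deque
from typing import Deque, List


class TapBuffer:
    """Bounded buffer between the data plane and a slower tap sink.

    offer() never blocks: when the buffer is full the item is dropped
    and counted, so degradation is visible instead of silent.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._queue: Deque[bytes] = deque()
        self.accepted = 0   # items queued for the sink
        self.dropped = 0    # items discarded because the sink fell behind

    def offer(self, item: bytes) -> bool:
        """Try to enqueue an item; return False if it was dropped."""
        if len(self._queue) >= self._capacity:
            self.dropped += 1
            return False
        self._queue.append(item)
        self.accepted += 1
        return True

    def drain(self, max_items: int) -> List[bytes]:
        """Remove and return up to max_items items in FIFO order."""
        out: List[bytes] = []
        while self._queue and len(out) < max_items:
            out.append(self._queue.popleft())
        return out

    @property
    def degraded(self) -> bool:
        """True once any item has been dropped."""
        return self.dropped > 0
```

Exporting `accepted` and `dropped` as stats, and perhaps marking the tap output itself when `degraded` becomes true, would make the fallback hard to miss.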
Agree with @ikonst that this could be great for client debugging, and it would be nice to remove intermediary tools. One challenge with the clients is that there are a few hoops to jump through to point them at a specific instance. If we could enable this via headers (and handle collection via something like dump-to-network), that overhead might be avoided. There are some security concerns (e.g. DDoS potential), but perhaps those can be addressed with per-instance rate limiting/circuit breaking.
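To sketch how header-triggered tapping might be combined with per-instance rate limiting, the Python example below gates a hypothetical `x-debug-tap` request header behind a token bucket. The header name, `TokenBucket`, and `should_tap` are assumptions for illustration only and do not correspond to existing Envoy behavior.

```python
from typing import Mapping

# Hypothetical opt-in header; not a header Envoy recognizes.
TAP_HEADER = "x-debug-tap"


class TokenBucket:
    """Simple token bucket; the caller supplies the current time in seconds."""

    def __init__(self, rate_per_sec: float, burst: int, now: float) -> None:
        self._rate = rate_per_sec
        self._burst = float(burst)
        self._tokens = float(burst)
        self._last = now

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self._last)
        self._last = now
        self._tokens = min(self._burst, self._tokens + elapsed * self._rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def should_tap(headers: Mapping[str, str], bucket: TokenBucket, now: float) -> bool:
    """Tap only if the request opts in and the instance has budget left.

    Requests without the header never consume a token.
    """
    requested = any(name.lower() == TAP_HEADER for name in headers)
    return requested and bucket.allow(now)
```

With a small burst and a low refill rate, a client that floods the instance with tap requests would simply stop being tapped rather than adding load, which speaks to the DDoS concern at least in part.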
Are we going to dump to pcap here btw?
I haven't thought about output formats yet. I don't think we can dump explicitly to pcap since we don't have full packet data. I need to investigate different formats to see what would be the best option.
About a year after opening this, I'm ready to get going. I wrote a short design doc. Please comment! https://docs.google.com/document/d/1fgVAH8BMrq_5dt8m54Rxp0OGXt6WlNUsAW0xGlBOkJs/edit#
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
Still planning on working on this.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
Still planning on working on this. Got sidetracked but sometime in the next few months.
With https://github.com/envoyproxy/envoy/pull/6105 I'm calling this high level issue done. I'm going to be opening a series of small issues for work that can be parallelized.