Envoy: Envoy crashes if Ext Autz Filter is configured with wrong Cluster name

Created on 13 Mar 2018  路  10Comments  路  Source: envoyproxy/envoy

If we enable Ext Autz filter with a wrong cluster name, Envoy crashes during request process with the following exception


libc++abi.dylib: terminating with uncaught exception of type Envoy::EnvoyException: Unknown gRPC client cluster 'ext_autz_cluster'
[2018-03-13 13:05:21.874][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] Caught Abort trap: 6, suspect faulting address 0x7fff73ec2e3e
[2018-03-13 13:05:21.876][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:87] Backtrace obj<envoy-static> thr<123145484828672>:
[2018-03-13 13:05:21.876][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:99] thr<123145484828672> obj<envoy-static                        0x000000010349ede6 _ZN8backward7details6unwindINS_14StackTraceImplINS_10system_tag10darwin_tagEE8callbackEEEmT_m>
[2018-03-13 13:05:21.876][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145484828672> #0 0x10349ede6:
[2018-03-13 13:05:21.876][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:99] thr<123145484828672> obj<envoy-static>
[2018-03-13 13:05:21.876][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145484828672> #1 0x10349e975: backward::StackTraceImpl<backward::system_tag::darwin_tag>::load_here(unsigned long) + 101
[2018-03-13 13:05:21.877][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145484828672> #2 0x10349e771: backward::StackTraceImpl<backward::system_tag::darwin_tag>::load_from(void*, unsigned long) + 49
[2018-03-13 13:05:21.877][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145484828672> #3 0x10349cffe: Envoy::BackwardsTrace::captureFrom(void*) + 46
[2018-03-13 13:05:21.877][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145484828672> #4 0x10349cedf: Envoy::SignalAction::sigHandler(int, __siginfo*, void*) + 143
[2018-03-13 13:05:21.877][3347742][critical][backtrace] bazel-out/darwin-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:110] end backtrace thread 123145484828672

While this exception is valid that the cluster is not a known cluster, it would be good if we do this validation during server startup.
Also if this filter is enabled with listener served via LDS/ADS, we should validate the cluster is available and then only update the listener otherwise listener update should fail.

bug

All 10 comments

@saumoh can you take a look at fixing?

@mattklein123 @saumoh could we divide this into two tickets?

  1. Static config case, by checking if filter uses envoy_grpc then proto_config.grpc_service().envoy_grpc.cluster_name() is inside the current bootstrapped context.clusterManager().clusters().
  2. Dynamic config case.

If it's OK, I could probably help working on the first one, while still investigating on the second one. Thank you.

@dio yes we can split. The dynamic case is being discussed in https://github.com/envoyproxy/envoy/issues/2762. I would check to see if @saumoh is fixing or not. The static case should be trivial to fix.

@dio and @mattklein123 I was thinking of addressing item 2 directly in which case we won't need to add a check for 1. wdyt?
Will get to it this week.

@saumoh cool!

@saumoh SGTM, thanks.

@mattklein123 @htuch
To make the authz filter use a dynamically learned gRPC cluster how about I implement the following:

  • Inherit from AsyncClientFactoryImpl. The inherited class will allow the authz filter to accept the cluster_name from v2api.
  • In the filter code the client_ may be nullptr and we can use the failure_mode_allow_(https://github.com/envoyproxy/envoy/blob/9271200ee1799c5cddf9785c5a63d0634cfcbf02/source/common/http/filter/ext_authz.h#L53) option to decide if the filter should permit|deny the request. There is already a counter for CheckStatus::Error that will click when we hit such a condition and that should provide the required debugging. See here.
    We already have code which takes care of case when the gRPC cluster is inaccessible for filter that's already setup which also uses the CheckStatus::Error status.

Does this sound reasonable approach to you. Any condition that I am not thinking about?

@saumoh the issue here is that if an AsyncClient is stored, you will still run into issues. I think this is what you need to do:

  • have factoryForGrpcService() take a boolean as to whether the client should require the cluster to exist and be a primary cluster (effectively).
  • Make AsyncClientImpl::send and AsyncClientImpl::start safe if the cluster does not exist. In these cases it should return nullptr and not crash.

Then I think you can continue on with the rest of what you propose which sounds fine.

@mattklein123 Makes sense.
For your point 2. In AsyncClientImpl I'll catch the exception thrown here so that the code is safe.
Will start implementing it.

This is fixed.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

karthequian picture karthequian  路  3Comments

justConfused picture justConfused  路  3Comments

jeremybaumont picture jeremybaumont  路  3Comments

ghost picture ghost  路  3Comments

sabiurr picture sabiurr  路  3Comments