Syndesis: API Provider integration restarts for no reason

Created on 2 Jul 2019  ·  27 Comments  ·  Source: syndesisio/syndesis

This is a...


[ ] Feature request
[x] Regression (a behavior that used to work and stopped working in a new release)
[x] Bug report  
[ ] Documentation issue or request

Description

After deploying a simple API Provider integration (just the todo api with a single implemented flow), it gets restarted every couple of minutes. The provided API itself works fine, except during the restart, when it's completely unavailable.

This is the log when the integration restarts:

2019-07-02 09:51:33.948  INFO 1 --- [       Thread-7] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@2890c451: startup date [Tue Jul 02 09:49:08 UTC 2019]; root of context hierarchy
2019-07-02 09:51:33.951  INFO 1 --- [       Thread-7] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase 2147483647
2019-07-02 09:51:33.952  INFO 1 --- [       Thread-7] o.a.camel.spring.SpringCamelContext      : Apache Camel 2.21.0.fuse-740028 (CamelContext: todo-empty) is shutting down
2019-07-02 09:51:33.952  INFO 1 --- [       Thread-7] o.a.camel.impl.DefaultShutdownStrategy   : Starting to graceful shutdown 11 routes (timeout 300 seconds)
2019-07-02 09:51:33.966  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: i-Lim33GffTi8X3S51NMNz shutdown complete, was consuming from: direct://2a2f0185-3980-4167-af96-55cb4af5196e
2019-07-02 09:51:33.969  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: i-Lim33GefTi8X3S51NMKz shutdown complete, was consuming from: direct://d44d3329-31ed-482e-95cd-2415ac248ced
2019-07-02 09:51:33.969  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: i-Lim33GefTi8X3S51NMHz shutdown complete, was consuming from: direct://9bf8441f-0d65-4646-a418-71c0dda1c7f3
2019-07-02 09:51:33.969  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: i-Lim33GefTi8X3S51NMEz shutdown complete, was consuming from: direct://e6f15c45-b648-441e-b0a8-450d5155820b
2019-07-02 09:51:33.970  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: i-Lim33GdfTi8X3S51NMBz shutdown complete, was consuming from: direct://5513c8d7-58ae-4b68-9455-0481d9475c17
2019-07-02 09:51:33.973  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: 2a2f0185-3980-4167-af96-55cb4af5196e shutdown complete, was consuming from: servlet:/api/%7Bid%7D?headerFilterStrategy=syndesisHeaderStrategy&httpMethodRestrict=DELETE
2019-07-02 09:51:33.973  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: d44d3329-31ed-482e-95cd-2415ac248ced shutdown complete, was consuming from: servlet:/api/%7Bid%7D?headerFilterStrategy=syndesisHeaderStrategy&httpMethodRestrict=PUT
2019-07-02 09:51:33.973  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: 9bf8441f-0d65-4646-a418-71c0dda1c7f3 shutdown complete, was consuming from: servlet:/api/%7Bid%7D?headerFilterStrategy=syndesisHeaderStrategy&httpMethodRestrict=GET
2019-07-02 09:51:33.973  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: e6f15c45-b648-441e-b0a8-450d5155820b shutdown complete, was consuming from: servlet:/api?headerFilterStrategy=syndesisHeaderStrategy&httpMethodRestrict=POST
2019-07-02 09:51:33.976  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: 5513c8d7-58ae-4b68-9455-0481d9475c17 shutdown complete, was consuming from: servlet:/api?headerFilterStrategy=syndesisHeaderStrategy&httpMethodRestrict=GET
2019-07-02 09:51:33.976  INFO 1 --- [ - ShutdownTask] o.a.camel.impl.DefaultShutdownStrategy   : Route: route1 shutdown complete, was consuming from: servlet:/openapi.json?headerFilterStrategy=syndesisHeaderStrategy&httpMethodRestrict=GET
2019-07-02 09:51:33.977  INFO 1 --- [       Thread-7] o.a.camel.impl.DefaultShutdownStrategy   : Graceful shutdown of 11 routes completed in 0 seconds
2019-07-02 09:51:34.003 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: direct-4-0
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: direct-3-0
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: direct-2-0
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: direct-1-0
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping component: sql-sql-0-1
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: sql-0-1
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: bean-3-1
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: direct-0-0
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: bean-4-1
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: bean-1-1
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: bean-2-1
2019-07-02 09:51:34.004 DEBUG 1 --- [       Thread-7] i.s.i.c.proxy.ComponentProxyComponent    : Stopping connector: bean-0-3
2019-07-02 09:51:34.021  INFO 1 --- [       Thread-7] o.a.camel.spring.SpringCamelContext      : Apache Camel 2.21.0.fuse-740028 (CamelContext: todo-empty) uptime 2 minutes
2019-07-02 09:51:34.021  INFO 1 --- [       Thread-7] o.a.camel.spring.SpringCamelContext      : Apache Camel 2.21.0.fuse-740028 (CamelContext: todo-empty) is shutdown in 0.069 seconds
2019-07-02 09:51:34.021  INFO 1 --- [       Thread-7] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase 0
2019-07-02 09:51:34.026  INFO 1 --- [       Thread-7] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown
2019-07-02 09:51:34.033  INFO 1 --- [       Thread-7] o.a.c.c.s.CamelHttpTransportServlet      : Destroyed CamelHttpTransportServlet[CamelServlet]
cat/bug  closed/verified  prio/p0  source/qe

All 27 comments

What syndesis version are you using?

@christophd Could you take a look at this?

@heiko-braun this is on 1.7.8

Also after a while, it is completely redeployed (and the redeployed integration is again restarted regularly)

@asmigala is that on staging or on your local machine?

@christophd locally on minishift

@asmigala can you see some failing events on the OpenShift deployment? Maybe the POD just gets OOM killed

@christophd a different integration on the same minishift survived the whole time without restarts. I'll try again and watch for OOM

I will try to reproduce on my machine also

image

I need to find out why the Liveness probe is failing. This marks the POD as unhealthy and it gets restarted

I think this is related to https://github.com/syndesisio/syndesis/issues/5328, where the health check liveness probe was introduced.

Not sure if this works in combination with the API Provider, as we might need to call a different health check endpoint (not the usual Spring Boot health check endpoint) in that case.

So here is what I found out:

API Provider integrations add a servlet mapping (Mapping servlet: 'CamelServlet' to [/*]) that overrides the Spring Boot dispatcher servlet mapping (Mapping servlet: 'dispatcherServlet' to [/]), so requests to /health are no longer handled by Spring Boot.

This leads to a 404 Not Found when the liveness probe requests the health check of an API Provider integration.

Webhook integrations, for instance, add a narrower servlet mapping (Mapping servlet: 'CamelServlet' to [/webhook/*]), so this works because /health requests are still handled by Spring Boot.
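To illustrate why the [/*] mapping swallows /health while [/webhook/*] does not, here is a minimal sketch of the Servlet specification's matching order (exact match, then longest path prefix, then the default servlet at "/"). It is a simplified model, not the code of Spring Boot or of any servlet container: the class and method names are hypothetical, and extension patterns such as *.jsp are left out.

```java
import java.util.Map;

public class ServletMappingSketch {

    /**
     * Resolves a request path to a servlet name using a simplified version of
     * the Servlet spec rules: exact match, longest path-prefix match ("/x/*"),
     * then the default servlet mapped to "/". Extension matching is omitted.
     */
    static String resolve(Map<String, String> mappings, String path) {
        // 1. Exact match (ignoring wildcard patterns and the default mapping).
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            String pattern = e.getKey();
            if (!pattern.endsWith("/*") && !pattern.equals("/") && pattern.equals(path)) {
                return e.getValue();
            }
        }
        // 2. Longest path-prefix match.
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            String pattern = e.getKey();
            if (!pattern.endsWith("/*")) {
                continue;
            }
            // "/*" yields the empty prefix, "/webhook/*" yields "/webhook".
            String prefix = pattern.substring(0, pattern.length() - 2);
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && prefix.length() > bestLength) {
                best = e.getValue();
                bestLength = prefix.length();
            }
        }
        if (best != null) {
            return best;
        }
        // 3. Fall back to the default servlet, if one is mapped.
        return mappings.get("/");
    }

    public static void main(String[] args) {
        Map<String, String> apiProvider =
                Map.of("/", "dispatcherServlet", "/*", "CamelServlet");
        Map<String, String> webhook =
                Map.of("/", "dispatcherServlet", "/webhook/*", "CamelServlet");

        // API Provider: "/*" matches every path, so /health never reaches Spring Boot.
        System.out.println(resolve(apiProvider, "/health")); // prints CamelServlet
        // Webhook: only /webhook/... goes to Camel, /health falls through to the default.
        System.out.println(resolve(webhook, "/health"));     // prints dispatcherServlet
    }
}
```

Under this model, any path on the same port ends up in CamelServlet once it is mapped to [/*], which would also explain why moving the health endpoint to another path on that port cannot help.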

I see 3 options to fix it with API provider integrations:

  1. Use different servlet mapping in API provider (e.g. Mapping servlet: 'CamelServlet' to [/apiprovider/*])
  2. Add a Camel REST route in the API provider (similar to the one serving the OpenAPI spec) that handles /health requests and delegates to the Spring Boot health checks
  3. Call some other health check (e.g. the ones provided with camel-core) in liveness probe for all integrations

I personally would go with option 2 as it is a less intrusive solution in my eyes.

@zregvart @lburgazzoli @alexkieling what's your call on this?

Don't we have a different port for management endpoints such as health?

I personally would go with option 2 as it is a less intrusive solution in my eyes.

What happens when the OpenAPI specifies an operation at the /health path?

@asmigala good point!

Turns out we have an option #4 (from @lburgazzoli) that uses a different Spring Boot management port, set in application.properties with management.port=8081. Let's see if this works.

It might be better to do one of the following things:

  • Change the Spring Boot management path from "/" to "/actuator" (Spring Boot 2 uses "/actuator")
  • Change the Spring Boot management port, e.g., 8081

Docs: https://docs.spring.io/spring-boot/docs/1.5.x/reference/html/production-ready-monitoring.html

Can we roll back https://github.com/syndesisio/syndesis/pull/5646 on 1.7.x? This would allow us to properly fix it on master without the risk of introducing further side effects.

Using a different management port is working, but we would have to expose another service (and maybe a route, too) in OpenShift for each integration deployment in order to serve that new container port.

I am not very familiar with OpenShift internals but from my first tests I think liveness probe checks need that service and route to access the management container port.

@heiko-braun if this is too much of a change at this point, we might rather go the revert path

@christophd Try changing the management path instead of the management port. It avoids having to define a new port.

@alexkieling yeah, I have tried that too, but it is not working: the CamelServlet is mapped to /*, which makes everything on port 8080 go to this servlet instead of the Spring Boot dispatcher servlet

You do not need to add a new service afaik; checks are performed by the pod against the container. What's needed is to change the probes to point to a different port

https://github.com/syndesisio/syndesis/blob/master/app/server/openshift/src/main/java/io/syndesis/server/openshift/OpenShiftServiceImpl.java#L276

maybe such options should be configurable

@lburgazzoli could be right there because I do not see the POD being restarted frequently anymore. I saw some liveness probe errors though shortly after the POD starts but this could also be my local machine being too slow to get the Spring Boot application running in time.

I have an integration running for about 2h now and no restart at all. So I think liveness probe is working without the service.

The liveness probe initialDelaySeconds property is configurable and currently set to 120 seconds.
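As a rough sketch of how a permanently failing liveness probe turns into a restart every couple of minutes, the following simulates a probe schedule. Only the 120 second initial delay comes from this thread. The 10 second period and the threshold of 3 consecutive failures are the Kubernetes defaults and are assumptions here, since the thread does not say what the integration deployment actually sets. The class and method names are hypothetical, and a real kubelet also applies timeouts and jitter that this ignores.

```java
import java.util.function.IntPredicate;

public class LivenessProbeTimeline {

    /**
     * Simulates a liveness probe schedule and returns the second (counted from
     * container start) at which a restart would be triggered, or -1 if the
     * container survives until horizonSeconds.
     */
    static int secondsUntilRestart(int initialDelaySeconds, int periodSeconds,
                                   int failureThreshold, int horizonSeconds,
                                   IntPredicate probeSucceedsAt) {
        int consecutiveFailures = 0;
        // The first probe fires after the initial delay, then once per period.
        for (int t = initialDelaySeconds; t <= horizonSeconds; t += periodSeconds) {
            if (probeSucceedsAt.test(t)) {
                consecutiveFailures = 0; // any success resets the counter
            } else {
                consecutiveFailures++;
                if (consecutiveFailures >= failureThreshold) {
                    return t; // threshold reached: container gets restarted
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // API Provider case: /health always answers 404, so every probe fails.
        System.out.println(secondsUntilRestart(120, 10, 3, 3600, t -> false)); // prints 140
        // Probe pointed at a port where /health answers: no restart within the hour.
        System.out.println(secondsUntilRestart(120, 10, 3, 3600, t -> true));  // prints -1
    }
}
```

With these assumed values, an always-failing probe would trigger a restart about 140 seconds after start, which is in the same range as the "uptime 2 minutes" reported in the shutdown log above.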

@heiko-braun without the need for a new service the changes for the fix are quite small (introducing a new management server port and using this in the liveness probe). So I guess we can go for the fix on master and backport to 1.7.x.

I do not think we have ever been using the Spring Boot health checks on an integration before so changing the management port will not break existing mechanisms I guess.

Verified with 1.7.12

Damn you @pure-bot.
