Origin: Container creation fails because of "Failed create pod sandbox"

Created on 26 Oct 2017 · 35 comments · Source: openshift/origin

Pods are not getting created anymore

Version

oc v3.6.173.0.7
kubernetes v1.6.1+5115d708d7
features: Basic-Auth

Server https://api.starter-ca-central-1.openshift.com:443
openshift v3.7.0-0.143.7
kubernetes v1.7.0+80709908fd

Steps To Reproduce
  1. Create an application (e.g. Redis (persistent) from the catalog)
  2. Check pod/container creation
  3. Wait for the timeouts
Current Result

Warning messages on the pod:

| Time | Type | Reason | Message |
| -- | -- | -- | -- |
| 1:33:46 PM | Normal | Sandbox changed | Pod sandbox changed, it will be killed and re-created. 2 times in the last 5 minutes |
| 1:33:42 PM | Warning | Failed create pod sand box | Failed create pod sandbox. 2 times in the last 5 minutes |
--> pod is not created
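
The same events can also be pulled programmatically instead of from the web console. A minimal client-go sketch, assuming a recent client-go release (older ones omit the context argument) and a hypothetical kubeconfig path; the namespace and pod name are taken from the error quoted below:

```go
// Minimal sketch: list the events recorded against a failing pod.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// The kubeconfig path is an assumption; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/home/user/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Restrict the listing to events whose involved object is our pod.
	events, err := clientset.CoreV1().Events("instantsoundbot").List(
		context.TODO(),
		metav1.ListOptions{FieldSelector: "involvedObject.name=redis-1-deploy"},
	)
	if err != nil {
		panic(err)
	}
	for _, e := range events.Items {
		fmt.Printf("%s %s %s: %s (x%d)\n",
			e.LastTimestamp.Format("3:04:05 PM"), e.Type, e.Reason, e.Message, e.Count)
	}
}
```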

The only real error I could grab was:
Failed kill pod | error killing pod: failed to "KillPodSandbox" for "c4c2ec61-ba29-11e7-8b2c-02d8407159d1" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"redis-1-deploy_instantsoundbot\" network: CNI request failed with status 400: 'Failed to execute iptables-restore: exit status 4 (Another app is currently holding the xtables lock. Perhaps you want to use the -w option?\n)\n'"
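
The hint in that message is real: iptables-restore accepts a -w/--wait flag (iptables >= 1.6.2) that blocks on the xtables lock instead of aborting with exit status 4. A minimal Go sketch of a caller using it; this is illustrative only, not the actual fix that was rolled out:

```go
// Minimal sketch: apply an iptables rule dump while waiting for the
// kernel's xtables lock instead of failing with exit status 4.
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// restoreRules pipes a rule dump into iptables-restore, asking it to wait
// up to five seconds for the xtables lock (-w, iptables >= 1.6.2) rather
// than aborting when another process holds it.
func restoreRules(rules []byte) error {
	cmd := exec.Command("iptables-restore", "-w", "5", "--noflush")
	cmd.Stdin = bytes.NewReader(rules)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("iptables-restore failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	// A trivial rule dump, for illustration only.
	rules := []byte("*filter\n-A INPUT -i lo -j ACCEPT\nCOMMIT\n")
	if err := restoreRules(rules); err != nil {
		fmt.Println(err)
	}
}
```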

Expected Result

Pods should start up as they used to.

Additional Information

Couldn't get oc adm diagnostics working at the moment.
I guess it could be related to the introduction of https://github.com/openshift/origin/pull/15880

component/kubernetes kind/bug lifecycle/rotten priority/P1

Most helpful comment

Same problem on starter-ca-central-1.openshift.com

error streaming logs from build pod: sii-test/app-5-build container: , container "sti-build" in pod "app-5-build" is not available

All 35 comments

I'm facing the same in the starter-us-east-1.openshift.com environment. It's been unstable for a couple of days already...

/cc @jupierce

This is a known issue that has a fix, and it is being rolled out to the starter clusters presently.

I think I'm facing the same issue at starter-ca-central-1.openshift.com:

| Time | Type | Reason | Message |
| -- | -- | -- | -- |
| 10:04:10 AM | Normal | Deadline exceeded | Pod was active on the node longer than the specified deadline |
| 10:00:03 AM | Normal | Sandbox changed | Pod sandbox changed, it will be killed and re-created. 14 times in the last 58 minutes |
| 9:59:40 AM | Warning | Failed create pod sand box | Failed create pod sandbox. 14 times in the last 58 minutes |

When will the fix finish rolling out?

Facing the same issue; it's not possible to roll out anything on starter-ca-central-1.openshift.com. Hope it will be fixed soon.

I got the same issue. I tried to create an application using Tomcat 8 and to build the source code at this path:
https://github.com/osamahassan245/samplepp

I got a build error, and when I tried to check the log, I got this:

container "sti-build" in pod is not available

Same problem on starter-ca-central-1.openshift.com

error streaming logs from build pod: sii-test/app-5-build container: , container "sti-build" in pod "app-5-build" is not available

Issue solved: I tried to use "Red Hat JBoss Web Server 3.1 Tomcat 8 1.0", and it's working fine now.

The issue is still present on starter-us-east-1.openshift.com.

Still a problem on ca-central.

Glad I'm not the only one seeing this issue. It's been occurring for me on console.starter-us-west-1.openshift.com since last weekend (11/4).

still seeing this on starter-ca-central-1.openshift.com

I have the same issue: error streaming logs from build pod: mavajsunco-website/mavajsunco-msc-6-build container: , container "sti-build" in pod "mavajsunco-msc-6-build" is not available

Same issue deploying rhscl/mysql-57-rhel7 on starter-us-east-1.

@dcbw this is the all too familiar iptables-restore issue. You are closer to this than I am and hopefully can provide better feedback about the progress.

👍

Still having this problem on starter-us-west-2.

I've got 7 failed deployments in a row with this error message.

^same

@dcbw @sjenning any input as to where the issue might be?

Seeing this on pro-us-east-1

Seeing this the last couple of days on pro-us-east-1 as well

Same here!!! Observing on pro-us-east-1.

Hey folks! Any update on this one? Is there a fix already in the openshift or openshift-ansible repos that I can pick up, or a temporary workaround for this issue? We are facing the same issue with our OpenShift cluster on AWS.

Version
OpenShift Master:
v3.7.0+7ed6862
Kubernetes Master:
v1.7.6+a08f5eeb62

@pweil-, @jupierce, are you still looking into this issue? Is there any progress or workaround available?

@dcbw @knobunc ping

I am facing a similar issue using OCP v3.9.30 with CDK. In my case, I have Che deployed on OpenShift, and when I start a new workspace, its pod crashes after the sandbox changes:

11:52:32 AM     Normal  Killing     Killing container with id docker://container:Need to kill Pod
11:52:30 AM     Normal  Sandbox Changed     Pod sandbox changed, it will be killed and re-created.
11:52:28 AM     Normal  Started     Started container
11:52:28 AM     Normal  Created     Created container

Is there any update on this issue @dcbw?

I used OpenShift for more than 5 years. I spent a lot of time getting my app running on v2 again. In the end, traffic was just not routed anymore. I moved to Heroku; it took me 2 hours to migrate all my data (DB) and make the necessary source code changes. Since then, no more problems. Sorry, OpenShift.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Seeing this (or something similar) currently on OpenShift Online starter-us-west-1. Unable to build or deploy because of it. No logs from pods that have this issue. Status page says all green.

We still see this issue on OKD 3.7.1.

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

I am seeing this issue, or something similar, when deploying the 3scale API Management Platform on OpenShift, in particular with system-sidekiq.

Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2cc1e1d064082f2a2b8cd7a10efb7d135a8a150e7d95fb7b939d6368e1717309" network for pod "system-sidekiq-6-deploy-debug": NetworkPlugin cni failed to set up pod "system-sidekiq-6-deploy-debug_mmcneilly-3scale-onprem" network: CNI request failed with status 400: 'pods "system-sidekiq-6-deploy-debug" not found '
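
That 400 looks like the pod was deleted while the SDN was still setting up its sandbox. A hypothetical way to confirm the race while debugging, again with client-go (the kubeconfig path is an assumption; pod and namespace come from the message above):

```go
// Minimal sketch: check whether the pod named in the CNI error still exists.
package main

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// The kubeconfig path is an assumption; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/home/user/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	_, err = clientset.CoreV1().Pods("mmcneilly-3scale-onprem").Get(
		context.TODO(), "system-sidekiq-6-deploy-debug", metav1.GetOptions{})
	switch {
	case apierrors.IsNotFound(err):
		// The pod is gone: CNI setup lost a race with pod deletion.
		fmt.Println("pod not found: sandbox setup raced with pod deletion")
	case err != nil:
		panic(err)
	default:
		fmt.Println("pod still exists: the 400 is not a simple deletion race")
	}
}
```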

Can this issue be reopened?
/reopen

@matthewmcneilly: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
