Postgres: Error 135 in Initdb deploying inside Kubernetes

Created on 29 May 2018 · 11 Comments · Source: docker-library/postgres

When I attempt to run the Postgres container using Kubernetes, I get an error and the container crashes. I have been banging my head against this for a few days but can't find anything that points me in the right direction as to what to debug. Running the same Postgres container with plain Docker on the same host works fine. I have also tested on a different Kubernetes cluster, where it works fine, so I believe it is something environment-specific.

When I set the container to skip its entrypoint, I can reproduce the initdb error by hand.

Here is the output I get when I run initdb:

postgres@postgresql-844495667c-fdtzw:/$ /usr/lib/postgresql/10/bin/initdb -d -n /db
Running in debug mode.
Running in no-clean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

VERSION=10.4 (Debian 10.4-2.pgdg90+1)
PGDATA=/db
share_path=/usr/share/postgresql/10
PGPATH=/usr/lib/postgresql/10/bin
POSTGRES_SUPERUSERNAME=postgres
POSTGRES_BKI=/usr/share/postgresql/10/postgres.bki
POSTGRES_DESCR=/usr/share/postgresql/10/postgres.description
POSTGRES_SHDESCR=/usr/share/postgresql/10/postgres.shdescription
POSTGRESQL_CONF_SAMPLE=/usr/share/postgresql/10/postgresql.conf.sample
PG_HBA_SAMPLE=/usr/share/postgresql/10/pg_hba.conf.sample
PG_IDENT_SAMPLE=/usr/share/postgresql/10/pg_ident.conf.sample
The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /db ... ok
creating subdirectories ... ok
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2018-05-29 13:59:16.693 UTC [249] DEBUG: invoking IpcMemoryCreate(size=3055616)
Bus error (core dumped)
child process exited with exit code 135
initdb: data directory "/db" not removed at user's request

Things I have tried:
Increasing SHM on the host and the container
Running as privileged
Running older versions of Postgres
Running strace to see if anything jumped out at me
Increasing CPU limits and requests

Even if someone can point me in the right direction I would be forever grateful. Right now I am stuck banging my head against a wall.

postgresdump.zip

Most helpful comment

I believe I hit the same issue (Postgres works through docker run, but not k8s). In my case, huge pages were enabled but did not work through k8s, and Postgres did not fall back properly to running without huge pages. I think there are several possible solutions to the problem:

  1. Modify the Docker image to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before initdb is run (this is what I did).
  2. Turn off huge page support on the system (vm.nr_hugepages = 0 in /etc/sysctl.conf).
  3. Fix Postgres's fallback mechanism when huge_pages = try is set (the default).
  4. Modify the k8s manifest to enable huge page support (https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/).
  5. Modify k8s to report that huge pages are not supported on the system when they are not enabled for a specific container.
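
For option 1, a minimal sketch of the edit could look like the following. The helper name disable_huge_pages is made up for illustration, and it assumes the sample file already contains a huge_pages line (commented out or not), as the stock postgresql.conf.sample does. The path in the usage comment is the one quoted above and may differ in other images.

```shell
# Force "huge_pages = off" in a postgresql.conf(.sample) file.
# Rewrites both the commented-out default ("#huge_pages = try") and an active setting.
# disable_huge_pages is a hypothetical helper name, not part of the image.
disable_huge_pages() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy it back over the original
    # so the original file's ownership and permissions are kept.
    sed -E 's/^[[:space:]]*#?[[:space:]]*huge_pages[[:space:]]*=.*/huge_pages = off/' "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage (for example from a RUN step in a derived image, before initdb runs):
#   disable_huge_pages /usr/share/postgresql/postgresql.conf.sample
```

Because initdb generates postgresql.conf from the sample file, changing the sample should make the setting apply to the bootstrap run as well as to the final server configuration.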

All 11 comments

Not sure if this helps, but I tried a Centos7-postgres image and got a similar error, with a bit more detail:

fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
sh: line 1: 24 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 26 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 28 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 30 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 32 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 34 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=100 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
selecting default max_connections ... 10
sh: line 1: 36 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=16384 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 38 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=8192 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 40 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=4096 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 42 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=3584 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 44 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=3072 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 46 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=2560 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 48 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=2048 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 50 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=1536 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 52 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 54 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=900 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 56 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=800 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 58 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=700 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 60 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=600 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 62 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 64 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 66 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 68 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 70 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=100 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 72 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=50 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
child process was terminated by signal 7: Bus error
initdb: removing contents of data directory "/var/lib/pgsql/data/userdata"

We have no great ideas on how to debug this as we are neither experts in Postgres code nor Kubernetes code and we cannot realistically debug all issues with running the Official Images in random environment X.

Since you have been able to get it to work on plain Docker on the "broken" machine (and on a separate Kubernetes cluster), it is not a problem with the Docker image. I would recommend finding out what differs between your two clusters, and between the plain Docker run and the Kubernetes deployment config (cgroup settings such as memory limits, --shm-size, etc.). This may be related to https://github.com/docker-library/postgres/issues/416.

In the future, it'd be better to post questions like this in the Docker Community Forums, the Docker Community Slack, Stack Overflow, or a Kubernetes specific help group.

Hi dlohin, I'm getting the same issue. To expand slightly: it works on my local minikube but not on a private K8s cluster running on VMware datastores, so I suspect that aspect may be the cause. I will keep fighting it and report back if I stumble across the fix.

Same issue here. This image does not work on K8s (simple kubectl run), but running it on plain Docker (docker run) on the same host does.
The error code is 135 with a bus error.
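
For anyone wondering how 135 and "bus error" relate: shells report a process killed by a signal as 128 plus the signal number, and 135 - 128 = 7, which matches the "terminated by signal 7: Bus error" line in the CentOS log above. A small sketch that decodes this (decode_exit is a made-up helper name):

```shell
# Decode a shell exit status: values above 128 mean "killed by signal (status - 128)".
# decode_exit is a hypothetical helper name used only for this illustration.
decode_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l maps a signal number to its name on the current platform
        printf 'signal %d (%s)\n' "$sig" "$(kill -l "$sig")"
    else
        printf 'exit status %d\n' "$status"
    fi
}

decode_exit 135
```

The signal name printed depends on the platform's signal numbering; on Linux x86-64, signal 7 is SIGBUS.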

Hey guys, any progress or info on this issue? I am stuck on this for an urgent demo.

This is the only relevant thing I found relating to a bus error in Kubernetes/Docker:
https://github.com/pytorch/pytorch/issues/2244

Looks like the shared memory of the docker container wasn't set high enough. Setting a higher amount by adding --shm-size 8G to the docker run . . .
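
Since --shm-size is a docker run flag, one quick way to compare the two environments is to check how large /dev/shm actually is inside each container. A rough sketch, where fs_size_kb is a made-up helper name:

```shell
# Print the size in KiB of the filesystem holding the given path (default: /dev/shm).
# fs_size_kb is a hypothetical helper name; df -P -k gives POSIX-format output in 1024-byte units.
fs_size_kb() {
    df -P -k "${1:-/dev/shm}" | awk 'NR == 2 { print $2 }'
}

# Run this inside both the docker-run container and the Kubernetes pod and compare.
if [ -d /dev/shm ]; then
    echo "/dev/shm size: $(fs_size_kb) KiB"
fi
```

If both environments report the same size, shared memory sizing is less likely to be the difference between them.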

I have the same issue with this image, as well as with the gitlab image (error in postgres) and the richarvey/nginx-php-fpm, webdevops/php-nginx, and wordpress images (with php-fpm). Docker runs them fine on the same host. The problem appeared on version 1.12; 1.9 worked fine for me on all images.

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
 The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
 Data page checksums are disabled.
 fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
Bus error (core dumped)
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
running bootstrap script ...

I've tried running PostgreSQL 9.6.5. I've also tried mounting /dev/shm/ both from the same host path and as an emptyDir. Neither helped, so I guess it's not a problem with shared memory.

Host system: ubuntu 16.04.

So is there any clue about this issue?
I tried all of the approaches above; none of them work.
When I try the same image on another machine, the error is gone. It's weird.

For me, a temporary workaround was to run one node at 1.9.11 and run these kinds of images on it.
P.S.: you can connect a 1.9.11 node to a 1.11 cluster.

As nbartos suggested, I set vm.nr_hugepages = 0 in /etc/sysctl.conf.
Thanks to nbartos, Postgres now works well.
We will continue looking for the root cause.
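
To check whether a host is in the same situation before changing anything, it may help to see whether any huge pages are reserved at all. A small sketch that reads HugePages_Total from /proc/meminfo (the two function names are made up for illustration):

```shell
# Print HugePages_Total from a meminfo-style file (default: /proc/meminfo).
# Both function names here are hypothetical helpers.
hugepages_total() {
    awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeeds (exit 0) when at least one huge page is reserved.
hugepages_reserved() {
    total=$(hugepages_total "$@")
    [ "${total:-0}" -gt 0 ]
}

if [ -r /proc/meminfo ]; then
    if hugepages_reserved; then
        echo "huge pages reserved: $(hugepages_total)"
    else
        echo "no huge pages reserved"
    fi
fi

# To apply the setting from /etc/sysctl.conf without a reboot (as root):
#   sysctl -p
# or to set it directly for the running kernel:
#   sysctl -w vm.nr_hugepages=0
```

A non-zero count on the node where initdb crashes, and zero on the node where it works, would be consistent with the huge pages explanation above.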
