When waiting for Postgres to start up, PostgreSQLContainer looks for the String ".*database system is ready to accept connections.*\\s".
The database though has already started when PostgreSQLContainer is ready for waiting and thus will never find the String.
This issues seems to appear only on Windows machines - on Linux it was not observable.
The new PostgreSQLContainer wait behaviour was introduced in #327.
Is this String not logged on Windows machines? This String should normally be logged two times before the database is ready.
Yes, it is logged. It already came twice when PostgreSQLContainer starts waiting.
Ok, so that sounds like some race condition regarding the LogMessageWaitStrategy and might not be a PostgreSQLContainer (or Windows) specific problem.
@kiview Would like to ask if one can expect a fix for this bug in the near future? We are developing mostly on Windows machines and heavily using testcontainers in our services. Even if we have a little workaround it would be nice to get this issue solved. Don't know if it is also possible to support you in some way but if, we at least need to get some details explained. :)
@gbtec-ingogriebsch Since it might involve a race condition (which should be platform independent?), I don't know when it will be fixed. It would help to find this bug on Linux as well.
I'll be home next week (and I have a Windows 10 box at home) and I'll try to look into the problem.
Can someone share a project demonstrating this problem?
I just ran SimplePostgreSQLTest on my Windows 10 Pro 1703 with Docker 17.09.0 CE and starting the postgres container seems to work fine.
I see the following error in my console, the test succeeds however:
Okt 02, 2017 4:17:22 PM org.postgresql.Driver connect
SCHWERWIEGEND: Connection error:
org.postgresql.util.PSQLException: Der Verbindungsversuch schlug fehl.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:275)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:194)
at org.postgresql.Driver.makeConnection(Driver.java:431)
at org.postgresql.Driver.connect(Driver.java:247)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:92)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:98)
at com.zaxxer.hikari.pool.BaseHikariPool.addConnection(BaseHikariPool.java:438)
at com.zaxxer.hikari.pool.BaseHikariPool$1.run(BaseHikariPool.java:413)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at org.postgresql.core.PGStream.receiveChar(PGStream.java:284)
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:418)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:222)
... 13 more
I think this happens when the container is shutting down, so nothing too bad.
Edit:
Indeed, happens when closing. Adding a ds.close() statement to the end of the test circumvents the error,
@kiview I'm using Docker for Windows, Version 17.06.1-ce-win24 (13025). I have not yet updated to Version 17.09.0-ce-win32 (13529). We will check if the problem is maybe solved if using the latest docker version.
And we can provide a simple test project for the problem we are having at the moment. But it will include nothing else as a Spring Boot app which has configured the postgres testcontainer triggered through the 'special' jdbc url.
The result of https://github.com/testcontainers/testcontainers-java/blob/master/modules/jdbc-test/src/test/java/org/testcontainers/junit/SimplePostgreSQLTest.java on my machine:
13:21:08.851 DEBUG 馃惓 [postgres:latest] - Starting container: postgres:latest
13:21:08.852 DEBUG 馃惓 [postgres:latest] - Trying to start container: postgres:latest
13:21:08.852 DEBUG 馃惓 [postgres:latest] - Trying to start container: postgres:latest (attempt 1/1)
13:21:08.852 DEBUG 馃惓 [postgres:latest] - Starting container: postgres:latest
13:21:08.852 INFO 馃惓 [postgres:latest] - Creating container for image: postgres:latest
13:21:09.037 INFO 馃惓 [postgres:latest] - Starting container with ID: 3bed081e31e630abe1e97560afb3510013eaf84603541e58c4a6d36d0f7f3494
13:21:09.375 INFO 馃惓 [postgres:latest] - Container postgres:latest is starting: 3bed081e31e630abe1e97560afb3510013eaf84603541e58c4a6d36d0f7f3494
13:22:09.467 ERROR 馃惓 [postgres:latest] - Could not start container
org.testcontainers.containers.ContainerLaunchException: Timed out waiting for log output matching '.*database system is ready to accept connections.*\s'
at org.testcontainers.containers.wait.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:31)
at org.testcontainers.containers.GenericContainer$AbstractWaitStrategy.waitUntilReady(GenericContainer.java:943)
at org.testcontainers.containers.PostgreSQLContainer.waitUntilContainerStarted(PostgreSQLContainer.java:92)
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:235)
at org.testcontainers.containers.GenericContainer.lambda$start$0(GenericContainer.java:184)
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:76)
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:182)
at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:544)
at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
13:22:09.467 ERROR 馃惓 [postgres:latest] - Container log output (if any) will follow:
13:22:09.489 INFO 馃惓 [postgres:latest] - STDOUT: The files belonging to this database system will be owned by user "postgres".
13:22:09.489 INFO 馃惓 [postgres:latest] - STDOUT: This user must also own the server process.
13:22:09.489 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.489 INFO 馃惓 [postgres:latest] - STDOUT: The database cluster will be initialized with locale "en_US.utf8".
13:22:09.489 INFO 馃惓 [postgres:latest] - STDOUT: The default database encoding has accordingly been set to "UTF8".
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: The default text search configuration will be set to "english".
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: Data page checksums are disabled.
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: fixing permissions on existing directory /var/lib/postgresql/data ... ok
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: creating subdirectories ... ok
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: selecting default max_connections ... 100
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: selecting default shared_buffers ... 128MB
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: selecting dynamic shared memory implementation ... posix
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: creating configuration files ... ok
13:22:09.490 INFO 馃惓 [postgres:latest] - STDOUT: running bootstrap script ... ok
13:22:09.491 INFO 馃惓 [postgres:latest] - STDOUT: performing post-bootstrap initialization ... ok
13:22:09.491 INFO 馃惓 [postgres:latest] - STDERR:
13:22:09.491 INFO 馃惓 [postgres:latest] - STDERR: WARNING: enabling "trust" authentication for local connections
13:22:09.491 INFO 馃惓 [postgres:latest] - STDERR: You can change this by editing pg_hba.conf or using the option -A, or
13:22:09.491 INFO 馃惓 [postgres:latest] - STDERR: --auth-local and --auth-host, the next time you run initdb.
13:22:09.491 INFO 馃惓 [postgres:latest] - STDOUT: syncing data to disk ... ok
13:22:09.491 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.492 INFO 馃惓 [postgres:latest] - STDOUT: Success. You can now start the database server using:
13:22:09.492 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.492 INFO 馃惓 [postgres:latest] - STDOUT: pg_ctl -D /var/lib/postgresql/data -l logfile start
13:22:09.492 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.492 INFO 馃惓 [postgres:latest] - STDOUT: waiting for server to start....2017-10-10 11:21:10.459 UTC [39] LOG: listening on IPv4 address "127.0.0.1", port 5432
13:22:09.492 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.459 UTC [39] LOG: could not bind IPv6 address "::1": Cannot assign requested address
13:22:09.492 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.459 UTC [39] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
13:22:09.492 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.465 UTC [39] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
13:22:09.493 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.479 UTC [40] LOG: database system was shut down at 2017-10-10 11:21:10 UTC
13:22:09.493 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.485 UTC [39] LOG: database system is ready to accept connections
13:22:09.493 INFO 馃惓 [postgres:latest] - STDOUT: done
13:22:09.493 INFO 馃惓 [postgres:latest] - STDOUT: server started
13:22:09.494 INFO 馃惓 [postgres:latest] - STDOUT: CREATE DATABASE
13:22:09.494 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.494 INFO 馃惓 [postgres:latest] - STDOUT: CREATE ROLE
13:22:09.494 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.494 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.494 INFO 馃惓 [postgres:latest] - STDOUT: /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
13:22:09.495 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.495 INFO 馃惓 [postgres:latest] - STDOUT: waiting for server to shut down....2017-10-10 11:21:10.823 UTC [39] LOG: received fast shutdown request
13:22:09.495 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.827 UTC [39] LOG: aborting any active transactions
13:22:09.495 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.829 UTC [39] LOG: worker process: logical replication launcher (PID 46) exited with exit code 1
13:22:09.495 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.829 UTC [41] LOG: shutting down
13:22:09.495 INFO 馃惓 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.851 UTC [39] LOG: database system is shut down
13:22:09.495 INFO 馃惓 [postgres:latest] - STDOUT: done
13:22:09.496 INFO 馃惓 [postgres:latest] - STDOUT: server stopped
13:22:09.496 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.496 INFO 馃惓 [postgres:latest] - STDOUT: PostgreSQL init process complete; ready for start up.
13:22:09.496 INFO 馃惓 [postgres:latest] - STDOUT:
13:22:09.496 INFO 馃惓 [postgres:latest] - STDERR: 2017-10-10 11:21:10.948 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
13:22:09.496 INFO 馃惓 [postgres:latest] - STDERR: 2017-10-10 11:21:10.948 UTC [1] LOG: listening on IPv6 address "::", port 5432
13:22:09.497 INFO 馃惓 [postgres:latest] - STDERR: 2017-10-10 11:21:10.956 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
13:22:09.497 INFO 馃惓 [postgres:latest] - STDERR: 2017-10-10 11:21:10.972 UTC [52] LOG: database system was shut down at 2017-10-10 11:21:10 UTC
13:22:09.497 INFO 馃惓 [postgres:latest] - STDERR: 2017-10-10 11:21:10.977 UTC [1] LOG: database system is ready to accept connections
13:22:10.173 DEBUG org.testcontainers.utility.ResourceReaper - Removed container and associated volume(s): postgres:latest
org.testcontainers.containers.ContainerLaunchException: Container startup failed
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:189)
at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:544)
at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:83)
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:182)
... 17 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:256)
at org.testcontainers.containers.GenericContainer.lambda$start$0(GenericContainer.java:184)
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:76)
... 18 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Timed out waiting for log output matching '.*database system is ready to accept connections.*\s'
at org.testcontainers.containers.wait.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:31)
at org.testcontainers.containers.GenericContainer$AbstractWaitStrategy.waitUntilReady(GenericContainer.java:943)
at org.testcontainers.containers.PostgreSQLContainer.waitUntilContainerStarted(PostgreSQLContainer.java:92)
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:235)
... 20 more
It is somewhat environment specific, but seems not to depend on the OS itself. Further investigating.
After additional research, it seems it is not a race condition, since followOutput() will also retrieve old output. However, in the failing environments, followOutput() doesn't seem to return any frames in the callback.
It might be somehow related to https://github.com/docker-java/docker-java/issues/878, not sure though.
That's not good. Do you think we should revert use of the log-based wait strategy until we can get to the bottom of this?
I would appreciate that - we need to upgrade to testcontainers 1.4.x to make use of the latest java docker api and the TC_DAEMON parameter. Reverting it to use SELECT 1 would remedy this. But the revert would also re-introduce #317, wouldn't it?
Yes it would, but that's not really an error or a problem, more like verbose logging stuff happening. Other LogMessageWaitStrategy based containers wouldn't work either in this setup.
@rnorth
TBH I'm not sure what to do here right now. I was with @gbtec-markusmeier at the weekend, trying to fix the problem, pinpoint it, work around it, but the behavior regarding the log output was actually happening as soon as we've used logContainerCmd.withFollowStream(true).
So I think the problem might be somewhere in docker-java. I think it's interesting, this problem occurs specifically in the work environment of @gbtec-markusmeier and @gbtec-ingogriebsch, we weren't really able to spot the difference in the setup from which this problem might arise. My last idea was, that it might happen somewhere one the network layer, docker-java not being able to stream the logs.
@gbtec-markusmeier
Maybe we can raise an issue regarding the logContainerCmd.withFollowStream(true) problem in docker-java?
@kiview Do you have some news for us? Can we provide 'some' information to help you understanding the problem? Or is the next step to follow your suggestion and raise an issue in docker-java?
@gbtec-ingogriebsch Do you think you can raise a useful issue with enough details in docker-java? If yes this would be great.
As I said before, I wasn't really able to reproduce this problem on my machine and pinpoint the differences in the environment, so sadly I'm currently unable to get additional insight into this problem :disappointed:
As a work around you could always reside to extending your own custom JdbcContainer I suppose.
Just a little heads up:
We've currently the problem, that LogMessageWaitStrategy doesn't receive the first OutputFrame in one of our tests (reproducable).
I wonder why this is happening, since we do withSince(0) on following output.
Just noting that I've had a similar issue with 1.5.1 and the org.testcontainers.containers.localstack.LocalStackContainer on a Windows OS. Nothing to do with the RegExp format, and the line does get logged as I can see from docker logs on the container started, but it times out with the same time out error.
Looking under the hood I can see a netty request of this format: /containers/20bc52c3e633815dd14beeeb0f4ddbb41c3acb5908cee3dc7d230ea57e3e0f67/logs?since=0&stdout=true&stderr=true&follow=true
If I changed it to org.testcontainers.containers.wait.HostPortWaitStrategy it starts up fine.
Yes, this bug seems a bit more prominent on Windows, but we've observed it on different Linux environments as well. Sadly we couldn't figure out the root cause yet, it might be in the docker-java implementation, or it could be strange behaviour of the docker logs.
@gbtec-ingogriebsch @gbtec-markusmeier
Are you interested in checking out, if the last version with the included #643 solves your problem on Windows?
@kiview Nice stuff! We will try it for sure! Give us a moment, we will come back to you. /CC @gbtec-markusmeier
Tested with testcontainers:1.7.3:
WaitingConsumer being used is using the new BaseConsumer.WaitingConsumer does not have changes similar to Slf4jLogConsumer or ToStringConsumer done in https://github.com/testcontainers/testcontainers-java/pull/643.FrameConsumerResultCallback.onNext() is called only after the waiting period of 60 seconds, so it has no effect within the scope of this ticket. When it is called, processRawFrame() is never touched - so it seems to be another problem.@gbtec-markusmeier Thanks for testing and the feedback!
Okay, maybe we can finde something in WaitingConsumer.
Today I got this error too...
org.testcontainers.containers.ContainerLaunchException: Container startup failed
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:214)
at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:638)
at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:83)
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:207)
... 9 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:279)
at org.testcontainers.containers.GenericContainer.lambda$start$0(GenericContainer.java:209)
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:76)
... 10 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Timed out waiting for log output matching '.*database system is ready to accept connections.*\s'
at org.testcontainers.containers.wait.strategy.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:30)
at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:32)
at org.testcontainers.containers.wait.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:17)
at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:32)
at org.testcontainers.containers.PostgreSQLContainer.waitUntilContainerStarted(PostgreSQLContainer.java:103)
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:258)
... 12 more
Probably it helps to localize the problem... I'm not using default Postgres image, but Postgres with PostGIS and Timescale https://hub.docker.com/r/timescale/timescaledb-postgis/ or my own https://hub.docker.com/r/binakot/postgresql-postgis-timescaledb/
For example, when I run the image timescale/timescaledb-postgis:latest-pg9.6, I got this:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
performing post-bootstrap initialization ... No usable system locales were found.
Use the option "--debug" to see details.
ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
WARNING: enabling "trust" authentication for local connections
For timescale/timescaledb-postgis:latest-pg10:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
2018-09-21 05:58:43.959 UTC [26] WARNING: no usable system locales were found
performing post-bootstrap initialization ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
waiting for server to start....2018-09-21 05:58:44.462 UTC [30] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-21 05:58:44.474 UTC [31] LOG: database system was shut down at 2018-09-21 05:58:44 UTC
2018-09-21 05:58:44.483 UTC [30] LOG: database system is ready to accept connections
2018-09-21 05:58:44.485 UTC [37] LOG: TimescaleDB background worker launcher connected to shared catalogs
done
server started
CREATE DATABASE
/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/install_timescaledb.sh
2018-09-21 05:58:44.851 UTC [43] FATAL: role "postgres" does not exist
psql: FATAL: role "postgres" does not exist
And for binakot/postgresql-postgis-timescaledb:latest:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
2018-09-21 06:00:10.260 UTC [27] WARNING: no usable system locales were found
performing post-bootstrap initialization ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
waiting for server to start....2018-09-21 06:00:10.712 UTC [31] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-21 06:00:10.723 UTC [32] LOG: database system was shut down at 2018-09-21 06:00:10 UTC
2018-09-21 06:00:10.725 UTC [31] LOG: database system is ready to accept connections
done
server started
CREATE DATABASE
CREATE ROLE
/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/1.postgis.sh
CREATE DATABASE
UPDATE 1
Loading PostGIS extensions into template_postgis
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
Loading PostGIS extensions into test
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/2.timescaledb.sh
CREATE DATABASE
UPDATE 1
Loading TimescaleDB extensions into template_timescaledb
CREATE EXTENSION
Loading TimescaleDB extensions into test
CREATE EXTENSION
/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/3.postgres.sh
version
---------------------------------------------------------------------------------------
PostgreSQL 10.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 6.4.0) 6.4.0, 64-bit
(1 row)
extname | extversion
------------------------+------------
plpgsql | 1.0
postgis | 2.4.4
postgis_topology | 2.4.4
fuzzystrmatch | 1.1
postgis_tiger_geocoder | 2.4.4
timescaledb | 0.10.1
(6 rows)
waiting for server to shut down....2018-09-21 06:00:15.387 UTC [31] LOG: received fast shutdown request
2018-09-21 06:00:15.397 UTC [31] LOG: aborting any active transactions
2018-09-21 06:00:15.398 UTC [31] LOG: worker process: logical replication launcher (PID 38) exited with exit code 1
2018-09-21 06:00:15.398 UTC [33] LOG: shutting down
2018-09-21 06:00:15.437 UTC [31] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2018-09-21 06:00:15.498 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2018-09-21 06:00:15.498 UTC [1] LOG: listening on IPv6 address "::", port 5432
2018-09-21 06:00:15.501 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-21 06:00:15.514 UTC [59] LOG: database system was shut down at 2018-09-21 06:00:15 UTC
2018-09-21 06:00:15.518 UTC [1] LOG: database system is ready to accept connections
The most interesting thing that everything worked like a charm before I re-install Windows 10 on my PC 馃槩
But now It works only with my own image, because:
1) There is no log database system is ready to accept connections in timescale/timescaledb-postgis:latest-pg9.6 at all. But we have it in timescale/timescaledb-postgis:latest-pg10.
2) In my image I got database system is ready to accept connections 2 times, because I restart the Postgres after extension installations. I think the test-container catches exactly the last one.
But how did it work before? I didn't change versions of libraries or images.
I just tried every version of timescale docker images one by one.
And I found the working image: timescale/timescaledb-postgis:0.10.1-pg10.
Maybe we should roll back to the Jdbc based wait strategy for the sake of stability, WDYT @rnorth ?
Or even better to let to choose the strategy via argument or smth else?
I'm using rule's annotation for my junit tests.
I've just checked the code and I see that we don't have a JdbcWaitStrategy (I somehow thought we once had?). The actual wait implementation is in theJdbcDatabaseContainer (see https://github.com/testcontainers/testcontainers-java/blob/c91513d9ebfb3051c2ddaf9b0581e7dbb10e236a/modules/jdbc/src/main/java/org/testcontainers/containers/JdbcDatabaseContainer.java#L115).
So one solution would be, to extract a real JdbcWaitStrategy, which would allow for easier composition and configurability, simply use it in waitingFor() method on container instances. We could still use the log based waiting as the faster default, but user in environments which observe this behavior (which seems strangely related to Docker for Windows?), can opt-in to use the JdbcWaitStrategy.
@gbtec-markusmeier What is your take on this? I know this bug blocks you from upgrading to recent Testcontainers version, so I'd be happy to find a solution.
Since I'm a bit uncomfortable with the current state of PostgreSQLContainer, which is unreliable for some user, and I'm the one responsible for changing it to LogMessageWaitStrategy in the first place, I'd be happy to take care of it.
@rnorth @bsideup WDYT?
@kiview: In any case we will be happy to be able to have a query based db readiness check again - as this works reliable in our environment.
It would be a possibility (as discussed in person last year) to OR combine that with the current log based strategy - i.e. whichever comes first will result in a successful wait. But if it is easier/faster to just go back to the former, only query based strategy, why not go back to that one? There weren't functional problems in any scenarios, so it should be fine.
After some geek-at-keyboard tests by @gbtec-markusmeier and me, it seems this problem was solved in 1.9.1, probably because of the switch to okhttp and npipe support.
@gbtec-markusmeier will test a bit more and then we might be able to close this ticket :slightly_smiling_face:.
@gbtec-markusmeier For getting some more confirmation of the underlying bug, could you please try to run 1.9.1 with Netty and check if the problem reappears?
Just set
transport.type=netty
in ~/.testcontainers.properties. This would be very helpful. (You would also need to switch back to TCP port on Docker and probably delete the discovered strategy line from ~/.testcontainers.properties)
@gbtec-markusmeier For getting some more confirmation of the underlying bug, could you please try to run 1.9.1 with Netty and check if the problem reappears?
Just set
transport.type=nettyin
~/.testcontainers.properties. This would be very helpful. (You would also need to switch back to TCP port on Docker and probably delete the discovered strategy line from~/.testcontainers.properties)
The problem does not occur using netty:
o.t.d.DockerClientProviderStrategy : Loaded org.testcontainers.dockerclient.WindowsClientProviderStrategy from ~/.testcontainers.properties, will try it first
o.t.d.DockerClientProviderStrategy : Will use 'netty' transport
o.t.d.DockerClientProviderStrategy : Found Docker environment with Docker for Windows (via TCP port 2375)
org.testcontainers.DockerClientFactory : Docker host IP address is localhost
org.testcontainers.DockerClientFactory : Connected to docker:
Server Version: 18.06.1-ce
API Version: 1.38
Operating System: Docker for Windows
Total Memory: 16006 MB
org.testcontainers.DockerClientFactory : Ryuk started - will monitor and terminate Testcontainers containers on JVM exit
?? Checking the system...
? Docker version should be at least 1.6.0
? Docker environment should have more than 2GB free disk space
docker[postgres:9.5] : Creating container for image: postgres:9.5
docker[postgres:9.5] : Starting container with ID: 5597e1cdf786e868c55f733dfa1abe11d4625ff9a76c3210a74914bae0706f05
docker[postgres:9.5] : Container postgres:9.5 is starting: 5597e1cdf786e868c55f733dfa1abe11d4625ff9a76c3210a74914bae0706f05
docker[postgres:9.5] : Container postgres:9.5 started
Thanks, so probably solved by something else, but solved never the less! :japanese_ogre:
I'll close the issue for now, @gbtec-markusmeier please re-open if the problem persists in some combination.
Most helpful comment
Maybe we should roll back to the Jdbc based wait strategy for the sake of stability, WDYT @rnorth ?