Pulsar: Large number of unstable GitHub tests for Pulsar

Created on 24 Jan 2020 · 7 Comments · Source: apache/pulsar

Description
Pulsar tests regularly pass locally, but after a Pull Request is submitted to merge changes into master, some of the tests (usually fewer than three at a time) fail at random.

Examples include, but are not limited to:

In CI - CPP, Python Tests / cpp-tests:

  • BasicEndToEndTest.testPatternEmptyUnsubscribe
  • BasicEndToEndTest.testSinglePartitionRoutingPolicy

In CI - Unit - Brokers:

  • org.apache.pulsar.client.api.SimpleProducerConsumerTest
    - org.apache.pulsar.client.api.SimpleProducerConsumerTest.setup
  • org.apache.pulsar.client.impl.BrokerClientIntegrationTest
    - testUnsupportedBatchMessageConsumer
    - which gave: BrokerClientIntegrationTest.testUnsupportedBatchMessageConsumer:388->ProducerConsumerBase.testMessageOrderAndDuplicates:57 Received message my-message-7 did not match the expected message my-message-0 expected [my-message-0] but found [my-message-7]
  • org.apache.pulsar.client.api.PartitionCreationTest
    - testCreateConsumerForPartitionedTopicUpdateWhenDisableTopicAutoCreation
  • org.apache.pulsar.client.impl.ReaderTest
    - testReadMessageWithBatchingWithMessageInclusive
    - which gave: java.lang.AssertionError: expected [true] but found [false]

In CI - Unit - Flaky:

  • org.apache.pulsar.client.kafka.test.KafkaProducerSimpleConsumerTest
    - testPulsarKafkaProducerWithSerializer
  • org.apache.pulsar.functions.worker.PulsarFunctionE2ESecurityTest
    - testAuthorizationWithAnonymousUser
  • org.apache.pulsar.broker.service.PersistentFailoverE2ETest
    - testSimpleConsumerEventsWithoutPartition
    - which gave: java.lang.AssertionError: expected [null] but found [-1]

In CI - Unit - Proxy:

  • org.apache.pulsar.proxy.server.ProxyParserTest
    - org.apache.pulsar.proxy.server.ProxyParserTest.testRegexSubscription
    - which gave: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at fv-az98.onfd2ysmc4sedambc1t2u4afph.cx.internal.cloudapp.net/10.1.0.4:33515

In CI - Integration - Function State:

  • org.apache.pulsar.tests.integration.functions.PulsarStateTest
    - org.apache.pulsar.tests.integration.functions.PulsarStateTest.pulsar-standalone-suite
    - PulsarStateTest.testPythonWordCountFunction
    - which gave: PulsarStateTest.testPythonWordCountFunction:78->publishAndConsumeMessages:410 » ThreadTimeout
    - PulsarStateTest.testSinkState
    - which gave: PulsarStateTest.testSinkState:183 expected [val1-9] but found [val1-8]

In CI - Unit - Adapters:

  • org.apache.pulsar.storm.PulsarBoltTest
    - beforeMethod
    - which gave: java.lang.IllegalStateException: Failed to initialize producer for persistent://my-property/my-ns/my-topic1 : HTTP get request failed: Internal Server Error

Regarding org.apache.pulsar.client.api.SimpleProducerConsumerTest.setup, I found an interesting exception message:
org.apache.pulsar.client.api.SimpleProducerConsumerTest.setup(org.apache.pulsar.client.api.SimpleProducerConsumerTest) 165[ERROR] Run 1: SimpleProducerConsumerTest.setup:108->MockedPulsarServiceBaseTest.internalSetup:107->MockedPulsarServiceBaseTest.init:144->MockedPulsarServiceBaseTest.startBroker:195->MockedPulsarServiceBaseTest.startBroker:218 » WrongTypeOfReturnValue

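WrongTypeOfReturnValue is a Mockito exception, and its message notes that it might occur in wrongly written multi-threaded tests. This suggests that there's a race condition in the testing framework (or in our use of it).
Perhaps there are also known concurrency bugs in some versions of the test libraries we are using.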

Each time the tests are run (e.g. for PR #6031), different tests fail. There seems to be no pattern to which ones fail.
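Because a different subset fails on each run, a single local pass says little about whether a test is stable. One way to estimate how often a check fails is to rerun it many times and count the failures. The sketch below is a plain-Java illustration under that assumption: `FlakinessProbe`, `countFailures`, and the randomly failing check are hypothetical stand-ins, not Pulsar code. If the suite uses TestNG, the `invocationCount` attribute of `@Test` is another way to repeat a test.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

public class FlakinessProbe {

    /**
     * Runs the check {@code runs} times and returns how many runs failed.
     * A run counts as failed if the check returns false or throws.
     */
    public static int countFailures(BooleanSupplier check, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            try {
                if (!check.getAsBoolean()) {
                    failures++;
                }
            } catch (RuntimeException | AssertionError e) {
                // Treat exceptions and assertion failures as failed runs.
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a flaky test that fails roughly 1 run in 10.
        Random random = new Random(42);
        int failures = countFailures(() -> random.nextInt(10) != 0, 1000);
        System.out.printf("%d of 1000 runs failed%n", failures);
    }
}
```

A failure rate well above zero in such a loop confirms the test is intermittent even when individual local runs pass.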

There is also a risk of concurrency bugs in Pulsar itself that only appear in certain environments. If this is the case, these bugs could cause instability for some users in production environments.

Expected behavior
Tests should not fail at random when run by Jenkins or the GitHub Actions CI runner after a Pull Request is submitted. These random failures significantly slow down the rate at which PRs can be merged, and they can hide real problems, such as the concurrency bugs described above.

Labels: flaky-tests, triage/week-4, type/bug

All 7 comments

Do we ever have cases where multiple threads are stubbing/verifying a shared mock?

@merlimat Do you have any theories about what could be happening? It looks like you've spent a decent amount of time in a lot of these tests.

@sijie I added a lot of detail about the different exceptions that are appearing.

I noticed that many tests (including at least two of the tests above) use the following helper, which calls Thread.sleep. Sleeping in tests is often an anti-pattern.

In MockedPulsarServiceBaseTest.java:

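public static boolean retryStrategically(Predicate<Void> predicate, int retryCount, long intSleepTimeInMillis)
        throws Exception {
    for (int i = 0; i < retryCount; i++) {
        // Note: on the last attempt this returns true even if the predicate is still false.
        if (predicate.test(null) || i == (retryCount - 1)) {
            return true;
        }
        // Linear backoff: sleeps intSleepTimeInMillis * (i + 1) before the next attempt.
        Thread.sleep(intSleepTimeInMillis + (intSleepTimeInMillis * i));
    }
    return false;
}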
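As written, `retryStrategically` returns true on its final attempt even when the predicate is still false, so a caller that asserts on its result cannot detect that the condition never held; the test then carries on and may fail later on an assertion that looks unrelated. Below is a minimal sketch of a variant that keeps the same linear backoff but returns false when retries are exhausted. It swaps `Predicate<Void>` for `java.util.function.BooleanSupplier`, and the names `RetryUtil` and `retryUntil` are hypothetical, not part of Pulsar.

```java
import java.util.function.BooleanSupplier;

public final class RetryUtil {

    private RetryUtil() {
    }

    /**
     * Polls the condition up to retryCount times, sleeping base, 2*base, 3*base, ...
     * between attempts (the same linear backoff as retryStrategically).
     * Returns false if the condition never becomes true.
     */
    public static boolean retryUntil(BooleanSupplier condition, int retryCount, long baseSleepMillis) {
        for (int i = 0; i < retryCount; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // No sleep after the final failed attempt.
            if (i < retryCount - 1) {
                try {
                    Thread.sleep(baseSleepMillis * (i + 1));
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and report failure instead of swallowing it.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller would assert on the returned value, so an exhausted retry fails at the point of the wait rather than further down the test.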

At least these two tests (referenced above) use retryStrategically:

  • org.apache.pulsar.client.kafka.test.KafkaProducerSimpleConsumerTest
    - testPulsarKafkaProducerWithSerializer
  • org.apache.pulsar.functions.worker.PulsarFunctionE2ESecurityTest
    - testAuthorizationWithAnonymousUser
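Where a test waits for an asynchronous event (for example, a consumer receiving messages), one common alternative to sleeping and polling is to count events down on a `java.util.concurrent.CountDownLatch` and wait on it with a timeout. The wait ends as soon as the last event arrives, and a timeout is reported as a failure instead of letting the test continue. This is a self-contained sketch under that assumption: the single-thread executor stands in for a hypothetical message listener, and `LatchWaitExample` and `processAndAwait` are illustrative names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchWaitExample {

    /**
     * Handles messageCount simulated messages on a background thread and waits
     * until all of them are handled, or until the timeout expires.
     * Returns true only if every message was handled in time.
     */
    public static boolean processAndAwait(int messageCount, long timeoutMillis) {
        CountDownLatch handled = new CountDownLatch(messageCount);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < messageCount; i++) {
                // Hypothetical stand-in for a listener callback firing once per message.
                executor.submit(handled::countDown);
            }
            // Returns as soon as the count reaches zero, or false on timeout.
            return handled.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAndAwait(10, 5_000));
    }
}
```

In a real test the latch would be counted down inside the listener, and the test would assert that `await` returned true.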

After discovering quite a few closed issues involving intermittent tests, I realized that almost all of them involved race conditions or concurrency issues involving shared state.
So, I'll start looking at these failing tests in greater depth individually.
I'm hoping that some common causes will emerge, but so far, it looks like each one of the solved intermittent tests has been a unique case.
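To illustrate the kind of shared-state race mentioned above, the sketch below has several threads increment both a plain `int` field and an `AtomicInteger`. `unsafeCount++` is a non-atomic read-modify-write, so updates can be lost depending on thread scheduling; that is one way code can pass on one machine and fail on a busier CI runner. The class and method names are hypothetical and the example is not taken from Pulsar.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedStateRace {

    // Shared mutable state touched by several threads.
    private static int unsafeCount = 0;
    private static final AtomicInteger safeCount = new AtomicInteger();

    /** Returns {unsafeCount, safeCount} after all worker threads finish. */
    public static int[] run(int threads, int incrementsPerThread) {
        unsafeCount = 0;
        safeCount.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    unsafeCount++;               // not atomic: concurrent updates can be lost
                    safeCount.incrementAndGet(); // atomic: no updates are lost
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new int[] {unsafeCount, safeCount.get()};
    }

    public static void main(String[] args) {
        int[] counts = run(4, 100_000);
        // safe is always 400000; unsafe may be lower and can vary between runs.
        System.out.println("unsafe=" + counts[0] + " safe=" + counts[1]);
    }
}
```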

Thanks @devinbost for the great help

I found an interesting study of common causes of flaky tests (Luo et al., "An Empirical Analysis of Flaky Tests", FSE 2014): http://mir.cs.illinois.edu/~qluo2/fse14LuoHEM.pdf
