Airflow: Better unit tests for Helm Chart

Created on 19 Oct 2020  路  25Comments  路  Source: apache/airflow

Hello,

Overview

Currently, Helm Chart for Airflow has two types of tests: (Learn the best practices of Helm Chart testing)

  • Template testing (unit testing): these tests render the templates against various input values. These types of tests let us verify that the template rendered the expected resources in the manner you intended. These tests are fast to execute and can catch syntactic errors of your template
  • Integration testing: these tests take the rendered templates and deploy them to a real Kubernetes cluster. We can then verify the deployed infrastructure works as intended by hitting the endpoints or querying Kubernetes for the resources. These tests closely resemble an actual deployment and give us a close approximation of how it might behave when ready to push the chart to production.

In each case, we use a different framework:

  • For template testing, we use Helm-unittest
  • For integration testing, we use bash scripts and Pytest + kind cluster

Start reading here

Today I would like to talk about template testing. In my opinion, using helm-unittest is not a very good solution.

  • This framework is not very popular, so any contributor who wants to change must read the documentation first to add new tests. (People are lazy by nature, so they don't add these tests).
  • Another problem is the form - the YAML file. The YAML file is not very flexible and makes it very difficult to write more unusual tests. It would be more flexible to express the tests in a more normal programming language.

During the discussion with @dimberman, I prepared a small POC that shows how we can test the Helm Chart with Pytest + Python.
Here is link: https://github.com/apache/airflow/pull/11533#discussion_r504998933

import subprocess
import unittest
from tempfile import NamedTemporaryFile

import yaml
from typing import Dict, Optional

OBJECT_COUNT_IN_BASIC_DEPLOYMENT = 22


class TestBaseChartTest(unittest.TestCase):
    def render_chart(self, name, values: Optional[Dict] = None):
        values = values or {}
        with NamedTemporaryFile() as tmp_file:
            content = yaml.dump(values)
            tmp_file.write(content.encode())
            tmp_file.flush()
            templates = subprocess.check_output(["helm", "template", name, "..", '--values', tmp_file.name])
            k8s_objects = yaml.load_all(templates)
            k8s_objects = [k8s_object for k8s_object in k8s_objects if k8s_object]
            return k8s_objects

    def test_basic_deployments(self):
        k8s_objects = self.render_chart("TEST-BASIC", {"chart": {'metadata': 'AA'}})
        list_of_kind_names_tuples = [
            (k8s_object['kind'], k8s_object['metadata']['name'])
            for k8s_object in k8s_objects
        ]
        self.assertEqual(
            list_of_kind_names_tuples,
            [
                ('ServiceAccount', 'TEST-BASIC-scheduler'),
                ('ServiceAccount', 'TEST-BASIC-webserver'),
                ('ServiceAccount', 'TEST-BASIC-worker'),
                ('Secret', 'TEST-BASIC-postgresql'),
                ('Secret', 'TEST-BASIC-airflow-metadata'),
                ('Secret', 'TEST-BASIC-airflow-result-backend'),
                ('ConfigMap', 'TEST-BASIC-airflow-config'),
                ('ClusterRole', 'TEST-BASIC-pod-launcher-role'),
                ('ClusterRoleBinding', 'TEST-BASIC-pod-launcher-rolebinding'),
                ('Service', 'TEST-BASIC-postgresql-headless'),
                ('Service', 'TEST-BASIC-postgresql'),
                ('Service', 'TEST-BASIC-statsd'),
                ('Service', 'TEST-BASIC-webserver'),
                ('Deployment', 'TEST-BASIC-scheduler'),
                ('Deployment', 'TEST-BASIC-statsd'),
                ('Deployment', 'TEST-BASIC-webserver'),
                ('StatefulSet', 'TEST-BASIC-postgresql'),
                ('Secret', 'TEST-BASIC-fernet-key'),
                ('Secret', 'TEST-BASIC-redis-password'),
                ('Secret', 'TEST-BASIC-broker-url'),
                ('Job', 'TEST-BASIC-create-user'),
                ('Job', 'TEST-BASIC-run-airflow-migrations')
            ]
        )
        self.assertEqual(OBJECT_COUNT_IN_BASIC_DEPLOYMENT, len(k8s_objects))

    def test_basic_deployment_without_default_users(self):
        k8s_objects = self.render_chart("TEST-BASIC", {"webserver": {'defaultUser': {'enabled': False}}})
        list_of_kind_names_tuples = [
            (k8s_object['kind'], k8s_object['metadata']['name'])
            for k8s_object in k8s_objects
        ]
        self.assertNotIn(('Job', 'TEST-BASIC-create-user'), list_of_kind_names_tuples)
        self.assertEqual(OBJECT_COUNT_IN_BASIC_DEPLOYMENT - 1, len(k8s_objects))

Alternatively, we can also migrate to terratest, which has native Helm Chart integration but uses Go-lang.

Now there is a question for the community. What do you think about it? Should we migrate all tests to Pytestt? Is this the only reason contributors don't add unit tests? What else can we do to encourage contributors to write tests?

Best regards,
Kamil Bregu艂a

helm-chart feature

Most helpful comment

@mik-laj I played around with your python testing framework a bit more this morning and I LOVE it!!!

I made some modifications to clean up a bit and allow us to play with k8s objects instead of dicts but PTAL I think this will make everything MUCH easier :) https://github.com/apache/airflow/pull/11693

All 25 comments

CC: @potiuk @ashb @kaxil @dimberman @turbaszek

CC: @OmairK I know you wanted to know more about Kubernetes.

I am absolutely for this. Comparing to the above proposal, it's heaven and earth.

I very much dislike the current yaml based testing, they're difficult to maintain, refactor, automate, run from the IDE, debug, duplicate to write new tests etc. (unless I do not now how - maybe I just need some training).

I've added a few yaml-based tests in the past and fixed a few failing ones when I added new features (like kerberos sidecar). And it was a pain. For me this is the main reason why I have not added tests to my changes to the helm chart - because I disliked the idea of being close to that part of the code.

Sorry for the rant, it's not really, really bad with those yaml tests, but it is pretty close. I certainly feel subconscious " let's not go anywhere near those tests" when I think about them :). And looking at the proposal above, it is so much better.

CC: @schnie

This is an interesting problem. I personally lean more towards terratest than I do towards any home-bakes solution. This of course raises the question of "but is airflow a python-only project" to which my reply would be "helm is a golang-based system. You're using go templating and all major tooling around it is written in golang."

@kaxil @ashb would love your thoughts as well, but especially since the helm chart is basically a separate entity from airflow itself, and that any python solution would essentially be home-baked, I'm pretty heavily for terratest.

Snap comment without reading much of the context (sorry!): Yeah, I have no love of yaml, but I value not having to write our own test framework more.

I'll read this in detail tomorrow

@dimberman DevOps world uses Bash, Python and Golang. I think anyone who knows Helm knows the basics of Python. However, this is not true the other way around. Not everyone who knows Python also knows Go.

We also don't need all Terratest features. If we have one goal - testing Helm templates, Then pytest will meet the expectations and we won't have to write a lot of our own code. It is enough to generate a template (render_chart method) and use the standard assertions available in Python.

@ashb I don't want to write my own framework, but using pytest instead of helm-unittest.

Yeah I am strongly with @mik-laj on that one. This has nothing to do with writing own framework. Pytest is there and we all know it well. Those tests are basically about rendering yaml files and comparing if they are what they are expected to be. We use precisely "0" other features of terratest.

To be precise, terratest is just a library that provides a set of functions that make it easier to write tests, but it is not a full framework. You can use any framework, but the documentation recommends using Go鈥檚 built-in package testing. The module of interest for us is:
(modules/helm/template.go)[https://github.com/gruntwork-io/terratest/blob/master/modules/helm/template.go]

So now we have discussions as to which we want to use one of the following set of tools.

  • pytest as a test runner, unittest.TestCase as a framework (most likely) and one of our own functions,
  • go test command as a test runner, go/pkg/testing as a framework, and terratest with a ready function.

I don't think we use any unique golang features (good asynchronous libraries, security, etc.) on the other hand, Python is a language valued for its simple syntax and user-friendliness. I would like to add that currently the integration tests for Helm Chart are also written in Python.

What are the benefits of Golang in our case? The potential popularity with other contributors is a benefit, but I don't see them being active in the development of this Helm Chart. Rather, new features are being added by active contributors to the project who are already familiar with Python. However, if someone is not familiar with Python, writing new tests should not be a challenge for him, because Python is simpler than golang.

@mik-laj I played around with your python testing framework a bit more this morning and I LOVE it!!!

I made some modifications to clean up a bit and allow us to play with k8s objects instead of dicts but PTAL I think this will make everything MUCH easier :) https://github.com/apache/airflow/pull/11693

(also it's a draft PR so tests will certainly fail/code quality needs clean-up but I think the basic idea will scale really well)

Yes, in quickly skimming the issue last night on my phone I missed a lot of the context.

And an alternative version that uses pytest fixtures:

https://github.com/apache/airflow/pull/11694

Hello,

I come from https://github.com/apache/airflow/pull/11681#issuecomment-712790910 .

So a bit of feedback on the experience with the current setup:

  • The hardest part was for me to install the helm plugin (which doesn't seem to be maintained much ; I ended up on the fork https://github.com/quintush/helm-unittest)
  • Maybe add a bit of documentation in the chart's README (the Overview in the description of this issue is pretty nice) on the steps needed to run the tests locally,
  • It's really slow to run,
  • It's nice to be able to test template files individually,
  • I feel it's ok to test with yaml ; but it's not very flexible (let's say I wanted to add a test that checks that after rendering, if kerberos.enabled=false, then kerberos is not present in any of the templates, ...)

Going for something a bit more custom with golang or python will enable easier "advance" tests (this chart has some very complex logic, so I have a feeling that it's going to be useful).

And on Golang vs Python, I'd agree with:

What are the benefits of Golang in our case? The potential popularity with other contributors is a benefit, but I don't see them being active in the development of this Helm Chart. Rather, new features are being added by active contributors to the project who are already familiar with Python. However, if someone is not familiar with Python, writing new tests should not be a challenge for him, because Python is simpler than golang.

I think some small "internal" python testing logic like what I can see in #11693 is nice.
It will also enable easy debugging, etc.

@mik-laj @FloChehab @potiuk Do you prefer the unit test style, or the pytest fixture style? #11693 and #11694

@ashb I prefer unittest.TestCase, but I don't have a strong opinion. Any solution that is written in Python works for me.

Related discussion: https://lists.apache.org/thread.html/1e4df7d4b0cd9b2d2bb76a3336471aa85e852545dd41ada6d4e461b8%40%3Cdev.airflow.apache.org%3E

Honestly I think my biggest complaint now is the self.assertEqual(a,b) vs assert a == b - the later I find easier to read, and now we are using pytest for the runner it also gives better diagnostics in case of failure.

I think pytest fixtures are powerful, but on first seeing them they are a bit "magic" I'll grant.

Honestly I think my biggest complaint now is the self.assertEqual(a,b) vs assert a == b - the later I find easier to read, and now we are using pytest for the runner it also gives better diagnostics in case of failure.

+1 for assert style, I think we should fix it in all database after 2.0 - there's a tool to do that.

I think pytest fixtures are powerful, but on first seeing them they are a bit "magic" I'll grant.

I think I'm ok with both - fixtures and mocking. Sometimes I prefer to use one way and in other situation I prefer the other one.

In general, I think the test style is something that we should discuss after 2.0 - we didn't want to do any changes/enforcement previously because we had a looooot of changes (pre-commits, pytest, AIP-21 and more).

I think pytest fixtures are powerful, but on first seeing them they are a bit "magic" I'll grant.

I like fixtures when used with the scope module or session. When I only need to use fixtures for one test, I prefer to create simple functions instead of using fixtures/magic.

I think I agree with all of you @ashb @turbaszek and @mik-laj :). Seems we all agree actually :)

+1 asserts are better (but let's not make it consistent just yet)
+1 on using both fixtures and mocks - and @mik-laj's case is a good example. I also prefer an explicit method where I do not have to something "common" to modules/session. And simple setup/tearDown is so much embedded now in my unit test thinking that in simple cases I prefer to use them. But this might change in the future as I get used to it.

Pytest has setup/teardown equivalents too, even without fixtures https://docs.pytest.org/en/stable/xunit_setup.html#method-and-function-level-setup-teardown

Pytest has setup/teardown equivalents too, even without fixtures https://docs.pytest.org/en/stable/xunit_setup.html#method-and-function-level-setup-teardown

Correct. But it feels like joining the worst of both worlds. Classic unit-test setUp/tearDown are familiar and "natural" where pytest fixtures are modern and a bit less familiar. I'd rather move to fixtures rather than to pytest's setup/teardown.

@mik-laj @FloChehab @potiuk Do you prefer the unit test style, or the pytest fixture style? #11693 and #11694

I don't have an opinion for going for the one or the other ; I am sure you'll make the good choice.

I like both of them - @ashb and @dimberman . However, if I had to choose one PR it would be @dimberman's PR because it includes additional k8s API validation.

Yeah Daniel has taken it further than I have - I was just creating a prototype.

I suspect once he's done and merged I may show converting it to pytest

I open this ticket because we need to migrate the rest of the tests to complete the migrations.

Was this page helpful?
0 / 5 - 0 ratings