Airflow: SparkSubmitOperator only masks one "form" of password arguments

Created on 30 Jun 2020 · 6 Comments · Source: apache/airflow

Hello there, everyone. :)

Apache Airflow version: 1.10.9, 1.10.10, trunk

  • OS (e.g. from /etc/os-release): Linux
  • Others: Bash/sh

What happened:

Password masking was added to SparkSubmitOperator (SparkSubmitHook, to be precise) in December 2019 (under AIRFLOW-6350; PR: #6917), but it only masks passwords that appear in the --foo.password='value' form; i.e. the value must be wrapped in single quotes and joined to the argument's name with an equals sign.

What you expected to happen:

I would expect this mechanism to also cover forms that a) use double quotes or no quotes at all, or b) use whitespace instead of an equals sign, e.g.

  • --foo.password=value
  • --foo.password="value"
  • --foo.password 'value'
  • --foo.password value
  • --foo.password "value"

But I may be missing something. Is there any reason the initial version only covers the single-quoted-with-equal-sign form? The regular expression used in the masking code (1.10.9 version, trunk version) looks pretty intentional:

    def _mask_cmd(self, connection_cmd):
        # Mask any password related fields in application args with key value pair
        # where key contains password (case insensitive), e.g. HivePassword='abc'

        connection_cmd_masked = re.sub(
            r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')",
            r'\1******', ' '.join(connection_cmd), flags=re.I)
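To make the gap concrete, here is a small standalone sketch that applies the same regular expression to each of the forms listed above. The `mask_cmd` function and the `forms` list are illustrative names only, not part of Airflow; the pattern itself is copied from the snippet above.

```python
import re

# Pattern copied from the _mask_cmd snippet above.
MASK_PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"


def mask_cmd(connection_cmd):
    """Join the argument list and apply the single-quote-only masking regex."""
    return re.sub(MASK_PATTERN, r"\1******", " ".join(connection_cmd), flags=re.I)


# Each entry is an argument list, as it would be passed to the hook.
forms = [
    ["--foo.password='value'"],     # single quotes with an equals sign
    ["--foo.password=value"],       # no quotes
    ['--foo.password="value"'],     # double quotes
    ["--foo.password", "'value'"],  # whitespace, single quotes
    ["--foo.password", "value"],    # whitespace, no quotes
    ["--foo.password", '"value"'],  # whitespace, double quotes
]

for args in forms:
    print(mask_cmd(args))
```

Only the first form comes back as `--foo.password='******'`; the other five are printed unchanged, with the value still visible.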

How to reproduce it:

    from airflow import DAG
    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator  # Airflow 1.10.9

    dag = DAG(...)
    SparkSubmitOperator(
        ...,
        conf={"spark.foo.password": "this_should_get_masked_but_it_doesnt"},
        dag=dag,
    )

Running such a task will leak the password into Airflow logs.
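The leak can be illustrated without a Spark cluster. The sketch below assumes that each `conf` entry ends up on the command line as `--conf key=value` (unquoted); `render_conf_args` is a hypothetical stand-in for that step, not the hook's actual code, and `mask_cmd` simply reuses the regular expression quoted in this issue.

```python
import re

# Masking pattern copied from the _mask_cmd snippet quoted in this issue.
MASK_PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"


def render_conf_args(conf):
    """Hypothetical helper: render each conf entry as ['--conf', 'key=value']."""
    args = []
    for key, value in conf.items():
        args += ["--conf", "{}={}".format(key, value)]
    return args


def mask_cmd(connection_cmd):
    """Apply the single-quote-only masking regex to the joined command."""
    return re.sub(MASK_PATTERN, r"\1******", " ".join(connection_cmd), flags=re.I)


conf = {"spark.foo.password": "this_should_get_masked_but_it_doesnt"}
cmd = ["spark-submit"] + render_conf_args(conf)

# This is the string that would reach the log under the assumption above.
logged = mask_cmd(cmd)
print(logged)
```

Because the rendered value is not wrapped in single quotes, the pattern never matches and the password appears in `logged` verbatim.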

Anything else we need to know:

Again, I may be missing something, e.g. something OS-specific. I'd be happy to learn something here. :)

If all or some of the other forms I mentioned should also get the masking treatment, I have a change ready for opening a PR.
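For illustration only, one way a broader pattern could look is sketched below. This is not the change prepared for the PR; `BROAD_PATTERN` and `mask_cmd_broad` are hypothetical names, and the sketch makes two simplifying choices: it drops the quotes around a masked value, and it treats whatever follows a whitespace separator as the value.

```python
import re

# Hypothetical broadened pattern (an assumption, not Airflow's implementation).
BROAD_PATTERN = re.compile(
    # Group 1: a key containing "secret" or "password", then "=" or whitespace.
    r"(\S*?(?:secret|password)\S*?(?:\s*=\s*|\s+))"
    # Value: single-quoted, double-quoted, or a bare run of non-whitespace.
    r"(?:'[^']*'|\"[^\"]*\"|\S+)",
    flags=re.I,
)


def mask_cmd_broad(connection_cmd):
    """Join the arguments and replace every matched value with asterisks."""
    return BROAD_PATTERN.sub(r"\1******", " ".join(connection_cmd))


forms = [
    ["--foo.password='value'"],
    ["--foo.password=value"],
    ['--foo.password="value"'],
    ["--foo.password", "'value'"],
    ["--foo.password", "value"],
    ["--foo.password", '"value"'],
]

for args in forms:
    print(mask_cmd_broad(args))
```

One limitation of this sketch: with the whitespace separator, a value-less flag whose name contains "password" would cause the following argument to be masked as well, so a real change would need to weigh that trade-off.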

(Note that there's no Jira issue referenced in the commit messages: I cannot create issues in Airflow's Jira for some reason.)

bug

All 6 comments

Thanks for opening your first issue here! Be sure to follow the issue template!

@Unit03 please raise the PR. My original PR that you mentioned was very crude (as you noticed!), but better than nothing :) It was done that way because all my DAGs use that one specific format for passing conf.

All right, then, PR opened. :)

Can this be closed?

Looks like it!

FYI @Unit03: you can put Closes #ISSUE in the commit message and it will close the related issue at merge :).
