Airflow: Scheduler errors when using MSSQL as the backend database

Created on 22 Jul 2020  路  15Comments  路  Source: apache/airflow

Apache Airflow version: 1.10.11

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

What you expected to happen:

How to reproduce it:

Set MSSQL as the sql connection backend and examine the airflow scheduler logs

```
[2020-07-22 08:27:16,334] {settings.py:306} DEBUG - Disposing DB connection pool (PID 7714)
Process DagFileProcessor1036-Process:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
cursor, statement, parameters, context
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '1'. (102) (SQLExecDirectW)")
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(self._args, *self._kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 159, in _run_file_processor
pickle_dags)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(args, *kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1593, in process_file
paused_dag_ids = models.DagModel.get_paused_dag_ids(dag_ids=dagbag.dag_ids)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(args, *kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py", line 1808, in get_paused_dag_ids
.filter(DagModel.dag_id.in_(dag_ids))
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3341, in all
return list(self)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3503, in __iter__
return self._execute_and_instances(context)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
return meth(self, multiparams, params)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
distilled_params,
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1318, in _execute_context
e, statement, parameters, cursor, context
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1512, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
cursor, statement, parameters, context
File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '1'. (102) (SQLExecDirectW)")
[SQL: SELECT dag.dag_id AS dag_dag_id
FROM dag
WHERE dag.is_paused IS 1 AND dag.dag_id IN (?)]
[parameters: ('test-dag',)]
(Background on this error at: http://sqlalche.me/e/13/f405)

````

Anything else we need to know:

invalid mssql-support

Most helpful comment

I would indeed like to use the survey (I think this is the only way) to just query and ask the community what are their preferences - we do not need to submit to the preferences - but I think right now we have very little data to make a conscious decision on all the points you raised (which versions to support, which one to drop). I think asking is no harm :).

All 15 comments

got around it by using

sed -i 's/DagModel.is_paused.is_(True)/DagModel.is_paused == True/g' /home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py

Airflow does not support MSSQL officially. Maybe you would like to add tests/support for it in CI @apache/airflow-committers ? WDYT? I think MSSQL comes rather often in the discussions/issues and maybe it's worth to consider official support for it?

sure thing , would love to create a PR to add MSSQL into the CI tests if that's ok

Adding it to the CI would be a first step. Curious what we will encounter. I'm all for it.

Please @aneesh-joseph - let's see how we are doing in MS world. And BTW on Github Actions we can even run tests in Windows environment ;)

I am against this. We can start supporting many databases and maintain many or use/recommend the ones we are very sure about.

As it is MySql has the deadlock issues since a good amount of time, if we start supporting MSSQL we will encounter more issues that we need to solve.

My opinion would be too only officially support Postgres and MySQL. If users want to use other database they can do it, if they are willing to create a PR and fix it we should definitely accept it. But I am definitely against adding more databases to our test suite.

@kaxil - why not try to see how we are doing with the tests first? Maybe that on its own will discourage us from further steps?

Or maybe we find that we have only some limited area to address and maybe we can raise the discussion on devlist and userlist to see how many people are actually already using MSSQL (I have the feeling that people are doing this already - judging from a number of questions and people only reporting rather small issues).

Maybe it's a good opportunity to run a survey about the DB used which we missed last time?

We can tie it with the Summit Attendees and get much more meaningful results. I'd only ask those questions:

  • What is the DB you are using in production?
  • What is the DB you are using in testing/development?
  • Which version?
  • Do you plan migration to different version soon, if yes when and which version?
  • If you are not limited to MySQL/Postgres/Sqlite - is there any other SQL Database you would like Airflow to support and why?

J.

@kaxil - why not try to see how we are doing with the tests first? Maybe that on its own will discourage us from further steps?

Or maybe we find that we have only some limited area to address and maybe we can raise the discussion on devlist and userlist to see how many people are actually already using MSSQL (I have the feeling that people are doing this already - judging from a number of questions and people only reporting rather small issues).

Maybe it's a good opportunity to run a survey about the DB used which we missed last time?

We can tie it with the Summit Attendees and get much more meaningful results. I'd only ask those questions:

  • What is the DB you are using in production?
  • What is the DB you are using in testing/development?
  • Which version?
  • Do you plan migration to different version soon, if yes when and which version?
  • If you are not limited to MySQL/Postgres/Sqlite - is there any other SQL Database you would like Airflow to support and why?

J.

I am happy for you to take the discussion on the dev list :) - My main concern is we should not try to support everything that users ask for. We need to improve and fix the things that we are already using. MySQL support needs to be improved. Our CI suite is already too big, adding more - means maintaining more. The same time and effort could be used on the core functionalities.

And again I am just talking about officially supporting it since SqlAlchemy does support it. If there are any fixes we can do - PRs are welcome, somewhere the line needs to be drawn on what we support. Sqlalchmey is a tool designed for supporting and adding more DB specific features. I would rather we or someone who is interested in specific DB help improve those functionalities in SQLAlchemy as that would help the larger community including Airflow.

PS: This is just what I think and my personal opinion if the larger community and / or committers think otherwise I would be happy to hear the reasoning and for us to add the tests

Bear in mind we already need to test Airflow with newer versions of Postgres and MySQL too

Postgres-version: [9.6, 10]
Mysql-version: [5.7]

Soon we should add support for Postgres 12 (maybe drop 9.6 - separate discussion)

Should we add support for MySQL 8 since it was GA since 19 April 2018

The test suite will keep on increasing.

Same with MSSQL - if we say yes, what versions do we support, should we add support to more versions. The test matrices will keep on increasing.

My recommendation would be to support Postgres and MySQL and improve the support across multiple versions like I mentioned above.

I would indeed like to use the survey (I think this is the only way) to just query and ask the community what are their preferences - we do not need to submit to the preferences - but I think right now we have very little data to make a conscious decision on all the points you raised (which versions to support, which one to drop). I think asking is no harm :).

I have raised a PR to include mssql support with breeze( ./breeze --backend=mssql ) .This should hopefully make it easier to test locally if anyone wants to contribute fixes for mssql compatibility.

Thanks @aneesh-joseph , appreciate it :)

Here's another pull request to address this particular issue with SQL Server:
https://github.com/apache/airflow/pull/10019

Here's another pull request to address this particular issue with SQL Server:

10019

Looks like #9973 already includes my change. Closing as a duplicate.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mik-laj picture mik-laj  路  3Comments

ryanahamilton picture ryanahamilton  路  3Comments

d-lee picture d-lee  路  4Comments

hagope picture hagope  路  4Comments

turbaszek picture turbaszek  路  3Comments