Apache Airflow version: 1.10.11
Kubernetes version (if you are using kubernetes) (use kubectl version): N/A
Environment:
uname -a): N/A
What happened:
Airflow documentation is missing a step -- preparing the database for Airflow initdb.
This includes creating the "airflow" database, and the "airflow" user.
There are multiple different instructions for it outside the documentation, e.g.:
https://medium.com/@srivathsankr7/apache-airflow-a-practical-guide-5164ff19d18b says:
mysql -u root -p
mysql> CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;
mysql> create user 'airflow'@'localhost' identified by 'airflow';
mysql> grant all privileges on * . * to 'airflow'@'localhost';
mysql> flush privileges;
http://site.clairvoyantsoft.com/installing-and-configuring-apache-airflow/ says:
CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;
grant all on airflow.* TO 'USERNAME'@'%' IDENTIFIED BY '{password}';
https://airflow-tutorial.readthedocs.io/en/latest/first-airflow.html says:
mysql -u root -p
mysql> CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;
mysql> GRANT ALL PRIVILEGES ON airflow.* To 'airflow'@'localhost';
mysql> FLUSH PRIVILEGES;
(This last one seems to be missing the step of creating the airflow user.)
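The three variants above differ mainly in whether the user gets created and in how widely privileges are granted. As a rough sketch of how they could be combined (not official Airflow guidance), the helper below only prints the statements so they can be reviewed before being piped to the client. The function name `airflow_mysql_setup_sql`, its argument order, and the placeholder password are assumptions made for illustration; the database name and character set are taken from the snippets quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that prepares a MySQL backend before
# running `airflow initdb`. Arguments (all illustrative): user, password, host.
# The body runs in a subshell so the variables do not leak into the caller.
airflow_mysql_setup_sql() (
    user="${1:-airflow}"
    password="${2:-change-me}"
    host="${3:-localhost}"
    cat <<SQL
CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;
CREATE USER '${user}'@'${host}' IDENTIFIED BY '${password}';
GRANT ALL PRIVILEGES ON airflow.* TO '${user}'@'${host}';
SQL
)

# Possible usage (prompts for the MySQL root password):
#   airflow_mysql_setup_sql airflow 'a-real-password' localhost | mysql -u root -p
```

Unlike the first variant, this grants privileges on `airflow.*` only rather than on every database, and unlike the third it creates the user before granting to it.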
What you expected to happen:
I would expect https://airflow.apache.org/docs/stable/howto/initialize-database.html to contain complete instructions for preparing the database backend for initialization.
How to reproduce it:
Try to install Airflow with a MySQL backend with no prior knowledge by following the Airflow documentation.
Anything else we need to know:
Airflow rocks!
Thanks for your feedback. Would you like to contribute this documentation change yourself? You can do it entirely from the GitHub UI.
https://docs.github.com/en/github/managing-files-in-a-repository/editing-files-in-your-repository
I am very happy to help with the review of this change.
Yeah. The best contributions in docs are from those who suffered from lack of it. You can write it in the way that will be usable for other newcomers so I encourage you to contribute it @atsalolikhin-spokeo :)
Hello,
I was unable to commit my change. The pre-commit hook test suite failed on:
Check if image build is needed....................................................................................................Failed
I had a look at your Contributing document, and the bar is a bit high -- Breeze, Docker, virtualenv -- I tried following the instructions but breeze did not start:
[~/git/airflow] fix/10389(+19/-0) ± ./breeze
[~/git/airflow] fix/10389(+19/-0) 1 ± echo $?
1
[~/git/airflow] fix/10389(+19/-0) ±
Here is the patch I was going to send in:
diff --git a/docs/howto/initialize-database.rst b/docs/howto/initialize-database.rst
index aabee9434..b3129286d 100644
--- a/docs/howto/initialize-database.rst
+++ b/docs/howto/initialize-database.rst
@@ -48,12 +48,31 @@ SqlAlchemy backend. We recommend using **MySQL** or **Postgres**.
want to set a default schema for your role with a
command similar to ``ALTER ROLE username SET search_path = airflow, foobar;``
+Setup your database to host Airflow
+-----------------------------------
+
+Create a database called ``airflow`` and a database user that Airflow
+will use to access this database.
+
+Example, for **MySQL**:
+
+.. code-block:: sql
+ CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_unicode_ci;
+ CREATE USER 'airflow' IDENTIFIED BY 'airflow';
+ GRANT ALL PRIVILEGES ON airflow.* TO 'airflow';
+
+Configure Airflow's database connection string
+----------------------------------------------
+
Once you've setup your database to host Airflow, you'll need to alter the
SqlAlchemy connection string located in your configuration file
``$AIRFLOW_HOME/airflow.cfg``. You should then also change the "executor"
setting to use "LocalExecutor", an executor that can parallelize task
instances locally.
+Initialize the database
+-----------------------
+
.. code-block:: bash
# initialize the database
pre-commit is optional. For small changes to the documentation, I very often push the changes to CI without running pre-commit. When an error occurs, I apply the correction locally and push the change to CI again. This is also the reason why I recommended editing the documentation via the GitHub UI.
@atsalolikhin-spokeo Also you can always run git commit --no-verify which will skip pre-commit step.
BTW. I almost never skip pre-commits - they are super helpful, and usually when they fail there is a good reason to fix it.
BTW. The recent changes to pre-commit script will hopefully fix one more reason for a problem, but @atsalolikhin-spokeo - if you rebase to latest version and run it again and it fails, please send me the output - maybe there is another case that I missed :)
I would love those pre-commits to be rock-solid so that people won't skip the pre-commits too often.
@potiuk Thanks for letting me know about git commit --no-verify. I've rebased to the tip of master and tried git commit --verify and it failed on the same step:
Check if image build is needed....................................................................................................Failed
Great idea to get the pre-commit hook rock-solid so that it can run every time.
I would love to hear more about the failure @atsalolikhin-spokeo. Could you pass me the output that you get with it? Usually it should print more information. If not, you can always run pre-commit run build --verbose and pass me the output.
[~/git/airflow] fix/10389 ± pre-commit run build --verbose
Check if image build is needed...........................................Failed
- hook id: build
- duration: 0.05s
- exit code: 1
[~/git/airflow] fix/10389 ±
[~/git/airflow] fix/10389 ± git commit --verify
No-tabs checker...............................................................................................(no files to check)Skipped
Add license for all SQL files.................................................................................(no files to check)Skipped
Add license for all other files...............................................................................(no files to check)Skipped
Add license for all rst files.................................................................................(no files to check)Skipped
Add license for all JS/CSS files..............................................................................(no files to check)Skipped
Add license for all JINJA template files......................................................................(no files to check)Skipped
Add license for all shell files...............................................................................(no files to check)Skipped
Add license for all python files..............................................................................(no files to check)Skipped
Add license for all XML files.................................................................................(no files to check)Skipped
Add license for all yaml files................................................................................(no files to check)Skipped
Add license for all md files..................................................................................(no files to check)Skipped
Add TOC for md files..........................................................................................(no files to check)Skipped
Check hooks apply to the repository...........................................................................(no files to check)Skipped
Check for merge conflicts.....................................................................................(no files to check)Skipped
Debug Statements (Python).....................................................................................(no files to check)Skipped
Check builtin type constructor use............................................................................(no files to check)Skipped
Detect Private Key............................................................................................(no files to check)Skipped
Fix End of Files..............................................................................................(no files to check)Skipped
Mixed line ending.............................................................................................(no files to check)Skipped
Check that executables have shebangs..........................................................................(no files to check)Skipped
Check Xml.....................................................................................................(no files to check)Skipped
Trim Trailing Whitespace......................................................................................(no files to check)Skipped
Fix python encoding pragma....................................................................................(no files to check)Skipped
rst ``code`` is two backticks.................................................................................(no files to check)Skipped
use logger.warning(...........................................................................................(no files to check)Skipped
Check yaml files with yamllint................................................................................(no files to check)Skipped
Run isort to sort imports.....................................................................................(no files to check)Skipped
Run pydocstyle................................................................................................(no files to check)Skipped
Check Shell scripts syntax correctness........................................................................(no files to check)Skipped
Lint OpenAPI using speccy.....................................................................................(no files to check)Skipped
Lint OpenAPI using openapi-spec-validator.....................................................................(no files to check)Skipped
Lint dockerfile...............................................................................................(no files to check)Skipped
Checks for an order of dependencies in setup.py...............................................................(no files to check)Skipped
Update output of breeze command in BREEZE.rst.................................................................(no files to check)Skipped
Update mounts in the local yml file...........................................................................(no files to check)Skipped
Update setup.cfg file with all licenses.......................................................................(no files to check)Skipped
Build cross-dependencies for providers packages...............................................................(no files to check)Skipped
Update extras in documentation................................................................................(no files to check)Skipped
Check for pydevd debug statements accidentally left...........................................................(no files to check)Skipped
Don't use safe in templates...................................................................................(no files to check)Skipped
Check for language that we do not accept as community.........................................................(no files to check)Skipped
Check for inconsitent pylint disable/enable without space.....................................................(no files to check)Skipped
Make sure BaseOperator[Link] is imported from airflow.models.baseoperator in core.............................(no files to check)Skipped
Make sure BaseOperator[Link] is imported from airflow.models outside of core..................................(no files to check)Skipped
To avoid import cycles make sure provide_session and create_session are imported from airflow.utils.session...(no files to check)Skipped
Make sure LoggingMixin is not used alone......................................................................(no files to check)Skipped
Make sure days_ago is imported from airflow.utils.dates.......................................................(no files to check)Skipped
'start_date' should not be defined in default_args in example_dags............................................(no files to check)Skipped
Check if integration list is aligned..........................................................................(no files to check)Skipped
Check if image build is needed....................................................................................................Failed
- hook id: build
- exit code: 1
Check if licenses are OK for Apache...........................................................................(no files to check)Skipped
Checks for consistency between config.yml and default_config.cfg..............................................(no files to check)Skipped
Run mypy......................................................................................................(no files to check)Skipped
Run pylint....................................................................................................(no files to check)Skipped
Run flake8....................................................................................................(no files to check)Skipped
Run BATS bash tests for changed bash files....................................................................(no files to check)Skipped
stylelint.....................................................................................................(no files to check)Skipped
[~/git/airflow] fix/10389 ±
Hey @atsalolikhin-spokeo - please rebase to latest master. I fixed this morning a Mac related problem that would look very much like this: #10440
[~/git/airflow] fix/10389 ± pre-commit run build --verbose
Check if image build is needed...........................................Failed
- hook id: build
- duration: 0.11s
- exit code: 1
You are running pre_commit_ci_build.sh in OSX environment
And you need to install gnu commands
Run 'brew install gnu-getopt coreutils'
Then link the gnu-getopt to become default as suggested by brew.
If you use bash, you should run this command:
echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.bash_profile
. ~/.bash_profile
If you use zsh, you should run this command:
echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zprofile
. ~/.zprofile
Login and logout afterwards !!
After re-login, your PATH variable should start with "/usr/local/opt/gnu-getopt/bin"
Your current path is [redacted]
[~/git/airflow] fix/10389 ±
However I already have the brew formulae, but I didn't set up my PATH for gnu-getopt.
[~/git/airflow] fix/10389 ± brew list|grep 'coreutils\|gnu-getopt'
coreutils
gnu-getopt
[~/git/airflow] fix/10389 ± echo $PATH | grep gnu-getopt
[~/git/airflow] fix/10389 ±
Ok, so I set up my PATH:
[~/git/airflow] fix/10389 ± echo $PATH |grep gnu-getopt|cut -d: -f1
/usr/local/opt/gnu-getopt/bin
[~/git/airflow] fix/10389 ±
(And I started a new shell, as per the instructions.)
But the build is still failing:
[~/git/airflow] fix/10389 ± pre-commit run build --verbose
Check if image build is needed...........................................Failed
- hook id: build
- duration: 0.15s
- exit code: 1
You are running pre_commit_ci_build.sh in OSX environment
And you need to install gnu commands
Run 'brew install gnu-getopt coreutils'
Then link the gnu-getopt to become default as suggested by brew.
If you use bash, you should run this command:
...
Are you using zsh? If so, can you run the bash command suggested? I think we always use bash inside the pre-commits, so the message is actually misleading - I can correct it if we confirm that's the case.
I am using bash.
--
Aleksey Tsalolikhin
Data Quality Team (DevOps Specialist)
LinkedIn https://www.linkedin.com/company/spokeo/ • Instagram
https://www.instagram.com/spokeo/ • Youtube https://bit.ly/2oh8YPv
Did you log out and log in again, as suggested by the message? Bash caches the locations of the binaries it uses, so even if your PATH is updated you might still be using the old getopt. I believe there is a command to clear that cache, 'hash -r', but logging out and in is easier.
Yes, I did. I use GNU Screen, so I closed that window, and started another
window (which starts a fresh bash process).
Very interesting - what do you see with command -v getopt and getopt --version, and what is the exit code of getopt -T?
And lastly, what prefix do you get when you run brew --prefix getopt?
I really want to get to the bottom of it, to see if we can improve detection of correct getopt. We might want to force using /usr/local/opt/gnu-getopt/bin on Mac so that it is 100% sure that the right one is being used.
Or maybe the gnu-getopt is installed (for some reason) in a different folder ? Maybe you simply have a brew installing those tools in a different directory.
One more thing - I am not sure where bash actually keeps the hash table of binaries that can be found on the path - it might be kept and refreshed at ".bash_profile" sourcing time rather than at ".bashrc" sourcing time - so it is quite likely that your bash command will still remember the "getopt" to be in the old location if you use screen, exit and re-enter.
You can see more about "rehash" or "hash -r" commands here:
https://docs.oracle.com/cd/E19683-01/816-0210/6m6nb7mj6/index.html
https://superuser.com/questions/490983/how-to-rehash-executables-in-path-with-bash
It's likely that the hash table of binaries still points to the old location rather than to the new one. But this is easy to check - simply run hash -r and the hash is refreshed.
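To make the hashing behaviour concrete, here is a small self-contained sketch that uses throwaway directories instead of the real getopt. The command name `demo_tool` and the directory names are made up for the demo. It illustrates the general mechanism only and does not claim that hashing is the cause of the failure in this thread. In bash, the second run is expected to keep using the remembered location until hash -r is called.

```shell
#!/usr/bin/env bash
# Demonstrates the shell's table of remembered command locations.
# `demo_tool` is a hypothetical command that exists only for this demo.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/first" "$demo_dir/second"

# Initially the tool exists only in the later PATH entry.
printf '#!/bin/sh\necho second\n' > "$demo_dir/second/demo_tool"
chmod +x "$demo_dir/second/demo_tool"
PATH="$demo_dir/first:$demo_dir/second:$PATH"

# Run it in the current shell (not inside $(...)) so its location is remembered.
demo_tool > "$demo_dir/run1"

# Now a copy appears in an earlier PATH entry.
printf '#!/bin/sh\necho first\n' > "$demo_dir/first/demo_tool"
chmod +x "$demo_dir/first/demo_tool"

demo_tool > "$demo_dir/run2"   # resolved through the remembered location
hash -r                        # forget all remembered locations
demo_tool > "$demo_dir/run3"   # PATH is searched again from the start

before_refresh="$(cat "$demo_dir/run2")"
after_refresh="$(cat "$demo_dir/run3")"
echo "before hash -r: $before_refresh"
echo "after hash -r:  $after_refresh"
rm -rf "$demo_dir"
```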
[~/git/airflow] fix/10389 ±
[~/git/airflow] fix/10389 ± command -v getopt
/usr/bin/getopt
[~/git/airflow] fix/10389 ± getopt --version
--
[~/git/airflow] fix/10389 ± getopt -T
--
[~/git/airflow] fix/10389 ± echo $?
0
[~/git/airflow] fix/10389 ± brew --prefix getopt
Error: No available formula with the name "getopt"
[~/git/airflow] fix/10389 ± hash -r
[~/git/airflow] fix/10389 ± command -v getopt
/usr/bin/getopt
[~/git/airflow] fix/10389 ±
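The transcript above shows the macOS getopt printing `--` and returning 0 for `getopt -T`. The enhanced getopt from util-linux instead exits with status 4 for `-T`, so the check can be scripted. The following is a minimal sketch of such a check, not the logic used by the Airflow pre-commit scripts; the function name `is_enhanced_getopt` is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given getopt binary is the enhanced
# (util-linux) implementation, which exits with status 4 when run with -T.
# The body runs in a subshell so `status` does not leak into the caller.
is_enhanced_getopt() (
    status=0
    "${1:-getopt}" -T >/dev/null 2>&1 || status=$?
    [ "$status" -eq 4 ]
)

# Possible usage:
#   if is_enhanced_getopt "$(command -v getopt)"; then
#       echo "enhanced getopt is first on PATH"
#   else
#       echo "a non-enhanced getopt is first on PATH"
#   fi
```

A binary that is missing altogether yields a different status (typically 127) and is therefore also reported as not enhanced.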
I opened a new shell:
[~] $ command -v getopt
/usr/bin/getopt
[~] $
Okay, that's the macOS getopt; so where is brew's getopt?
[~] $ find . -name getopt 2>/dev/null
./git/homebrew/Cellar/gnu-getopt/2.36/bin/getopt
./git/homebrew/Cellar/gnu-getopt/2.36/etc/bash_completion.d/getopt
./git/homebrew/Cellar/libusb/1.0.23/share/libusb/examples/getopt
[~] $ ./git/homebrew/Cellar/gnu-getopt/2.36/bin/getopt --version
getopt from util-linux 2.36
[~] $
So I've added it to my PATH.
[~/git/airflow] fix/10389 ± type getopt
getopt is /Users/atsalolikhin/git/homebrew/Cellar/gnu-getopt/2.36/bin/getopt
[~/git/airflow] fix/10389 ±
Now the pre-commit hook works:
[~/git/airflow] fix/10389 ± pre-commit run build --verbose
Check if image build is needed...........................................
Please confirm rebuild image CI-python3.6. Are you sure? [y/N/q]
y
The answer is 'yes'. rebuild image CI-python3.6. This can take some time!
Preparing apache/airflow:master-python3.6-ci.
Docker pulling python:3.6-slim-buster.
Build log: /var/folders/5m/1v1dg76s7zbdpwj39dh5skv80000gp/T/tmp.BExKt31d/out.log
/
Preparing apache/airflow:master-python3.6-ci.
Docker building apache/airflow:master-python3.6-ci.
Build log: /var/folders/5m/1v1dg76s7zbdpwj39dh5skv80000gp/T/tmp.BExKt31d/out.log
Step 96/100 :\
Passed
- hook id: build
- duration: 66.95s
Some of your images need to be rebuild because important files (like package list) has changed.
You have those options:
* Rebuild the images now by answering 'y' (this might take some time!)
* Skip rebuilding the images and hope changes are not big (you will be asked again)
* Quit and manually rebuild the images using one of the following commands
* ./breeze build-image
* ./breeze build-image --force-pull-images
The first command works incrementally from your last local build.
The second command you use if you want to completely refresh your images from dockerhub.
Force pull base image python:3.6-slim-buster
Pulling the image apache/airflow:master-python3.6-ci
[~/git/airflow] fix/10389 ±
It's too bad the brew install didn't automagically make gnu-getopt take precedence over the system getopt, though I understand the need for user awareness/choice in that. I must not have read the prompt to take the manual step of adjusting the PATH. Anyway, it's done now. Thanks!
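For anyone whose Homebrew lives outside /usr/local (as it did here, under ~/git/homebrew), the hard-coded path in the hook's message will not match. One possible way to handle that is to ask brew for the formula's prefix instead. The helper name `prepend_to_path` below is made up for this sketch, and the usage lines assume that Homebrew and the gnu-getopt formula are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put a directory at the front of PATH unless PATH already
# contains it. Note that it does not move an existing entry to the front.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
}

# Possible usage in ~/.bash_profile on macOS:
#   prepend_to_path "$(brew --prefix gnu-getopt)/bin"
#   export PATH
```

After opening a new shell, command -v getopt can be used to confirm which binary now comes first.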