Follow-up to the somewhat misdirected #3572
Hard to see how we never noticed this before, but in debug mode our set -x redirection results in a lot of errors in job logs.
To reproduce, any suite will do, e.g.:
[scheduling]
[[graph]]
R1 = foo
[runtime]
[[foo]]
script = true
Run it in debug mode:
cylc run --debug foo
````
Now check the task job log:
(venv) [oliverh@niwa-1007885 cylc-flow]$ cylc cat-log -f e foo foo.1
Sending DEBUG MODE xtrace to job.xtrace
/bin/sh: BASH_XTRACEFD: 19: invalid value for trace file descriptor # <----------
/bin/sh: BASH_XTRACEFD: 19: invalid value for trace file descriptor # <----------
2020-04-30T13:52:45+12:00 DEBUG - Loading site/user config files
2020-04-30T13:52:45+12:00 DEBUG - Reading file /home/oliverh/.cylc/flow/8.0a1/flow.rc
2020-04-30T13:52:45+12:00 DEBUG - zmq:send {'command': 'put_messages', 'args': {'task_job': '1/foo/01', 'event_time': '2020-04-30T13:52:45+12:00', 'messages': [['INFO', 'started']]}, 'meta': {'prog': 'cylc-message', 'host': 'niwa-1007885.niwa.local'}}
2020-04-30T13:52:45+12:00 DEBUG - zmq:recv {'data': [True, 'Messages queued: 1'], 'user': 'oliverh'}
/bin/sh: BASH_XTRACEFD: 19: invalid value for trace file descriptor # <----------
/bin/sh: BASH_XTRACEFD: 19: invalid value for trace file descriptor # <----------
2020-04-30T13:52:46+12:00 DEBUG - Loading site/user config files
2020-04-30T13:52:46+12:00 DEBUG - Reading file /home/oliverh/.cylc/flow/8.0a1/flow.rc
2020-04-30T13:52:46+12:00 DEBUG - zmq:send {'command': 'put_messages', 'args': {'task_job': '1/foo/01', 'event_time': '2020-04-30T13:52:46+12:00', 'messages': [['INFO', 'succeeded']]}, 'meta': {'prog': 'cylc-message', 'host': 'niwa-1007885.niwa.local'}}
2020-04-30T13:52:46+12:00 DEBUG - zmq:recv {'data': [True, 'Messages queued: 1'], 'user': 'oliverh'}
```
You can also reproduce the problem by running the task job script manually, without Cylc.
After a bit of investigation, these warnings:
job.xtrace!BASH_XTRACEFD is pointed at itcylc message is called in the job scriptSo, this seems to be something to do with how $BASH_XTRACEFD and/or the FD itself is handled in sub-shells, OR some incompatibility with how output streams or FDs are manipulated in the (Python) cylc message program??
A brute force fix (tested at bash-4.2) is to unset BASH_XTRACEFD before cylc message calls, and then re-open the FD and reset BASH_XTRACEFD after. But it would be nice to understand exactly what the problem is.
I had a container running the image that install multiple versions of bash, so quickly grabbed 4.1. (This is a Ubuntu 18.04 image).
root@942b2e7e6fb1:~/cylc-run/foo# bash -version
bash -version
GNU bash, version 4.1.0(1)-release (x86_64-unknown-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
root@942b2e7e6fb1:~/cylc-run/foo# echo ${BASH_XTRACEFD}
echo ${BASH_XTRACEFD}
(Maybe that BASH_XTRACEFD is set in a script, not in the interactive mode? Never heard about this variable till today :grimacing: )
Used your example, with same name foo. But I modified the script of the foo task.
[runtime]
[[foo]]
script = echo "Hello world! Bash says: $(bash -version | head -n 1 | grep -oP ' ([0-9.])+')"
root@942b2e7e6fb1:~/cylc-run/foo# cylc run --debug foo
cylc run --debug foo
2020-04-30T03:36:49Z DEBUG - Loading site/user config files
2020-04-30T03:36:49Z DEBUG - Extracting etc/job.sh to /root/cylc-run/foo/.service/etc/job.sh
(...)
Then checking the logs:
# Error
root@942b2e7e6fb1:~/cylc-run/foo# cylc cat-log -f e foo foo.1
cylc cat-log -f e foo foo.1
mesg: ttyname failed: Inappropriate ioctl for device
Sending DEBUG MODE xtrace to job.xtrace
2020-04-30T03:36:51Z DEBUG - Loading site/user config files
2020-04-30T03:36:51Z DEBUG - zmq:send {'command': 'put_messages', 'args': {'task_job': '1/foo/01', 'event_time': '2020-04-30T03:36:51Z', 'messages': [['INFO', 'started']]}, 'meta': {'prog': 'cylc-message', 'host': '942b2e7e6fb1'}}
2020-04-30T03:36:51Z DEBUG - zmq:recv {'data': [True, 'Messages queued: 1'], 'user': 'root'}
2020-04-30T03:36:52Z DEBUG - Loading site/user config files
2020-04-30T03:36:52Z DEBUG - zmq:send {'command': 'put_messages', 'args': {'task_job': '1/foo/01', 'event_time': '2020-04-30T03:36:52Z', 'messages': [['INFO', 'succeeded']]}, 'meta': {'prog': 'cylc-message', 'host': '942b2e7e6fb1'}}
2020-04-30T03:36:52Z DEBUG - zmq:recv {'data': [True, 'Messages queued: 1'], 'user': 'root'}
# Out
root@942b2e7e6fb1:~/cylc-run/foo# cylc cat-log -f o foo foo.1
cylc cat-log -f o foo foo.1
Suite : foo
Task Job : 1/foo/01 (try 1)
User@Host: root@942b2e7e6fb1
Hello world! Bash says: 4.1.0
2020-04-30T03:36:51Z INFO - started
2020-04-30T03:36:52Z INFO - succeeded
Then trying with 4.2:
root@942b2e7e6fb1:~/cylc-run/foo# update-alternatives --set bash /bash/bash-4.2
update-alternatives --set bash /bash/bash-4.2
update-alternatives: using /bash/bash-4.2 to provide /bin/bash (bash) in manual mode
root@942b2e7e6fb1:~/cylc-run/foo# cylc run --debug foo
cylc run --debug foo
2020-04-30T03:39:45Z DEBUG - Loading site/user config files
2020-04-30T03:39:45Z DEBUG - Extracting etc/job.sh to /root/cylc-run/foo/.service/etc/job.sh
(...)
root@942b2e7e6fb1:~/cylc-run/foo# cylc cat-log -f e foo foo.1
cylc cat-log -f e foo foo.1
mesg: ttyname failed: Inappropriate ioctl for device
Sending DEBUG MODE xtrace to job.xtrace
2020-04-30T03:39:47Z DEBUG - Loading site/user config files
2020-04-30T03:39:47Z DEBUG - zmq:send {'command': 'put_messages', 'args': {'task_job': '1/foo/01', 'event_time': '2020-04-30T03:39:47Z', 'messages': [['INFO', 'started']]}, 'meta': {'prog': 'cylc-message', 'host': '942b2e7e6fb1'}}
2020-04-30T03:39:47Z DEBUG - zmq:recv {'data': [True, 'Messages queued: 1'], 'user': 'root'}
2020-04-30T03:39:48Z DEBUG - Loading site/user config files
2020-04-30T03:39:48Z DEBUG - zmq:send {'command': 'put_messages', 'args': {'task_job': '1/foo/01', 'event_time': '2020-04-30T03:39:48Z', 'messages': [['INFO', 'succeeded']]}, 'meta': {'prog': 'cylc-message', 'host': '942b2e7e6fb1'}}
2020-04-30T03:39:48Z DEBUG - zmq:recv {'data': [True, 'Messages queued: 1'], 'user': 'root'}
root@942b2e7e6fb1:~/cylc-run/foo# cylc cat-log -f o foo foo.1
cylc cat-log -f o foo foo.1
Suite : foo
Task Job : 1/foo/01 (try 1)
User@Host: root@942b2e7e6fb1
Hello world! Bash says: 4.2.0
2020-04-30T03:39:47Z INFO - started
2020-04-30T03:39:48Z INFO - succeeded
Previously, I was also not able to get that warning about invalid file descriptor while testing the PR with my bash 4.4 but on my Ubuntu laptop. It could be something with the version from RHEL or Centos, in case you are also using a RHEL/Centos to test it?
Interestingly I found another bug that will raise in a few minutes with cat-log :grimacing:
Huh. I'm using bash-4.2 on Centos 7.
Can you also check that the xtrace was directed to job.xtrace? (Use cylc cat-log -f x foo foo.1).
I had done a grep -r -H "invalid value for trace" . in the cylc-run/foo directory but it found nothing.
But as the container is still running and 4.2 is still the bash version, it's quite easy to test (though I will confess I rarely look at the xtrace file as haven't had/learned to use that log):
root@942b2e7e6fb1:~/cylc-run/foo# cylc cat-log -f x foo foo.1
cylc cat-log -f x foo foo.1
+[20200430T040500+0000]root@942b2e7e6fb1 cylc__job__run_inst_func global_init_script
+[20200430T040500+0000]root@942b2e7e6fb1 typeset func_name=global_init_script
+[20200430T040500+0000]root@942b2e7e6fb1 shift 1
+[20200430T040500+0000]root@942b2e7e6fb1 typeset -f cylc__job__inst__global_init_script
+[20200430T040500+0000]root@942b2e7e6fb1 cylc__job__run_inst_func init_script
+[20200430T040500+0000]root@942b2e7e6fb1 typeset func_name=init_script
+[20200430T040500+0000]root@942b2e7e6fb1 shift 1
+[20200430T040500+0000]root@942b2e7e6fb1 typeset -f cylc__job__inst__init_script
+[20200430T040500+0000]root@942b2e7e6fb1 typeset signal_name=
+[20200430T040500+0000]root@942b2e7e6fb1 for signal_name in '${CYLC_FAIL_SIGNALS}'
+[20200430T040500+0000]root@942b2e7e6fb1 trap 'cylc__job_err EXIT' EXIT
+[20200430T040500+0000]root@942b2e7e6fb1 for signal_name in '${CYLC_FAIL_SIGNALS}'
+[20200430T040500+0000]root@942b2e7e6fb1 trap 'cylc__job_err ERR' ERR
+[20200430T040500+0000]root@942b2e7e6fb1 for signal_name in '${CYLC_FAIL_SIGNALS}'
+[20200430T040500+0000]root@942b2e7e6fb1 trap 'cylc__job_err TERM' TERM
+[20200430T040500+0000]root@942b2e7e6fb1 for signal_name in '${CYLC_FAIL_SIGNALS}'
+[20200430T040500+0000]root@942b2e7e6fb1 trap 'cylc__job_err XCPU' XCPU
+[20200430T040500+0000]root@942b2e7e6fb1 set -euo pipefail
+[20200430T040500+0000]root@942b2e7e6fb1 USER=root
+[20200430T040500+0000]root@942b2e7e6fb1 typeset host=942b2e7e6fb1
+[20200430T040500+0000]root@942b2e7e6fb1 [[ -z 942b2e7e6fb1 ]]
+[20200430T040500+0000]root@942b2e7e6fb1 echo 'Suite : foo'
+[20200430T040500+0000]root@942b2e7e6fb1 echo 'Task Job : 1/foo/01 (try 1)'
+[20200430T040500+0000]root@942b2e7e6fb1 echo 'User@Host: root@942b2e7e6fb1'
+[20200430T040500+0000]root@942b2e7e6fb1 echo
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_SUITE_LOG_DIR=/root/cylc-run/foo/log/suite
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_LOG_DIR=/root/cylc-run/foo/log/suite
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_WORK_DIR_ROOT=/root/cylc-run/foo
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_SUITE_SHARE_DIR=/root/cylc-run/foo/share
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_SHARE_DIR=/root/cylc-run/foo/share
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_SUITE_WORK_DIR=/root/cylc-run/foo/work
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_WORK_DIR=/root/cylc-run/foo/work
++[20200430T040500+0000]root@942b2e7e6fb1 cut -d / -f 1
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_CYCLE_POINT=1
++[20200430T040500+0000]root@942b2e7e6fb1 cut -d / -f 2
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_NAME=foo
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_TASK_NAME CYLC_TASK_CYCLE_POINT ISODATETIMEREF
+[20200430T040500+0000]root@942b2e7e6fb1 [[ integer != \i\n\t\e\g\e\r ]]
++[20200430T040500+0000]root@942b2e7e6fb1 cut -d / -f 3
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_TASK_SUBMIT_NUMBER=1
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_SUBMIT_NUMBER=1
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_TASK_ID=foo.1
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_ID=foo.1
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_TASK_LOG_DIR=/root/cylc-run/foo/log/job/1/foo/01
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_LOG_DIR=/root/cylc-run/foo/log/job/1/foo/01
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_TASK_LOG_ROOT=/root/cylc-run/foo/log/job/1/foo/01/job
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_LOG_ROOT=/root/cylc-run/foo/log/job/1/foo/01/job
+[20200430T040500+0000]root@942b2e7e6fb1 [[ -n '' ]]
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_WORK_DIR_BASE=1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_TASK_WORK_DIR=/root/cylc-run/foo/work/1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_WORK_DIR=/root/cylc-run/foo/work/1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 typeset contact=/root/cylc-run/foo/.service/contact
+[20200430T040500+0000]root@942b2e7e6fb1 [[ -f /root/cylc-run/foo/.service/contact ]]
++[20200430T040500+0000]root@942b2e7e6fb1 sed -n 's/^CYLC_SUITE_HOST=//p' /root/cylc-run/foo/.service/contact
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_HOST=942b2e7e6fb1
++[20200430T040500+0000]root@942b2e7e6fb1 sed -n 's/^CYLC_SUITE_OWNER=//p' /root/cylc-run/foo/.service/contact
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_OWNER=root
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_SUITE_HOST CYLC_SUITE_OWNER
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_SUITE_SHARE_PATH=/root/cylc-run/foo/share
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_SHARE_PATH=/root/cylc-run/foo/share
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_SUITE_INITIAL_CYCLE_TIME=1
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_INITIAL_CYCLE_TIME=1
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_SUITE_FINAL_CYCLE_TIME=1
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_SUITE_FINAL_CYCLE_TIME=1
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_TASK_CYCLE_TIME=1
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_CYCLE_TIME=1
+[20200430T040500+0000]root@942b2e7e6fb1 export CYLC_TASK_WORK_PATH=/root/cylc-run/foo/work/1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_WORK_PATH=/root/cylc-run/foo/work/1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 cylc__job__run_inst_func env_script
+[20200430T040500+0000]root@942b2e7e6fb1 typeset func_name=env_script
+[20200430T040500+0000]root@942b2e7e6fb1 shift 1
+[20200430T040500+0000]root@942b2e7e6fb1 typeset -f cylc__job__inst__env_script
+[20200430T040500+0000]root@942b2e7e6fb1 CYLC_TASK_MESSAGE_STARTED_PID=12563
+[20200430T040500+0000]root@942b2e7e6fb1 export PATH=/root/cylc-run/foo/bin:/root/cylc-run/foo/bin:/root/cylc-run/foo:/root/cylc-run/foo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+[20200430T040500+0000]root@942b2e7e6fb1 PATH=/root/cylc-run/foo/bin:/root/cylc-run/foo/bin:/root/cylc-run/foo:/root/cylc-run/foo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+[20200430T040500+0000]root@942b2e7e6fb1 cylc message -- foo 1/foo/01 started
+[20200430T040500+0000]root@942b2e7e6fb1 export PYTHONPATH=/root/cylc-run/foo/lib/python:/root/cylc-run/foo/lib/python:
+[20200430T040500+0000]root@942b2e7e6fb1 PYTHONPATH=/root/cylc-run/foo/lib/python:/root/cylc-run/foo/lib/python:
+[20200430T040500+0000]root@942b2e7e6fb1 mkdir -p /root/cylc-run/foo/share
++[20200430T040500+0000]root@942b2e7e6fb1 dirname /root/cylc-run/foo/work/1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 mkdir -p /root/cylc-run/foo/work/1
+[20200430T040500+0000]root@942b2e7e6fb1 mkdir -p /root/cylc-run/foo/work/1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 cd /root/cylc-run/foo/work/1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 typeset func_name=
+[20200430T040500+0000]root@942b2e7e6fb1 for func_name in ''\''user_env'\''' ''\''pre_script'\''' ''\''script'\''' ''\''post_script'\'''
+[20200430T040500+0000]root@942b2e7e6fb1 cylc__job__run_inst_func user_env
+[20200430T040500+0000]root@942b2e7e6fb1 typeset func_name=user_env
+[20200430T040500+0000]root@942b2e7e6fb1 shift 1
+[20200430T040500+0000]root@942b2e7e6fb1 typeset -f cylc__job__inst__user_env
+[20200430T040500+0000]root@942b2e7e6fb1 for func_name in ''\''user_env'\''' ''\''pre_script'\''' ''\''script'\''' ''\''post_script'\'''
+[20200430T040500+0000]root@942b2e7e6fb1 cylc__job__run_inst_func pre_script
+[20200430T040500+0000]root@942b2e7e6fb1 typeset func_name=pre_script
+[20200430T040500+0000]root@942b2e7e6fb1 shift 1
+[20200430T040500+0000]root@942b2e7e6fb1 typeset -f cylc__job__inst__pre_script
+[20200430T040500+0000]root@942b2e7e6fb1 for func_name in ''\''user_env'\''' ''\''pre_script'\''' ''\''script'\''' ''\''post_script'\'''
+[20200430T040500+0000]root@942b2e7e6fb1 cylc__job__run_inst_func script
+[20200430T040500+0000]root@942b2e7e6fb1 typeset func_name=script
+[20200430T040500+0000]root@942b2e7e6fb1 shift 1
+[20200430T040500+0000]root@942b2e7e6fb1 typeset -f cylc__job__inst__script
+[20200430T040500+0000]root@942b2e7e6fb1 cylc__job__inst__script
++[20200430T040500+0000]root@942b2e7e6fb1 bash -version
++[20200430T040500+0000]root@942b2e7e6fb1 head -n 1
++[20200430T040500+0000]root@942b2e7e6fb1 grep -oP ' ([0-9.])+'
+[20200430T040500+0000]root@942b2e7e6fb1 echo 'Hello world! Bash says: 4.2.0'
+[20200430T040500+0000]root@942b2e7e6fb1 for func_name in ''\''user_env'\''' ''\''pre_script'\''' ''\''script'\''' ''\''post_script'\'''
+[20200430T040500+0000]root@942b2e7e6fb1 cylc__job__run_inst_func post_script
+[20200430T040500+0000]root@942b2e7e6fb1 typeset func_name=post_script
+[20200430T040500+0000]root@942b2e7e6fb1 shift 1
+[20200430T040500+0000]root@942b2e7e6fb1 typeset -f cylc__job__inst__post_script
+[20200430T040500+0000]root@942b2e7e6fb1 cd
+[20200430T040500+0000]root@942b2e7e6fb1 rmdir /root/cylc-run/foo/work/1/foo
+[20200430T040500+0000]root@942b2e7e6fb1 wait 12563
+[20200430T040500+0000]root@942b2e7e6fb1 cylc message -- foo 1/foo/01 succeeded
+[20200430T040501+0000]root@942b2e7e6fb1 trap '' EXIT ERR TERM XCPU
+[20200430T040501+0000]root@942b2e7e6fb1 cylc__job__run_inst_func exit_script
+[20200430T040501+0000]root@942b2e7e6fb1 typeset func_name=exit_script
+[20200430T040501+0000]root@942b2e7e6fb1 shift 1
+[20200430T040501+0000]root@942b2e7e6fb1 typeset -f cylc__job__inst__exit_script
+[20200430T040501+0000]root@942b2e7e6fb1 exit 0
haven't had/learned to use that log
It's for primitive shell script debugging. set -x makes bash print out every line of code before executing it - normally to stderr, but it's so verbose we try to redirect to another file.
set -x
is that why it's called xtrace? Because it contains trace information that is enabled with the x flag? :open_mouth:
I think so!