In #2271, @garlick discovered the shell can be very slow on tests where a mountain of output is output from the job.
Using perf record he saw:
98.69% flux-shell libflux-core.so.1.0.0 [.] cbuf_find_unread_line
0.06% flux-shell [nvidia] [k] _nv031488rm
0.03% flux-shell [kernel.kallsyms] [k] syscall_return_via_sysret
0.03% flux-shell libc-2.27.so [.] _int_malloc
0.02% flux-shell [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0.02% flux-shell libc-2.27.so [.] cfree@GLIBC_2.2.5
0.02% flux-shell [unknown] [k] 0xfffffe000010e01b
0.02% flux-shell libc-2.27.so [.] __libc_calloc
cbuf_find_unread_line() is from the cbuf library.
I noticed that cbuf_lines_used() calls this function to determine the number of lines available in the buffer. cbuf_lines_used() is called from flux_buffer_lines(). flux_buffer_lines() is regularly called in both libsubprocess and ev_buffer_read to generate callbacks when lines are available.
I suspect the issue is the buffers are filling up with ALOT of lines, and cbuf_find_unread_line() is slowly iterating through all the contents in the buffer char by char.
What would probably speed this up a lot is if instead of retrieving the number of lines available, if some function were available to simply say if atleast one line is available. For generating callbacks, that is all that is needed.
if some function were available to simply say if atleast one line is available
I think a flux_buffer_has_line() could be implemented something like:
bool flux_buffer_has_line (flux_buffer_t *fb)
{
char buf[1];
return (cbuf_peek_line (fb->cbuf, buf, 0, 1) > 0);
}
There is also some line counting going on down in libflux/buffer.c:check_read_cb():
It probably promotes "fairness" in the reactor if the buffer read callback is only called once per reactor loop, instead of calling it until the callback stops consuming lines. At least, leave it up to the callback to iteratively call read_line until all lines are consumed.
This would have the added benefit that the available lines comparison would no longer be needed and only flux_buffer_has_line() would be required...
E.g. instead of:
else if (fb->cb_type == FLUX_BUFFER_CB_TYPE_READ_LINE
&& flux_buffer_lines (fb) > 0) {
int count = flux_buffer_lines (fb);
/* we will iterate over all lines, but only if the user is
* reading them. If the user isn't reading lines, we're not
* going to infinitely loop
*/
while (fb->cb_type == FLUX_BUFFER_CB_TYPE_READ_LINE
&& count > 0) {
int tmp;
fb->cb (fb, fb->cb_arg);
if ((tmp = flux_buffer_lines (fb)) < 0)
break;
if (tmp < count)
count = tmp;
else
break;
}
}
You would have
else if (fb->cb_type == FLUX_BUFFER_CB_TYPE_READ_LINE
&& flux_buffer_has_line (fb) > 0)
fb->cb (fb, fb->cb_arg);
Of course, this is a change to non-reactor base uses of flux_buffer_t, so those instances will need to be updated to move the read/read_line loop into the callback, instead of expecting the callback to called until there is no more data to read. (not sure if this is used outside of unit tests.)
As a test I implemented flux_buffer_has_line() along with the change to check_read_cb above and results look promising:
before:
$ time flux srun t/shell/lptest 79 10000 | wc -l
10000
real 1m26.405s
user 0m0.253s
sys 0m0.075s
after:
$ time flux srun t/shell/lptest 79 10000 | wc -l
10000
real 0m2.525s
user 0m0.279s
sys 0m0.055s
It probably promotes "fairness" in the reactor if the buffer read callback is only called once per reactor loop, instead of calling it until the callback stops consuming lines. At least, leave it up to the callback to iteratively call read_line until all lines are consumed.
Agreed. I began thinking about that when I noticed how @garlick's use of flux_subprocess_read_line() in the shell.
Sounds promising!
On Tue, Aug 6, 2019, 8:43 PM Al Chu notifications@github.com wrote:
It probably promotes "fairness" in the reactor if the buffer read callback
is only called once per reactor loop, instead of calling it until the
callback stops consuming lines. At least, leave it up to the callback to
iteratively call read_line until all lines are consumed.Agreed. I began thinking about that when I noticed how @garlick
https://github.com/garlick's use of flux_subprocess_read_line() in the
shell.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/flux-framework/flux-core/issues/2287?email_source=notifications&email_token=AABJPW6NL4O372CTENX3S6DQDJAHFA5CNFSM4IJ3KSX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3XDI3A#issuecomment-518927468,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AABJPW267AET5BGD33GMU4LQDJAHFANCNFSM4IJ3KSXQ
.
At least, leave it up to the callback to iteratively call read_line until all lines are consumed.
I think the fairness comment is spot on, however it should be noted that one cannot call read_line until all lines are consumed in the callback without risk of blocking (unless I am missing something).
Hm, maybe I'm not looking at the same code, but flux_buffer_read_line just calls cbuf_read_line which doesn't block afaik. (I'm not even sure how it would block?)
I'm probably just looking at the wrong spot.
BTW, I suppose it should be obvious but if any of these IO buffering calls ever block it would be a deadlock, so if there is a potential to block we should fix it I suppose.
@garlick i'm having trouble groking your last comment. "cannot call read_line until all lines are consumed"?
@chu11, I could finish up my addition of flux_buffer_has_line() and propose a PR if you'd like (already part way done anyway)
"cannot call read_line until all lines are consumed"?
Sorry, that was unclear. I meant cannot loop in the callback, calling read_line with "all lines comsumed" as the loop break.
Sounds like I was mistaken anyway! Sorry for the noise.
@chu11, I could finish up my addition of flux_buffer_has_line() and propose a PR if you'd like (already part way done anyway)
Awesome if you're far along. I think the only major gotcha will be tests which may be assuming the user will get a callback for each line in the buffer.
only major gotcha will be tests which may be assuming the user will get a callback for each line in the buffer.
Yeah, haven't decided what to do there yet...
Yeah, haven't decided what to do there yet...
@chu11, actually would it be ok to just remove the multiple_callback() test completely? I was going to change those callbacks to read from the buffer in a loop, but maybe if multiple callbacks aren't a feature of flux_buffer_t anymore we should just remove the tests?
For posterity, I'm getting the following numbers running an lptest job with 79 chars per line and a variable number of lines. The 1X column is a single lptest task. The 10X column is 10 lptest tasks, running on 10 separate shells, each producing the specified number of lines (so 10X the output).
| lines | 1X | 10X |
| ------- | ---- | --- |
| 1K | 0.5s | 1.5s |
| 10K | 2.5s | 12.9s |
| 100K | 20.3s | 118s |
| 1M | 204s | - |
This is with no flow control.
Much better!
Most helpful comment
For posterity, I'm getting the following numbers running an lptest job with 79 chars per line and a variable number of lines. The 1X column is a single lptest task. The 10X column is 10 lptest tasks, running on 10 separate shells, each producing the specified number of lines (so 10X the output).
| lines | 1X | 10X |
| ------- | ---- | --- |
| 1K | 0.5s | 1.5s |
| 10K | 2.5s | 12.9s |
| 100K | 20.3s | 118s |
| 1M | 204s | - |
This is with no flow control.
Much better!