Flux-core: buffer: flux_buffer_lines() can be very slow

Created on 7 Aug 2019 · 13Comments · Source: flux-framework/flux-core

In #2271, @garlick discovered the shell can be very slow on tests where a mountain of output is output from the job.

Using perf record he saw:

  98.69%  flux-shell       libflux-core.so.1.0.0  [.] cbuf_find_unread_line
   0.06%  flux-shell       [nvidia]               [k] _nv031488rm
   0.03%  flux-shell       [kernel.kallsyms]      [k] syscall_return_via_sysret
   0.03%  flux-shell       libc-2.27.so           [.] _int_malloc
   0.02%  flux-shell       [kernel.kallsyms]      [k] _raw_spin_lock_irqsave
   0.02%  flux-shell       libc-2.27.so           [.] cfree@GLIBC_2.2.5
   0.02%  flux-shell       [unknown]              [k] 0xfffffe000010e01b
   0.02%  flux-shell       libc-2.27.so           [.] __libc_calloc

cbuf_find_unread_line() is from the cbuf library.

I noticed that cbuf_lines_used() calls this function to determine the number of lines available in the buffer. cbuf_lines_used() is called from flux_buffer_lines(). flux_buffer_lines() is regularly called in both libsubprocess and ev_buffer_read to generate callbacks when lines are available.

I suspect the issue is the buffers are filling up with ALOT of lines, and cbuf_find_unread_line() is slowly iterating through all the contents in the buffer char by char.

What would probably speed this up a lot is if instead of retrieving the number of lines available, if some function were available to simply say if atleast one line is available. For generating callbacks, that is all that is needed.

Source

chu11

Most helpful comment

For posterity, I'm getting the following numbers running an lptest job with 79 chars per line and a variable number of lines. The 1X column is a single lptest task. The 10X column is 10 lptest tasks, running on 10 separate shells, each producing the specified number of lines (so 10X the output).

| lines | 1X | 10X |
| ------- | ---- | --- |
| 1K | 0.5s | 1.5s |
| 10K | 2.5s | 12.9s |
| 100K | 20.3s | 118s |
| 1M | 204s | - |

This is with no flow control.

Much better!

garlick on 8 Aug 2019

👍2

All 13 comments

if some function were available to simply say if atleast one line is available

I think a flux_buffer_has_line() could be implemented something like:

bool flux_buffer_has_line (flux_buffer_t *fb)
{
    char buf[1];
    return (cbuf_peek_line (fb->cbuf, buf, 0, 1) > 0);
}

There is also some line counting going on down in libflux/buffer.c:check_read_cb():

It probably promotes "fairness" in the reactor if the buffer read callback is only called once per reactor loop, instead of calling it until the callback stops consuming lines. At least, leave it up to the callback to iteratively call read_line until all lines are consumed.

This would have the added benefit that the available lines comparison would no longer be needed and only flux_buffer_has_line() would be required...

E.g. instead of:

    else if (fb->cb_type == FLUX_BUFFER_CB_TYPE_READ_LINE
             && flux_buffer_lines (fb) > 0) {
        int count = flux_buffer_lines (fb);

        /* we will iterate over all lines, but only if the user is
         * reading them.  If the user isn't reading lines, we're not
         * going to infinitely loop
         */
        while (fb->cb_type == FLUX_BUFFER_CB_TYPE_READ_LINE
               && count > 0) {
            int tmp;

            fb->cb (fb, fb->cb_arg);

            if ((tmp = flux_buffer_lines (fb)) < 0)
                break;

            if (tmp < count)
                count = tmp;
            else
                break;
        }
    }

You would have

    else if (fb->cb_type == FLUX_BUFFER_CB_TYPE_READ_LINE
             && flux_buffer_has_line (fb) > 0)
        fb->cb (fb, fb->cb_arg);

Of course, this is a change to non-reactor base uses of flux_buffer_t, so those instances will need to be updated to move the read/read_line loop into the callback, instead of expecting the callback to called until there is no more data to read. (not sure if this is used outside of unit tests.)

grondo on 7 Aug 2019

👍1

As a test I implemented flux_buffer_has_line() along with the change to check_read_cb above and results look promising:

before:

$ time flux srun t/shell/lptest 79 10000 | wc -l
10000

real    1m26.405s
user    0m0.253s
sys 0m0.075s

after:

$ time flux srun t/shell/lptest 79 10000 | wc -l
10000

real    0m2.525s
user    0m0.279s
sys 0m0.055s

grondo on 7 Aug 2019

👍2

It probably promotes "fairness" in the reactor if the buffer read callback is only called once per reactor loop, instead of calling it until the callback stops consuming lines. At least, leave it up to the callback to iteratively call read_line until all lines are consumed.

Agreed. I began thinking about that when I noticed how @garlick's use of flux_subprocess_read_line() in the shell.

chu11 on 7 Aug 2019

Sounds promising!

On Tue, Aug 6, 2019, 8:43 PM Al Chu notifications@github.com wrote:

It probably promotes "fairness" in the reactor if the buffer read callback
is only called once per reactor loop, instead of calling it until the
callback stops consuming lines. At least, leave it up to the callback to
iteratively call read_line until all lines are consumed.

Agreed. I began thinking about that when I noticed how @garlick
https://github.com/garlick's use of flux_subprocess_read_line() in the
shell.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/flux-framework/flux-core/issues/2287?email_source=notifications&email_token=AABJPW6NL4O372CTENX3S6DQDJAHFA5CNFSM4IJ3KSX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3XDI3A#issuecomment-518927468,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AABJPW267AET5BGD33GMU4LQDJAHFANCNFSM4IJ3KSXQ
.

garlick on 7 Aug 2019

At least, leave it up to the callback to iteratively call read_line until all lines are consumed.

I think the fairness comment is spot on, however it should be noted that one cannot call read_line until all lines are consumed in the callback without risk of blocking (unless I am missing something).

garlick on 7 Aug 2019

Hm, maybe I'm not looking at the same code, but flux_buffer_read_line just calls cbuf_read_line which doesn't block afaik. (I'm not even sure how it would block?)

I'm probably just looking at the wrong spot.

BTW, I suppose it should be obvious but if any of these IO buffering calls ever block it would be a deadlock, so if there is a potential to block we should fix it I suppose.

grondo on 7 Aug 2019

@garlick i'm having trouble groking your last comment. "cannot call read_line until all lines are consumed"?

chu11 on 7 Aug 2019

@chu11, I could finish up my addition of flux_buffer_has_line() and propose a PR if you'd like (already part way done anyway)

grondo on 7 Aug 2019

"cannot call read_line until all lines are consumed"?

Sorry, that was unclear. I meant cannot loop in the callback, calling read_line with "all lines comsumed" as the loop break.

Sounds like I was mistaken anyway! Sorry for the noise.

garlick on 7 Aug 2019

@chu11, I could finish up my addition of flux_buffer_has_line() and propose a PR if you'd like (already part way done anyway)

Awesome if you're far along. I think the only major gotcha will be tests which may be assuming the user will get a callback for each line in the buffer.

chu11 on 7 Aug 2019

only major gotcha will be tests which may be assuming the user will get a callback for each line in the buffer.

Yeah, haven't decided what to do there yet...

grondo on 7 Aug 2019

Yeah, haven't decided what to do there yet...

@chu11, actually would it be ok to just remove the multiple_callback() test completely? I was going to change those callbacks to read from the buffer in a loop, but maybe if multiple callbacks aren't a feature of flux_buffer_t anymore we should just remove the tests?

grondo on 7 Aug 2019

| lines | 1X | 10X |
| ------- | ---- | --- |
| 1K | 0.5s | 1.5s |
| 10K | 2.5s | 12.9s |
| 100K | 20.3s | 118s |
| 1M | 204s | - |

This is with no flow control.

Much better!

garlick on 8 Aug 2019

👍2

Was this page helpful?

0 / 5 - 0 ratings

Related issues

testsuite: failures when run with `debug=t` OR `verbose=t`

SteVwonder · 5Comments

increase minimum jansson version

chu11 · 3Comments

The mustache template doesn't work with flux mini run

dongahn · 7Comments

discuss placement of bank/accounting module

cmoussa1 · 8Comments

best practice for launching flux with preset FLUX_URI

grondo · 7Comments