Flux-core: What is the correct error handling for `flux_job_submit_get_id`?

Created on 14 May 2019 · 18Comments · Source: flux-framework/flux-core

Currently, if you pass a NULL future or NULL pointer for jobid to flux_job_submit_get_id, it sets errno = EINVAL and returns -1. If you hit that error condition while using the following common error handling (as seen in the flux-job cmd):

    if (flux_job_submit_get_id (f, &id) < 0) {
        log_msg_exit ("%s", flux_future_error_string (f));
    }

You are likely to get: future not fullfilled, which, unfortunately, make it seem like the error was that the future wasn't fulfilled yet when in reality the problem was with the arguments passed in. This also causes problems with the python bindings, which uses the same error handling as above (if future methods return -1, then automatically call flux_future_error_string).

Given that, what is the proper way to check errors for this function (and other functions like it that are essentially flux_rpc_get with some extra "secret sauce" logic)?

Are there potentially changes to the function that would make error handling easier? Maybe have multiple negative return code? -1 implies check flux_future_error_string while -2 implies just check errno? Maybe fulfill the future with an error (assuming a valid, non-null future was passed in)?

question

Source

SteVwonder

Most helpful comment

After talking about this at coffee time, I think the goal is to:

A) flux_future_error_string() should not return a non-NULL when an error does not exist in the future.
B) add a flux_future_has_error() (function name could be changed)
C) add a macro of some type that will wrap

flux_future_has_error (f) ?  flux_future_error_string (f) :  flux_strerror (errno)

macro name TBD.

D) as an aside, fix up flux_job_submit_get_id() to not return an error when the in & out parameter is NULL.

chu11 on 16 May 2019

👍3

All 18 comments

Yeah, this seems like a tricky one.

I would hate to have multiple return codes from these functions, though that may be clearest. Another approach might be to return flux_strerror (errno) instead of the literal "future not fulfilled" from flux_future_error_string(3) when errno is set and the future doesn't have a valid fatal or result errnum? (Haven't thought that through so it might be just as confusing)

grondo on 14 May 2019

Another thought: we could add a function to test for an error condition on a future:

bool flux_future_has_error (flux_future_t *f);
/* or: */
int flux_future_get_errnum (flux_future_t *f);

Then the common error handling code could be updated to:

 if (flux_job_submit_get_id (f, &id) < 0) {
        log_msg_exit ("%s",
            flux_future_has_error (f) ?
            flux_future_error_string (f) :
            flux_strerror (errno));
 }

we could provide a macro or inline function to handle this common case.
(or again, we could just include this functionality in flux_future_error_string(3)

grondo on 14 May 2019

👍1

If you pass a NULL future, you should get "future NULL" from flux_future_error_string().

const char *flux_future_error_string (flux_future_t *f)
{
    if (f) {
        /* snip */
    }
    return "future NULL";
}

So I think your situation is specific to just passing in a NULL pointer for the jobid.

Perhaps flux_job_submit_get_id() should not check if the jobid pointer is NULL and return EINVAL? In many other future "get" functions, we do something like:

   if (jobidp)
      *jobidp = jobid;

or something similar. If the user passed in NULL pointer, the function succeeds and they just get nothing back.

chu11 on 14 May 2019

👍1

Another thought: we could add a function to test for an error condition on a future:

What flux_future_error_string() should return was discussed in #2075 and changed in #2091. Should we revisit? As discussed in #2075, the hope was to replace:

flux_future_t *f = flux_rpc (...)
if (flux_rpc_get (f, NULL) < 0) {
    char *errstr = flux_future_error_string (f);
    if (errstr)
        log_msg ("thing: %s", errstr);
    else
        log_err ("thing");
}

with

flux_future_t *f = flux_rpc (...)
if (flux_rpc_get (f, NULL) < 0)
    log_msg ("thing: %s", flux_future_error_string (f));

Is this a corner case where the logic doesn't hold? Particularly with wrapper functions like:

if (flux_kvs_lookup_get_symlink (f, &target) < 0)
    log_msg ("thing: %s", flux_future_error_string (f));

Internally there is KVS treeobj parsing. So how do we know the error exists in the future and not in the treeobj parsing?

Another approach might be to return flux_strerror (errno) instead of the literal "future not fulfilled" from flux_future_error_string(3) when errno is set and the future doesn't have a valid fatal or result errnum?

I think this could work. There's a part of me that doesn't like this, b/c who knows if errno was ever set correctly. But if we go under the assumption that flux_future_error_string() is only called when an error has definitely happened, it's probably a safe bet.

chu11 on 14 May 2019

Internally there is KVS treeobj parsing. So how do we know the error exists in the future and not in the treeobj parsing?

Yes, I think this is the larger issue @SteVwonder was alluding to in his original question. On a negative return from flux_future_get(3) and derivative functions, the caller doesn't know if the error was from the future or the function. Therefore, we need to offer a way to explicitly check for an error condition on f, as well as a convenience function that sensibly handles the current case.

grondo on 14 May 2019

👍1

Yes, I think this is the larger issue @SteVwonder was alluding to in his original question. On a negative return from flux_future_get(3) and derivative functions, the caller doesn't know if the error was from the future or the function. Therefore, we need to offer a way to explicitly check for an error condition on f, as well as a convenience function that sensibly handles the current case.

I just remembered, all the way back in #1589 I thought of this exact situation.

An optional error string can be attached to any error set via flux_future_fatal_error() or flux_future_fulfill_error(). The reason is there is no way to determine where a flux future error came from. With the example given above, when callling flux_kvs_lookup_get(), users will have no idea if a ENOMEM came from the client side, server side, or any other wrapper code in libkvs. For any code that uses futures, this error string can provide additional information. This would require a new function, for the time being we'll call it flux_future_get_error_string().

For wrapper functions to flux_future_get() (e.g like flux_kvs_lookup_get()), perhaps its the responsibility of the function writer to call flux_future_fatal_error() or similar for "internal" errors.

chu11 on 14 May 2019

For wrapper functions to flux_future_get() (e.g like flux_kvs_lookup_get()), perhaps its the responsibility of the function writer to call flux_future_fatal_error() or similar for "internal" errors.

Good point, but that approach would make the affected future unusable. In some cases this is probably an unwanted side effect (e.g. in this case with invalid parameters to the "outer" get() function)

grondo on 14 May 2019

Now that I think about it, when I first added flux_future_error_string() the intent was for it to be a "helper" of sorts for sending error information back. The errno/errnum was the "true" error.

Perhaps that's the key issue. What was once a "helper" is now the "use this to get the true error string", and that's inherently the problem.

Looking at the snippet above you posted:

 if (flux_job_submit_get_id (f, &id) < 0) {
        log_msg_exit ("%s",
            flux_future_has_error (f) ?
            flux_future_error_string (f) :
            flux_strerror (errno));
 }

it's not so different than the snippet we were trying to eleminate in issue #2075

flux_future_t *f = flux_rpc (...)
if (flux_rpc_get (f, NULL) < 0) {
    char *errstr = flux_future_error_string (f);
    if (errstr)
        log_msg ("thing: %s", errstr);
    else
        log_err ("thing");
}

Did we just make a mistake with this change in #2075?

chu11 on 14 May 2019

Did we just make a mistake with this change in #2075?

Yeah, I think you're right because we ended up obscuring information with that change.

However, my personal preference would be to add a function that explicitly checks a future for an error condition, rather than assuming it from non-NULL flux_future_error_string().

grondo on 14 May 2019

Would it be OK to have an "outer get" function just call flux_future_fatal_error()? The errors mentioned are programming errors like bad args, or maybe an unlikely response decode error. In both cases, it doesn't seem like recovery of the future is feasible.

Then flux_future_error_string() would work even if the future wasn't fulfilled with an error per se. Except in the NULL future case, but maybe then flux_strerror(errno) could be returned like @grondo suggested earlier?

garlick on 14 May 2019

Would it be OK to have an "outer get" function just call
flux_future_fatal_error()? The errors mentioned are programming errors
like bad args, or maybe an unlikely response decode error. In both cases,
it doesn't seem like recovery of the future is feasible.

This is what @chu11 proposed above, but I'm not sure about the mechanics of
"fulfilling" a future with a fatal error due to a bad parameter of the
get function. Not only does that seem to violate the idea of a future
result, but also obscures errors even further, IMO.

Then flux_future_error_string() would work even if the future wasn't
fulfilled with an error per se. Except in the NULL future case, but maybe
then flux_strerror(errno) could be returned like @grondo
https://github.com/grondo suggested earlier?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/flux-framework/flux-core/issues/2160?email_source=notifications&email_token=AAFVEUXOTV5UX77ZECZTXCLPVI42RA5CNFSM4HMT3XLKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVKG7YY#issuecomment-492072931,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAFVEUVBFT7ZP32MXMSMYJDPVI42RANCNFSM4HMT3XLA
.

grondo on 14 May 2019

However, my personal preference would be to add a function that explicitly checks a future for an error condition, rather than assuming it from non-NULL flux_future_error_string().

So it would return the optional error string set in a future OR flux_strerror (future.errnum)? But NULL otherwise?

Perhaps the mistake was we shouldn't have tried to put everything into one function. There should be one to get the error string a user may have set in the flux future (the original flux_future_error_string()) and one to be the super convenience wrapper for returning an error string (something like the current flux_future_error_string() + flux_strerror (errno)).

chu11 on 14 May 2019

I kind of like this then:

Another approach might be to return flux_strerror (errno) instead of the literal "future not fulfilled" from flux_future_error_string(3) when errno is set and the future doesn't have a valid fatal or result errnum?

If we think in some cases it might be useful to have a flux_future_has_error(), that seems fine, although I would think in the majority of cases, we'll want to just report the proximate error and propagate it up, or fail out, regardless of whether it's bad args on the get, an internal future error, an error fulfillment, or whatnot.

garlick on 14 May 2019

I would think in the majority of cases, we'll want to just report the proximate error and propagate it up, or fail out, regardless of whether it's bad args on the get, an internal future error, an error fulfillment, or whatnot.

Yeah, it seems like we're all in agreement on this point. However, I guess how that behavior is implemented is in question. Do we want a future to be fulfilled only be a future result, error, or inability to process the result (fatal error), or should a future also be a container for generic errors from accessor functions.

I'd slightly prefer that the generic errors be kept separate as they are now, and users be given a way to test for an error condition on the flux_future_t (flux_future_has_error() or similar), and this function be used in a macro or inline function to implement the behavior above (so it is self-documenting).

If we go the other way and require that get() functions set a fatal error on flux_future_t for internal errors, then we perhaps should document that futures contain a future result and/or error result of intermediate functions (and perhaps still give a user a way to determine if the EINVAL came from the remote service, or a bad argument to a get() function?)

grondo on 14 May 2019

I'd slightly prefer that the generic errors be kept separate as they are now, and users be given a way to test for an error condition on the flux_future_t (flux_future_has_error() or similar), and this function be used in a macro or inline function to implement the behavior above (so it is self-documenting).

Yeah, in my comment above, I should have said I agree with your point about keeping the getter errors out of the future. It violates principle of lest surprise, if nothing else.

Do (code) macros or inlines present issues for bindings?

garlick on 14 May 2019

Do (code) macros or inlines present issues for bindings?

Oh, good question.

grondo on 14 May 2019

Do (code) macros or inlines present issues for bindings?

Yeah. I believe they do.

If the macro/inline is simple enough, we can always replicate the logic on the python-side. Although that has it's own downsides. We already kind of do this with check_future_error in util.py. It is a decorator that attempts to call a future function, and if it fails with an exception, call flux_future_error_string. If that fails, then re-raise the original EnvironmentError exception (which will have an errno and associated strerror in it). Here it is in use: [1] [2] [3].

SteVwonder on 14 May 2019

After talking about this at coffee time, I think the goal is to:

flux_future_has_error (f) ?  flux_future_error_string (f) :  flux_strerror (errno)

macro name TBD.

D) as an aside, fix up flux_job_submit_get_id() to not return an error when the in & out parameter is NULL.

chu11 on 16 May 2019

👍3

Was this page helpful?

0 / 5 - 0 ratings

Related issues

details of Flux test system

garlick · 8Comments

[spectrum mpi] need to supppress OMPI_, JSM_, and PMIX_ environment

SteVwonder · 7Comments

wreck parity: need support for job names

SteVwonder · 7Comments

libflux: change flux_future_error_string() to return flux_strerror() if textual error was not set

chu11 · 6Comments

Auto-generating keys for first-time users?

SteVwonder · 4Comments