Flux-core: flux-jobs: get subset of recently completed jobs

Created on 28 Jan 2020 · 6 comments · Source: flux-framework/flux-core

As of now, I can utilize an RPC call (taken from src/cmd/flux-jobs.py) to get job information on all _inactive_ jobs in a Flux instance using the following:

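    import sys
    import flux
    import flux.job

    h = flux.Flux()  # handle to the enclosing Flux instance
    # args.count, attrs, userid, and flags are set up earlier in flux-jobs.py
    rpc_handle = flux.job.job_list(h, args.count, attrs, userid, flags)
    try:
        jobs = flux.job.job_list_get(rpc_handle)
    except EnvironmentError as e:
        print("{}: {}".format("rpc", e.strerror), file=sys.stderr)
        sys.exit(1)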

I can then parse the list of jobs, extract the info I need, and then write it to a database.

However, as time goes on, and I call this again to get more recently transitioned _inactive_ jobs, I get the old ones as well (more specifically, the jobs I've already written to SQLite). It would be nice to have a way to get a subset of jobs that have recently completed and transitioned to _inactive_, returning a new set of jobs to extract info from and write to a database.

cmoussa1

All 6 comments

Once #2592 is completed, is getting job info for a specific job id not going to work?

Is the query you are looking for something like, "jobs that have transitioned to inactive in the last X minutes"?

chu11 on 28 Jan 2020

Well, I don't think I've thought that possibility all the way through yet, so I could be wrong, but my thinking is that getting "a batch of jobs that have transitioned to _inactive_ in the last _x_ minutes" might perform better: it would only need to open a SQLite connection once per batch of jobs, instead of opening a new connection every time a job transitions to _inactive_ in order to push it to the database.
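The batching idea can be sketched with Python's standard `sqlite3` module: one connection and one transaction per batch, rather than one per job. This is a minimal sketch, not flux-core code; the table layout and the job record fields (`id`, `t_inactive`, `userid`) are assumptions for illustration. Using `INSERT OR IGNORE` on the job id primary key also means a job that shows up in a later batch again is skipped rather than duplicated.

```python
import sqlite3
from typing import Dict, List


def write_batch(db_path: str, jobs: List[Dict]) -> None:
    """Write a batch of job records using a single connection and transaction.

    Each job is assumed (hypothetically) to be a dict with "id",
    "t_inactive", and "userid" keys.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS jobs ("
                "id INTEGER PRIMARY KEY, t_inactive REAL, userid INTEGER)"
            )
            # One executemany() call for the whole batch; rows whose id is
            # already present are ignored rather than duplicated.
            conn.executemany(
                "INSERT OR IGNORE INTO jobs (id, t_inactive, userid) "
                "VALUES (?, ?, ?)",
                [(j["id"], j["t_inactive"], j["userid"]) for j in jobs],
            )
    finally:
        conn.close()  # the context manager above does not close the connection
```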

Maybe if the jobs that have transitioned to _inactive_ were then ordered by something like completion time or cleanup time, then I could pick up where I left off, starting at some point in a list of _inactive_ jobs after a certain time?

cmoussa1 on 28 Jan 2020

> Maybe if the jobs that have transitioned to inactive were then ordered by something like completion time or cleanup time, then I could pick up where I left off, starting at some point in a list of inactive jobs after a certain time?

To summarize a coffee discussion, there are two queries that could be useful here:

1. Return a list of inactive jobs that completed since time T. A tool that is processing inactive jobs to push information to a database could then note the completion time of the last job in a batch and restart from there the next time it runs.
2. Return a list of inactive jobs that completed after a given jobid. This would only work if inactive jobs are only appended to the inactive job list in job-info.
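Option 1 can be sketched as a simple filter. This assumes, hypothetically, that each inactive job record carries a completion timestamp under the key `t_inactive` and that the list is ordered most-recently-completed first; under that ordering the scan can stop at the first job that completed at or before T. It only illustrates the query's semantics and is not job-info's implementation.

```python
from typing import Dict, List


def inactive_since(inactive: List[Dict], since: float) -> List[Dict]:
    """Return inactive jobs that completed strictly after time `since`.

    `inactive` is assumed to be ordered most-recently-completed first.
    """
    result = []
    for job in inactive:
        if job["t_inactive"] <= since:
            break  # every remaining job completed at or before `since`
        result.append(job)
    return result


# Example: three jobs, newest first; asking for everything after t=15.0
# returns jobs 3 and 2.
jobs = [
    {"id": 3, "t_inactive": 30.0},
    {"id": 2, "t_inactive": 20.0},
    {"id": 1, "t_inactive": 10.0},
]
print([j["id"] for j in inactive_since(jobs, 15.0)])  # [3, 2]
```

With a strict "after T" comparison, a job whose completion time equals the saved checkpoint is not returned again, which is only safe if the previous batch already included every job with that exact timestamp.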

Option 1 above is attractive because that particular query could be useful outside of the use case described here.

Option 2 could be useful in the solution for #2601.

grondo on 28 Jan 2020

Right now inactive jobs are prepended to the internal inactive list b/c we wanted the most recently completed job to be listed first.

So the first query option should be easy to do. The latter is doable but more annoying (off the top of my head: we'd have to check that the jobid is legal, return the list in the opposite order, and maybe handle some other corner cases).
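The extra work for option 2 can be sketched as follows, again assuming a plain Python list of job records ordered most-recently-completed first. The function name is hypothetical; it checks that the jobid is present and returns the newer jobs in the opposite (oldest-first) order.

```python
from typing import Dict, List


def inactive_after_jobid(inactive: List[Dict], jobid: int) -> List[Dict]:
    """Return jobs that became inactive after `jobid`, oldest first.

    `inactive` is assumed to be ordered most-recently-completed first,
    i.e. new jobs are prepended as they finish.
    """
    for index, job in enumerate(inactive):
        if job["id"] == jobid:
            # Entries before `index` finished after `jobid`; reverse them
            # so the caller sees them in completion order.
            return list(reversed(inactive[:index]))
    # Corresponds to "check if jobid is legal".
    raise ValueError("jobid {} is not in the inactive list".format(jobid))


jobs = [{"id": 3}, {"id": 2}, {"id": 1}]  # job 3 completed most recently
print([j["id"] for j in inactive_after_jobid(jobs, 1)])  # [2, 3]
```

A linear scan is enough for a sketch; a service holding many inactive jobs would likely want a lookup by id instead.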

chu11 on 28 Jan 2020

Oh yeah, sorry. I guess technically I meant either strictly appended or strictly prepended. The querying tool probably doesn't care much about order, but most recently completed first makes sense (obviously; sorry, wasn't thinking).

grondo on 28 Jan 2020

From our coffee time discussion:

It would be useful to have a query that returns a chunk of jobs that have completed since time _T_, and then to store the info of the last job _J_ that was read. Since jobs are prepended by completion time, we could then pick up where we left off and query another group of jobs that have completed since _J_'s completion time, and so on. This would help avoid the bottleneck I was running into by getting all jobs that have transitioned to _inactive_, which returns duplicate data after every call.
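The checkpointing loop described above can be sketched as follows. `fetch` stands in for the proposed "completed since time T" query and is hypothetical; this sketch assumes it returns at most `limit` jobs, oldest first, each with a `t_inactive` completion time. The oldest-first assumption matters: with newest-first results and a limit, the last job _J_ in a chunk would be the oldest one, so resuming from its completion time would return most of the same jobs again.

```python
from typing import Callable, Dict, List

# Hypothetical query: at most `limit` inactive jobs with t_inactive > since,
# ordered oldest first.
FetchFn = Callable[[float, int], List[Dict]]
SinkFn = Callable[[List[Dict]], None]


def drain(fetch: FetchFn, since: float, limit: int, sink: SinkFn) -> float:
    """Process batches until none remain; return the updated checkpoint."""
    while True:
        batch = fetch(since, limit)
        if not batch:
            return since  # nothing new; save this for the next run
        sink(batch)  # e.g. write the whole batch to the database at once
        since = batch[-1]["t_inactive"]  # completion time of the last job J


# Example with an in-memory stand-in for the query.
completed = [{"id": i, "t_inactive": float(i)} for i in range(1, 8)]


def fake_fetch(since: float, limit: int) -> List[Dict]:
    newer = [j for j in completed if j["t_inactive"] > since]
    return sorted(newer, key=lambda j: j["t_inactive"])[:limit]


seen: List[int] = []
checkpoint = drain(fake_fetch, 0.0, 3, lambda b: seen.extend(j["id"] for j in b))
print(seen, checkpoint)  # [1, 2, 3, 4, 5, 6, 7] 7.0
```

Persisting `checkpoint` between runs (for example in the same SQLite database) lets the next run start where this one stopped.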

cmoussa1 on 30 Jan 2020

Related issues

increase minimum jansson version

chu11 · 3 comments

[spectrum mpi] need to suppress OMPI_, JSM_, and PMIX_ environment

SteVwonder · 7 comments

testsuite: failures when run with `debug=t` OR `verbose=t`

SteVwonder · 5 comments

libflux: change flux_future_error_string() to return flux_strerror() if textual error was not set

chu11 · 6 comments

job-manager: allow specific job id's to be listed

garlick · 8 comments