As of now, I can utilize an RPC call (taken from src/cmd/flux-jobs.py) to get job information on all _inactive_ jobs in a Flux instance using the following:
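
    # h is an open Flux handle; args, attrs, userid and flags are set up earlier in flux-jobs.py
    rpc_handle = flux.job.job_list(h, args.count, attrs, userid, flags)
    try:
        jobs = flux.job.job_list_get(rpc_handle)
    except EnvironmentError as e:
        # Report the RPC failure on stderr and exit
        print("{}: {}".format("rpc", e.strerror), file=sys.stderr)
        sys.exit(1)
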
I can then parse the list of jobs, extract the info I need, and then write it to a database.
However, as time goes on, and I call this again to get more recently transitioned _inactive_ jobs, I get the old ones as well (more specifically, the jobs I've already written to SQLite). It would be nice to have a way to get a subset of jobs that have recently completed and transitioned to _inactive_, returning a new set of jobs to extract info from and write to a database.
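Until such a query exists, one workaround on the database side is to let SQLite discard the jobs it has already seen. The following is only a minimal sketch: the `jobs` table, its columns, and the `id`, `userid` and `t_inactive` keys on each job are assumptions made for illustration, not the actual layout returned by the RPC or used by any Flux tool.

```python
import sqlite3


def store_jobs(conn, jobs):
    """Insert jobs, skipping ids that are already stored.

    `jobs` is assumed to be a list of dicts with "id", "userid" and
    "t_inactive" keys (hypothetical names). Returns the number of rows
    that were actually inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, userid INTEGER, t_inactive REAL)"
    )
    before = conn.total_changes
    with conn:  # one transaction for the whole list
        conn.executemany(
            "INSERT OR IGNORE INTO jobs (id, userid, t_inactive) "
            "VALUES (?, ?, ?)",
            [(j["id"], j["userid"], j["t_inactive"]) for j in jobs],
        )
    # Rows ignored because of the primary key do not count as changes.
    return conn.total_changes - before
```

This avoids duplicate rows, but the full list of _inactive_ jobs still has to be fetched and scanned on every call, which is the cost a dedicated query would remove.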
Once #2592 is completed, is getting job info for a specific job id not going to work?
Is the query you are looking for something like, "jobs that have transitioned to inactive in the last X minutes"?
Well, I haven't thought that possibility all the way through yet, so I could be wrong. My thinking is that getting "a batch of jobs that have transitioned to _inactive_ in the last _x_ minutes" might perform better: I would only need to establish one SQLite connection to write the whole batch, instead of opening a new connection every time a job transitions to _inactive_ and has to be pushed to the database.
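To make the batching argument concrete, here is a rough sketch of a buffer that collects jobs and opens one connection per batch rather than one per job. The `JobBatcher` class, the table layout, and the `id` and `t_inactive` keys are all hypothetical; nothing here is part of Flux.

```python
import sqlite3


class JobBatcher:
    """Buffer finished jobs and write them with one connection per batch."""

    def __init__(self, db_path, batch_size=100):
        self.db_path = db_path
        self.batch_size = batch_size
        self.pending = []

    def add(self, job):
        # `job` is assumed to be a dict with "id" and "t_inactive" keys.
        self.pending.append((job["id"], job["t_inactive"]))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all buffered jobs in one transaction; return how many."""
        if not self.pending:
            return 0
        conn = sqlite3.connect(self.db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS jobs "
                    "(id INTEGER PRIMARY KEY, t_inactive REAL)"
                )
                conn.executemany(
                    "INSERT OR IGNORE INTO jobs VALUES (?, ?)", self.pending
                )
        finally:
            conn.close()
        written = len(self.pending)
        self.pending = []
        return written
```

Whether this is measurably faster than a connection per job would need to be checked against the real workload; the sketch only shows the shape of the idea.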
Maybe if the jobs that have transitioned to _inactive_ were then ordered by something like completion time or cleanup time, then I could pick up where I left off, starting at some point in a list of _inactive_ jobs after a certain time?
To summarize a coffee discussion, there are two queries that could be useful here:
Option 1 above is attractive because that particular query could be useful outside of the use case described here.
Option 2 could be useful in the solution for #2601.
Right now inactive jobs are prepended to the internal inactive list because we wanted the most recently completed job to be listed first.
So the first query option should be easy to do. The latter is doable but more annoying (off the top of my head: we'd have to check that the jobid is valid, return the list in the opposite order, and maybe handle some other corner cases).
Oh yeah, sorry. I guess technically I meant either strictly appended or prepended. The querying tool probably doesn't care that much about order, but most recently completed first makes sense (obviously, sorry wasn't thinking)
From our coffee time discussion:
It would be useful to have a query to get a chunk of jobs that have completed since time _T_, and then storing the info of the last job _J_ that was read. Since jobs are prepended by completion time, we could then pick up where we left off and query another group of jobs that have completed since that _J's_ completion time, and so on. This would help negate the bottleneck I was running into by getting all jobs that have transitioned to _inactive_, which produces duplicate data after every call.
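The pick-up-where-we-left-off idea can be sketched on the client side as follows. This is only an illustration of the checkpoint logic, not the proposed server-side query: it assumes each job is a dict with a `t_inactive` timestamp (a hypothetical key name) and that the list is ordered most recently inactive first, as described above.

```python
def jobs_since(jobs, last_t_inactive):
    """Return the jobs that became inactive after `last_t_inactive`.

    `jobs` must be ordered most recently inactive first, so the scan can
    stop at the first job that is not newer than the checkpoint.
    """
    fresh = []
    for job in jobs:
        if job["t_inactive"] <= last_t_inactive:
            break  # everything after this point is older still
        fresh.append(job)
    return fresh


def next_checkpoint(fresh, last_t_inactive):
    """Newest timestamp seen so far; unchanged if nothing new arrived."""
    return fresh[0]["t_inactive"] if fresh else last_t_inactive
```

A caller would store the checkpoint next to the database and pass it in on the next run. One caveat: two jobs with exactly the same timestamp could straddle a checkpoint, so a real implementation would probably also need to track the last job id.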