k6 🚀 - Context cancelled message shown when iteration context done

@na-- I think it's not the interrupted but it occurs frequently in case of request context timeout or cancelled, I'm investigating the root cause.

cuonglm on 18 Dec 2019

I spent some time on Friday digging through and thinking about the issues mentioned here and that https://github.com/loadimpact/k6/pull/1284 tries to solve, and I think we're going to need some substantial refactoring...

I think this is the wrong way to handle VU interruption and contexts with the new executors:
https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L371-L381
https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L395-L413
https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L328-L337

The complicated handling of contexts is unnecessary... Each of the new executors handles contexts and VUs very deliberately and precisely, so we can just separate the context handling (i.e. spawning a goroutine to cancel/interrupt the goja runtime) and the actual running of the users' code. I think we should have separate concepts of an initialized VU (no context, can't do work, sitting in the VU buffer pool) that can transform itself (by giving it a context + some other options ** ) to an active VU (which has a RunOnce() method)

I propose that we need to change the lib.VU and lib.Runner interfaces in runner.go, and their implementations of course, in the following way:

Remove the VU.Reconfigure() method altogether, the Runner.NewVU() method should accept an ID as well - with the new executors, the VU IDs don't change in the middle of the script execution (I actually have to add that to the minor changes in #1007, since I see that it's missing from there).
Add a new interface called something like InitializedVU - this is what Runner.NewVU() would return, and this is what we'd store in the ExecutionState.vus buffer. This new InitializedVU type wouldn't have a Run() method at all, instead it would have something like an Activate(<args>) ActiveVU method that would itself return a Run()-able "active" VU.
This activate method would accept <args> that include a context, other activation options ( ** basically the ones we want to include in the new executors like env vars, non-default exec function, tags, etc.), as well as something like a DeactivationCallback that gets called when the VU should be returned to the buffer pool.
The Activate() method would be responsible for spawning a goroutine that tracks when the supplied context expires, so it will interrupt any script execution. It will also make sure that any actual goja execution is finished before it calls the deactivate callback and returns the VU to the buffer. Basically, it will replace https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L395-L427 (both the mutex and the context shenanigans) and will be just as efficient (if not more so), since we'd only spawn that goroutine once per executor, like the current code does, but in a much saner way.

Basically, I propose that the interfaces should look somewhat like this:

// A Runner is a factory for VUs. It should precompute as much as possible upon
// creation (parse ASTs, load files into memory, etc.), so that spawning VUs
// becomes as fast as possible. The Runner doesn't actually *do* anything in
// itself, the ExecutionScheduler is responsible for wrapping and scheduling
// these VUs for execution.
//
// TODO: Rename this to something more obvious? This name made sense a very long
// time ago.
type Runner interface {
    // Creates an Archive of the runner. There should be a corresponding NewFromArchive() function
    // that will restore the runner from the archive.
    MakeArchive() *Archive

    // Spawns a new VU. It's fine to make this function rather heavy, if it means a performance
    // improvement at runtime. Remember, this is called once per VU and normally only at the start
    // of a test - RunOnce() may be called hundreds of thousands of times, and must be fast.
    NewVU(ctx context.Context, out chan<- stats.SampleContainer, id int64) (InitializedVU, error)

    // Runs pre-test setup, if applicable.
    Setup(ctx context.Context, out chan<- stats.SampleContainer) error

    // Returns json representation of the setup data if setup() is specified and run, nil otherwise
    GetSetupData() []byte

    // Saves the externally supplied setup data as json in the runner
    SetSetupData([]byte)

    // Runs post-test teardown, if applicable.
    Teardown(ctx context.Context, out chan<- stats.SampleContainer) error

    // Returns the default (root) Group.
    GetDefaultGroup() *Group

    // Get and set options. The initial value will be whatever the script specifies (for JS,
    // `export let options = {}`); cmd/run.go will mix this in with CLI-, config- and env-provided
    // values and write it back to the runner.
    GetOptions() Options
    SetOptions(opts Options) error
}

// VUActivationParams are supplied by each executor when it retrieves a VU from
// the buffer pool and activates it for use.
type VUActivationParams struct {
    Ctx                context.Context
    Env                map[string]string
    Tags               map[string]string
    Exec               null.String
    DeactivateCallback func(InitializedVU)
}

// InitializedVUs are virtual users ready for work. They need to be activated
// (i.e. given a context) before they can actually be used. Activation also
// requires a callback function, which will be called when the supplied context
// is done. That way, VUs can be returned back to a pool and reused.
type InitializedVU interface {
    // Fully activate the VU so it will be able to run code
    Activate(*VUActivationParams) ActiveVU
}

type ActiveVU interface {
    // Runs the VU once. The only way to interrupt the execution is to cancel
    // the context given to InitializedVU.Activate()
    RunOnce() error
}

na-- on 23 Dec 2019

👍1

I'm wondering if it isn't a good idea to also tackle https://github.com/loadimpact/k6/issues/1250 during the same refactoring (that is, if we end up doing something like what I suggested above), or if it's just my tendency to bundle semi-related things speaking? :sweat_smile:

na-- on 23 Dec 2019

👀1

@na-- Your NewVu requires a context:

NewVU(ctx context.Context, out chan<- stats.SampleContainer, id int64) (InitializedVU, error)

Where will we store that context? (new field in VU struct?)

cuonglm on 24 Dec 2019

I tried implementing this concept today, and there're many confusions:

DeactivateCallback func(InitializedVU) get an InitializedVU, but InitializedVU only have the Activate method, How VUs can be returned back to a pool and reused?
Many tests broken due to context error, most visible is test isolation setup data, the error:

 ERRO[0000] Error: default: wrong data for iter 0: {"v":1}
    at file:///script.js:23:9(19)  executor=shared_iters

By changing NewVU, every functions which are using Runner.newVU becomes broken. If I changed it to match NewVU signature, then it makes no sense, it does not need context or VU id.

For clarification, what I have done:

Adding InitializedVU and ActiveVU interfaces.
Changing lib.VU -> lib.InitializedVU
Make js.VU struct implements 2 those interfaces.
Change every vu.RunOnce -> vu.Activate(&lib. VUActivationParams{Ctx: ctx}).RunOnce()
Run all tests inside js/ directory.

The code does compile, but seems that all the tests were broken. (The output error just ate all of my screen).

cuonglm on 24 Dec 2019

@na-- Your NewVu requires a context:
NewVU(ctx context.Context, out chan<- stats.SampleContainer, id int64) (InitializedVU, error)
Where will we store that context? (new field in VU struct?)

No, we don't want to store this context. As you can see from this code, the idea was that VU initialization would also accept a context, since it actually takes some time and we might want to interrupt it: https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/core/local/local.go#L134-L138

But the JS runner doesn't implement that yet... And while I'd prefer that we fix it, it's not strictly necessary to do so _now_, so I'd also be mostly OK if we remove the context from NewVU()... Fixing it may end up being an even bigger refactoring than the one I proposed above, and it's not something urgent because right now cancelling VU initialization (i.e Ctrl+C before the script starts) works somewhat OK though this mechanism: https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/core/local/local.go#L209-L210

I tried implementing this concept today, and there're many confusions:

* `DeactivateCallback func(InitializedVU)` get an `InitializedVU`, but `InitializedVU` only have the `Activate` method, How `VUs can be returned back to a pool and reused`?

Not sure I fully understand your question, but the whole idea is that whatever code (in our case, some executor) is calling InitializedVU.Activate() would have previously had to retrieve that VU from the buffer pool. So it should also know how to put it back... Basically, VUs only "know" that they can either be inactive (simply initialized) or active. They don't "know" that there's a pool of them. Executors "know" that the buffer pool of VUs exists and how to take VUs from it and how to put them back (this is the case even now, before my proposed refactoring). So when an executor calls the Activate() method of some VU, they can supply a lambda that puts the VU back in the pool...

Many tests broken due to context error, most visible is test isolation setup data, the error:

None of the tests should be broken functionally, since we're not changing any functionality with this refactoring, though of course some tests would require substantial edits because of the new VU APIs... Not sure about the particular example you pasted, can't comment without seeing the code.

By changing NewVU, every functions which are using Runner.newVU becomes broken. If I changed it to match NewVU signature, then it makes no sense, it does not need context or VU id.

:confused: :confused:
Does this part of newVU() seem OK to you, or in some way unconnected to the ideas from my refactoring suggestion :wink: https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L206-L209

For clarification, what I have done:
* Adding `InitializedVU` and `ActiveVU` interfaces.

* Changing `lib.VU` -> `lib.InitializedVU`

* Make `js.VU` struct implements 2 those interfaces.

* Change every `vu.RunOnce` -> `vu.Activate(&lib. VUActivationParams{Ctx: ctx}).RunOnce()`

* Run all tests inside `js/` directory.
The code does compile, but seems that all the tests were broken. (The output error just ate all of my screen).

Not sure you have fully understood my suggested changes :confused: If you have any questions, please ask them instead of just directly changing the code without understanding the ideas behind the changes... This change specifically doesn't make sense to me: "Change every vu.RunOnce -> vu.Activate(&lib. VUActivationParams{Ctx: ctx}).RunOnce()".

We're not just activating the VU before every RunOnce() call :confounded: ... Instead each executor would get an initialized VU from the ExecutionState.vus buffer pool, Activate() it one time (with the maxDurationCtx), and then start calling its RunOnce() method in the particular manner it does (every executor calls it differently after all). Then, when the maxDurationCtx expires (i.e. when the executor is done), the VU would call the supplied DeactivateCallback and that would return it back to the buffer, so another executor can use it again.

na-- on 27 Dec 2019

@na--

RunOnce now does not get a context, which lead to the same situation for runFn https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L430

How can we know if the iteration is done? https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L483

So when an executor calls the Activate() method of some VU, they can supply a lambda that puts the VU back in the pool...

I understand, but you pass the InitializedVU interface to that lambda, then what can you do inside the lambda? Since when Activate is the only method you can call on an InitializedVU.

What I think, DeactivateCallback func(InitializedVU) should be DeactivateCallback func(ActivateVU) and the ActivateVU interface should have a method Deactivate() InitializedVU.

Does this part of newVU() seem OK to you, or in some way unconnected to the ideas from my refactoring suggestion 😉

Can you elaborate? The vu.Reconfigure is already removed in your refactoring suggestion.

Not sure you have fully understood my suggested changes 😕 If you have any questions, please ask them instead of just directly changing the code without understanding the ideas behind the changes... This change specifically doesn't make sense to me: "Change every vu.RunOnce -> vu.Activate(&lib. VUActivationParams{Ctx: ctx}).RunOnce()".

We're not just activating the VU before every RunOnce() call 😖 ... Instead each executor would get an initialized VU from the ExecutionState.vus buffer pool, Activate() it one time (with the maxDurationCtx), and then start calling its RunOnce() method in the particular manner it does (every executor calls it differently after all). Then, when the maxDurationCtx expires (i.e. when the executor is done), the VU would call the supplied DeactivateCallback and that would return it back to the buffer, so another executor can use it again.

IIRC, the only place need to change is https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/lib/executor/helpers.go#L82

But I mean many tests create a VU by calling newVU or NewVU directly then call RunOnce(ctx) in current code base.

cuonglm on 27 Dec 2019

@na-- Please take a look at https://github.com/loadimpact/k6/tree/cuonglm/refactoring-runner to see how I try to apply your suggestion.

Just go to js/ directory and run go test, the test will fail.

cuonglm on 4 Jan 2020

@na-- Please take a look at https://github.com/loadimpact/k6/tree/cuonglm/refactoring-runner to see how I try to apply your suggestion.

A had a quick look at the changes in that branch and have a few notes:

You cannot just mechanically replace VU with InitializedVU everywhere. Just skimming through the diff, this mechanical search-and-replace doesn't make a lot of sense in a lot of places, especially some comments or debug statements ( logger.Debugf("Initialized InitializedVU #%d", vuID) :man_facepalming: )...
I'm not sure your implementation of Runner.NewVU(id int64, ...) is correct - you can see in the old Reconfigure() method that we also set the VU ID inside of the JS runtime (u.Runtime.Set("__VU", u.ID)), which you don't do in your code.
Runner.Activate() is not just the new name of Reconfigure()... params is just not an ornament, it has to be saved since the the Ctx and DeactivateCallback in it are vitally important... As I explained above, Activate() actually has to spawn a new goroutine that tracks that context and interrupts the goja JS VM, waits for it to finish any execution (if necessary) and then calls the deactivation callback...

Just go to js/ directory and run go test, the test will fail.

Well, you haven't implemented the actual changes I proposed, you've just renamed a bunch of things and made a few small cosmetic alterations. if the tests had passed, then this would have meant that our tests are very bad... :smile: I'm not sure you understand the ideas and the reasons behind my proposed changes. If so, please ask questions directly about them, since I'm not sure the code you produced is helpful in this discussion...

RunOnce now does not get a context, which lead to the same situation for runFn

https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L430

How can we know if the iteration is done?

https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/js/runner.go#L483

The iteration is done when runFn() returns. Whether the iteration was fully executed or was interrupted is another matter :wink:. Since only ActiveVUs can run iterations, and every ActiveVU has a context, we can simply check again if their context is done. My idea with the refactoring is that since an ActiveVU would only ever have a single context in its entire life (compared to the current complicated complex VU context management), this check should be pretty clean and non-racy.

So when an executor calls the Activate() method of some VU, they can supply a lambda that puts the VU back in the pool...

I understand, but you pass the InitializedVU interface to that lambda, then what can you do inside the lambda? Since when Activate is the only method you can call on an InitializedVU.

What I think, DeactivateCallback func(InitializedVU) should be DeactivateCallback func(ActivateVU) and the ActivateVU interface should have a method Deactivate() InitializedVU.

I don't see how this is an issue, but I think there was some miscommunication here, I'll try to explain how I envision the workflow again. Basically, each Executor, in its Run() method should:

Get its needed workers (i.e. initialized VUs) from the ExecutionState buffer pool of VUs.
Construct a very simple DeactivateCallback lambda _for each individual_ InitializedVU that just returns that very same object back to the buffer pool.
Activate()s each InitializedVU by passing the maxDurationCtx and its DeactivateCallback

The second step above is why we don't need to worry about the things you mentioned - each Executor would have the actual InitializedVU object and it would know where the VU buffer is. So now that I think about it, I'm not sure the DeactivateCallback lambda should have any parameters, i.e. instead of func(InitializedVU) it can probably just be func()... Though I'm not sure if we need any error handling here or not, probably not...

The only issue should be the fact that the goroutine spawned by Runner.Activate() that is responsible for Interrupt()-ing the JS VM when this context is done should wait for any actual JS execution to be done before it returns the VU back to the pool through DeactivateCallback(). This would require some complexity, since a VU might be in the middle of JS execution at that point, but it might just be sitting there quietly, doing nothing (especially relevant for the arrival-rate executors). And if the VU is executing code after we Interrupt() it, we should wait for it to finish (i.e. for RunOnce() to fully return) before we deactivate the VU and return it to the pool. I'm not sure that a simple sync.WaitGroup would be enough for this...

>

Does this part of newVU() seem OK to you, or in some way unconnected to the ideas from my refactoring suggestion wink

Can you elaborate? The vu.Reconfigure is already removed in your refactoring suggestion.

I just meant that you obviously had to pass the id int64 parameter from NewVU() to newVU()...

IIRC, the only place need to change is

https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/lib/executor/helpers.go#L82

But I mean many tests create a VU by calling newVU or NewVU directly then call RunOnce(ctx) in current code base.

Far from it, as I explaining again above, we'd also have to change the executors' Run() methods as well (though the changes should be pretty similar for all executors and would probably go in a helper method or something like that. And unfortunately these proposed changes would also require changing, since we're changing the VU interface after all...

na-- on 6 Jan 2020

To give a more concrete example of how I envision the activation/deactivation of the VUs in the executors, the following code: https://github.com/loadimpact/k6/blob/f6be35cfa006c56954d6821e0f94f4863f063e45/lib/executor/per_vu_iterations.go#L181-L205

Should look somewhat like this when my proposed changes are implemented:


//TODO: this might be better-off as a method, instead of as a lambda
handleVU := func(initializedVU lib.InitializedVU) {
    // This is probably not strictly needed, but with it, we can return some
    // VUs to the buffer sooner
    ctx, cancel := context.WithCancel(maxDurationCtx)
    defer cancel()

    // TODO: some of this should probably be in a helper function, since
    // it's going to be repeated in all executors...
    vu := initializedVU.Activate(&lib.VUActivationParams{
        Ctx: ctx,
        DeactivateCallback: func() {
            pvi.executionState.ModCurrentlyActiveVUsCount(-1)
            pvi.executionState.ReturnVU(initializedVU)
        },
    })
    pvi.executionState.ModCurrentlyActiveVUsCount(+1) 
    activeVUs.Add(1)
    defer activeVUs.Done()

    for i := int64(0); i < iterations; i++ {
        select {
        case <-regDurationDone:
            return // don't make more iterations
        default:
            // continue looping
        }
        runIteration(maxDurationCtx, vu)
        atomic.AddUint64(doneIters, 1)
    }
}

for i := int64(0); i < numVUs; i++ {
    initializedVU, err := pvi.executionState.GetPlannedVU(pvi.logger)
    if err != nil {
        cancel()
        return err
    }
    go handleVU(initializedVU)
}

na-- on 6 Jan 2020

@na-- Thanks, but I see a conflict here, in:

    vu := initializedVU.Activate(&lib.VUActivationParams{
        Ctx: ctx,
        DeactivateCallback: func() {
            pvi.executionState.ModCurrentlyActiveVUsCount(-1)
            pvi.executionState.ReturnVU(initializedVU)
        },
    })

vu is lib.ActiveVU, some lines above, you did call:

runIteration(maxDurationCtx, vu)

This runIteration expected lib.InitializedVU instead.

cuonglm on 6 Jan 2020

:confused: ...I'm still getting the feeling that there's some serious misunderstanding going on :confounded:

This runIteration expected lib.InitializedVU instead.

runIteration() obviously has to work with ActiveVU, since they are the ones that can... run... iterations... :wink: I mean, that's literally their sole purpose in "life":

type ActiveVU interface {
    RunOnce() error
}

Please explain what conflict you see a bit more clearly...

na-- on 6 Jan 2020

hmm I just realized that this refactoring would actually sort-of fix another minor k6 issue - https://github.com/loadimpact/k6/issues/889

na-- on 6 Jan 2020

Relevant goja issue: https://github.com/dop251/goja/issues/120

edit: now closed by https://github.com/dop251/goja/commit/0a0a0d8cb944b7b4952aba3b4bd1a399b06e66a1, so we can fix the issue on our side

na-- on 8 Jan 2020

Hey @na--, could you take a look at my WIP branch for this and let me know if it matches what you had in mind? Most of the implementation is taken verbatim from your examples, so I'm hoping it does. :)

The refactor fixes the context cancelled message by calling Goja's Runtime.ClearInterrupt() in VU.Activate(), which could also have been fixed without the refactor, but this ensures it's only called once on VU activation.

I'm currently working on the tests and have a couple leftover issues I might need your help with, but that can wait for the PR. Just wanted to make sure I'm on the right track for now.

imiric on 17 Mar 2020

👍1

This was fixed in #1368.

imiric on 30 Apr 2020

K6: Context cancelled message shown when iteration context done

All 16 comments

Related issues