For this function in ApolloClient,
@discardableResult public func fetch
cachePolicy: CachePolicy = .returnCacheDataElseFetch,
context: UnsafeMutableRawPointer? = nil,
queue: DispatchQueue = DispatchQueue.main,
resultHandler: GraphQLResultHandler
[Bug] 0.27.1 introduced a bug when calling the function above, one after another, with different queries. One of the two queries would not call the "resultHandler" callback. Sometimes the one that does call the callback would come back with this error: dataForRequestNotFound.
This bug does not reproduce in 0.24.1. I will check whether it reproduces in 0.25.1 and report back.
@MaximillionMai Can you confirm which version this breaks on for you? That will help me do a comparison between that version and the previous version to figure out what might have changed here.
Hi @designatednerd! We're experiencing the same issue. Testing shows that it reproduces in 0.27.1 and does NOT reproduce in 0.24.1 and 0.25.1.
I have just noticed this too, in v0.27.1. In my case it's when calling watch(...) simultaneously for two different queries. I receive a dataForRequestNotFound error occasionally for one of the queries.
We've been seeing this occasionally also, have not had time to look into it with more depth as we were assuming it was the dogshit wifi we have currently. Just a series of normal fetches in our case
Looks like the problem is here - the data object that's been dequeued for the task which has just finished is nil, which is weird, since we create an empty data object for each task before we start it.
This was introduced in 0.27.0 so anything prior to that will not be affected.
Thought: Are any of you calling cancel directly on the Cancelable returned from URLSessionClient?
Nope! The only place I reliably see it is when I'm calling a series of fetch requests one after another, at one point I kick off a request in parallel for each of the objects received, but the request that fails isn't always one of those.
@sneakyness Got a rough guess on how many requests at once you've got going before you see this issue?
Also, that error has an associated value that should have the originalRequest from the task that we couldn't pull data for - are you able to tell if there's any pattern to what's failing vs what's working?
2-3 completed (each one must finish before the next can start), then 1+ fetch for one type while also making a singular request of another. The 1+ is commonly 4-10+, making more does not seem to change how sporadic it is. I'll start logging that now but IME it's always either of the last two
Thanks. Woof, sporadic bugs are my faaaavorite to troubleshoot π
Unfortunately I'm in the middle of an API Model refactor, if I'm lucky I should be able to automate firing this process over and over to try and reproduce by end of week
Just pushed up a branch where I added a test that checks if 300 identical URL requests hitting the client at the same time (from different threads) create different IDs, and it passes.
Throwing some breakpoints in and doing some po's, it looks like it does create one instance per ID of the data in the self.datas property for URLSessionClient, and none of them are nil. That puts a kibosh on my theory of "Maybe different requests are winding up with the same ID if they're being sent at the same time, causing the data to be deallocated when the first request finishes."
The only place the data for a particular task ID gets removed is either in clearTask or clearAllTasks - clearAllTasks only gets hit when the session invalidates, which would give you a different error. clearTask only gets hit either a) If you cancel via URLSessionClient's cancel(task:) method or b) If the task actually completes.
I've added this level of detail because I'm basically stumped. Clearly, something's causing the data for a particular URLSessionTask identifier key to get nulled out. What the hell is doing that? If anyone's got thoughts, would love to hear them.
I'm using DispatchQueue.concurrentPerform, not super sure on how different that is from .userInitiated.
I'm not doing this specifically for the network requests, there's a decent amount of objects being initialized and sorted, just happened to be a clean spot to snag an id I needed for the request.
Actually I've just realized that it might get called multiple times per index, do you think that's causing it? π€‘
Actually I've just realized that it might get called multiple times per index
I'm not sure what you mean by this
If it's parallel can't it rarely call the same index twice? Especially in cases where the number of tasks is >8 (afaik 8 is the max number of threads it'll kick off)
I still am missing something - what is "it" that could be calling the same index multiple times in this context - do you mean the URLSession creating tasks with the same identifier, or something else?
concurrentPerform, so it might be calling the same fetch multiple times, but that doesn't explain why it would fail with "dataForRequestNotFound" instead of eventually just succeeding
Ah! That makes more sense. I updated the test to use concurrentPerform and yep, multiple tasks with the same ID are getting generated.
(basically if multiple tasks with the same ID are getting generated, then my "I will unique this based on the task identifier" strategy falls apart entirely)
My case may be different from others in that the first request was fine. If we know for sure this is the cause, I can do whatever to make sure it's only called once.
If there was an implementation of fetch/watch/etc like fetchOnce that either returned a request if there's no task ID, or did nothing/emit something that said this was the case, would that be a good bandaid for the others' cases pending a bigger change?
The first request would probably be fine in what you were describing because it's not trying to make multiple requests at the same time, it's once you start making multiple concurrent requests that the ID incrementor fails. Kind of wish they'd gone for a UUID instead of an auto-incrementing int for that identifier...
I'll think on that - for now I need to look at what to do about the fact that the ID isn't unique if it's hit concurrently repeatedly, because I thought Apple was handling that and they ain't. π
πCan I ask a couple of you folks to try out the fix/multifetch branch to see if the changes there make a difference? I've updated the dictionary to key off the URLSessionTask rather than its identifier, so this should stop the issue where something gets ripped out of the dictionary too early. The tests are passing but I'd love to get confirmation this is working in the real world before I PR it.
Just tested the fix/multifetch branch. Unfortunately the issue still persists. In our case we just make two identical serial calls to fetch function on the same queue, and for one of these invocations resultHandler is not called back. It happens roughly 4/5 times, sometimes it works as expected.
this sounds like completion handler overwritten somewhere without cancelling and new call created without canceling old one
@RolandasRazma That's definitely what was happening when using the task identifier as the key, since the task identifier would not get incremented if the session was getting hit quickly enough, and everything for the particular key would get hammered by the next task.
@yuriiik - Are you making other calls at the same time, or just the two identical calls? Also, thank you for testing, it's a bit frustrating that the concurrentPerform test didn't uncover this π
@designatednerd β just two identical calls, one after another.
The var is different in mine but otherwise they are the same call. Applied branch, saw the bug once, set a breakpoint, never saw it again π»
The process of triggering the request was streamlined, I was able to run it on repeat long enough to forget about it and still got nothing!
@sneakyness Did it continue to work fine after you removed the breakpoint? I'm thinking it might be a Heisenbug (in that it changes when observed, probably because the breakpoint slows things up enough to stop whatever was happening concurrently before) if it goes back to not working. If it randomly broke once and then worked forevermore, I have no goddamned idea π
@yuriiik Something occurred to me - can you confirm that you're not using Query Watchers? There's a bug in the current version where we're super-agressively cancelling in-flight fetches if a cache result comes in that's fixed with #1156 (which is merged but not shipped). If you are using query watchers, do you mind pulling master and letting me know if that fixes the issue?
That lines up with what I figured was going on earlier, the only time it was happening was either after the backend had been rebuilt and the query did a cold run, or if we were in the "best worst zone" of wifi coverage which caused a similar amount of delay.
This weekend I've got some nebulous and grand refactoring planned for the backend, I suspect once the schema tidies up this will disappear. If I manage to cause it again after that I will let you know. I'm inclined to say that this error is a "smell" and that ultimately it's best to find a way to get the result you want with less requests, even if it shouldn't happen
@designatednerd Sorry to get back to you late! It was indeed introduced in version 0.27.1.
For me, it repros when I call two different query calls one right after another.
@designatednerd we're not using Query Watchers, so that's not the case I think.
OK, I think I got it. Basically: Concurrency is hard. π
Please see #1227.
0.28.0. If you are still experiencing any problems around this, please open a new issue indicating what version you are using.Thanks for uncovering this!
We can no longer reproduce the issue on 0.28.0. @designatednerd thank you so much!
Most helpful comment
We can no longer reproduce the issue on
0.28.0. @designatednerd thank you so much!