Short description of the issue:
100% reproducible deadlock using two share(replay: 1) operators.
Every instance of ShareReplay1WhileConnected has its own RecursiveLock and uses it for:
The problem is in the subscribe part because the cached element is sent to the observer while still holding the lock. Calling an _alien_ method while still holding the lock is dangerous. The _alien_ method might acquire other locks (which happened in this case) or block for an unexpectedly long time, stalling other threads that need the lock. Because of that, it is easy to deadlock two share() operators.
Since you already dispatch events without the lock in on(_ event: Event<E>) I assume it is also fine to dispatch the cached element when _synchronized_subscribe() finishes and the lock gets released.
Suggestion (if you agree with this suggestion I would be happy to make a pull request which fixes the deadlock):
ShareReplay1WhileConnected
override func subscribe<O: ObserverType>(_ observer: O) -> Disposable where O.E == E {
self._lock.lock()
let connection = self._synchronized_subscribe(observer)
let count = connection._observers.count
let disposable = connection._synchronized_subscribe(observer) // REMOVE SENDING THE ELEMENT FROM HERE
self._lock.unlock()
if let element = connection._element { // AND SEND IT HERE
observer.on(.next(element))
}
if count == 0 {
connection.connect()
}
return disposable
}
Expected outcome:
Both values printed out to the console.
2019-05-26 21:23:17.602: Share1 -> subscribed
2019-05-26 21:23:17.567: Share2 -> subscribed
2019-05-26 21:23:17.607: Share1 -> Event next(value1)
2019-05-26 21:23:17.607: Share2 -> Event next(value2)
2019-05-26 21:23:17.607: Share1 INNER -> subscribed
2019-05-26 21:23:17.607: Share2 INNER -> subscribed
2019-05-26 21:23:17.607: Share1 INNER -> Event next(value2)
2019-05-26 21:23:17.607: Share2 INNER -> Event next(value1)
Subscribe: next(value2)
Subscribe: next(value1)
What actually happens:
Two threads deadlock and nothing gets printed out to the console.
2019-05-26 21:22:26.069: Share2 -> subscribed
2019-05-26 21:22:26.069: Share1 -> subscribed
2019-05-26 21:22:26.071: Share1 -> Event next(value1)
2019-05-26 21:22:26.071: Share2 -> Event next(value2)
2019-05-26 21:22:26.072: Share1 INNER -> subscribed
2019-05-26 21:22:26.072: Share2 INNER -> subscribed
Self contained code example that reproduces the issue:
We are working on a large reactive system in which we have many computations and try to share as much as we can between Rx streams. This is a simplified version of what happened in our codebase.
let share1 = BehaviorSubject.init(value: "value1").share(replay: 1)
let share2 = BehaviorSubject.init(value: "value2").share(replay: 1)
share1.subscribe().disposed(by: disposeBag)
share2.subscribe().disposed(by: disposeBag)
Observable.merge(
share1
.debug("Share1")
.subscribeOn(ConcurrentDispatchQueueScheduler(qos: .background))
.flatMapLatest { _ in share2.debug("Share1 INNER") }
,
share2
.debug("Share2")
.subscribeOn(ConcurrentDispatchQueueScheduler(qos: .background))
.flatMapLatest { _ in share1.debug("Share2 INNER") }
)
.subscribe { print("Subscribe: \($0)") }
.disposed(by: disposeBag)
RxSwift version
4.5.0
Platform/Environment
How easy is to reproduce? (chances of successful reproduce after running the self contained code)
Xcode version:
10.1
Installation method:
I have multiple versions of Xcode installed:
(so we can know if this is a potential cause of your issue)
Level of RxSwift knowledge:
(this is so we can understand your level of knowledge
and formulate the response in an appropriate manner)
Hi @PetarMarijanovic ,
Long time no see, hope you and your family are doing well.
I guess this Monday turned out ok 馃槅. According to your posting time looks like you've had an even better Sunday 馃槅.
I guess there is a lot of questions here, so I'll try to address them one by one.
100% reproducible deadlock using two share(replay: 1) operators.
2019-05-27 12:53:08.946: Share2 -> subscribed
2019-05-27 12:53:08.946: Share1 -> subscribed
2019-05-27 12:53:08.947: Share2 -> Event next(value2)
2019-05-27 12:53:08.947: Share1 -> Event next(value1)
2019-05-27 12:53:08.947: Share2 INNER -> subscribed
2019-05-27 12:53:08.947: Share1 INNER -> subscribed
I can't repro it on my machine, but I understand that that code might have deadlock issues, so I trust you when you say that it deadlocks for you.
The problem is in the subscribe part because the cached element is sent to the observer while still holding the lock. Calling an alien method while still holding the lock is dangerous. The alien method might acquire other locks (which happened in this case) or block for an unexpectedly long time, stalling other threads that need the lock. Because of that, it is easy to deadlock two share() operators.
I see you've marked where something is being sent and where do you want it to be sent. Life isn't so simple unfortunately.
The problem with your suggestion is that you would then not synchronise upstream producer of the events with sending of the initial element. That would mean that the order might be invalid or the elements of the sequence could overlap.
RxSwift is usually used to model unidirectional systems. What you have is a circular dependency between four stateful entities that all emit initial values on concurrent background schedulers.
The four stateful systems are two behaviour subjects and two shareReplay operators.
In theory you can deadlock it in a simpler way by creating a small circle between two BehaviorSubject probably, although repro depends on timing.
Even if we somehow changed the code in this operator to not deadlock for your use case, you are asking for trouble with that design of your system because all other operators have the same assumptions of dealing with unidirectional systems.
E.g. Every operator that merges >= 2 sequences sends elements while holding the lock because it needs to synchronise between potentially >= 2 independent processes.
Ok, so we are we doing that? The majority of systems out there are unidirectional or can be modelled by parts as unidirectional. Supporting weird circular dependencies has significant performance penalties.
If we wanted to support these kinds of circular flows we would need to replace every single sending of elements with an explicit queue and some equivalent of AsyncLock (https://github.com/ReactiveX/RxSwift/blob/65efb404c80fe1b23108c8fbb57d05b615e21c93/RxSwift/Concurrency/AsyncLock.swift). We are using AsyncLock in a couple of places for more complex operators and it is working as you say 'not calling an alien method while holding a lock'.
It seemed to us that other implementations have the same architecture and assumptions, with the one exception here being RxJava. It seems to me that RxJava does support these cyclic dependencies.
The problem with Swift is that it is much slower than Java. Sure, you'll read some synthetic benchmarks how some non idiomatic Swift is super fast. Unfortunately I couldn't confirm that with real word cases. There are also some other tricks RxJava could use. They can atomically replace references. We've tried to play with those concepts but even the small amount of atomic code we had in the codebase had to go away because of T-San and the fact we are using value types in Swift which you can't just AtomicCompareAndSwap.
So anyway, this is where are we are today and why.
Can we improve some things? Perhaps, we can maybe look into improving subject behaviours. Hard to tell, but it won't be bulletproof.
So what can you do right now?
Using .subscribeOn with a concurrent scheduler after the subject is probably the worst thing you can do and it probably doesn't do what you want. You probably wanted to perform work in the background in the flatMapLatest. You can accomplish that with observeOn and not subscribeOn. I think that this alone should solve your issues regardless of using concurrent schedulers.
Figure out what unidirectional flows you have and then model them. If you have some kind of weird circular dependencies you might want to reconsider how to model them in a simpler way. A simple way to solve this is to usually pull the state from all of the circular dependencies and put it into one container/BehaviorSubject. That's probably what you had anyway.
Use something like RxFeedback where you have a state and a set of events produced based on that state and only unidirectional flows.
I'm not sure does this help, but I hope I've illustrated that the problem is not that simple as you might have originally thought :) I guess it's a no brainer if we dismiss performance and just replace everything with AsyncLocks.
Hi @kzaher,
we are doing even better than well :) Same goes for you, I really hope you and your family are well and happy.
Reproducibility:
That's odd. I managed to reproduce it on two different laptops running simulators. Nevertheless, I'm glad you agree a deadlock could occur, so I don't need to investigate why isn't it working for you :)
My suggestion:
Yes, I know the solution won't be as simple as I portrayed it. I wanted to make sure you get the point where I think the problem is, and that you will think of the best solution possible.
Code example:
Ignore the BehaviourSubjects. I just wanted a stream of data that doesn't complete, and we actually don't have BehaviorSubjects at the start of the chain (at least not in the part in which the deadlock occurred). But I agree for the two stateful systems which are the two shareReplay operators and their circular dependency. It makes sense to model the whole system as a unidirectional flow of data, but as you said: "Life isn't so simple unfortunately". We are dealing with a lot of UseCases, and some of them have the shareReplay logic, so we don't compute the same thing over and over again, and because of that we accidentally created a circular dependency between two of them.
Recap:
It wasn't so obvious and easy to find, and while I agree with you that fixing this can introduce either other threading issues or performance penalties, I'm really sorry we can't fix this. A larger system in production with some unfortunate timings and buried circular dependencies between UseCases could deadlock, which will be almost impossible to debug and fix from logs.
We will resolve the circular dependency and take extra care in the future not to make the same mistake. Thank you for your time and this great response. It was really nice to learn from you again :)
Bonus question:
You said Using .subscribeOn with a concurrent scheduler after the subject is probably the worst thing you can do and it probably doesn't do what you want. I agree that using observeOn makes more sense (again as I said before, the code example is a stupid representation of the problem with which I managed to reproduce the deadlock). But, why is it the worst thing I can do? I want to make sure I don't miss something obvious.
This is how I see it:
_Lets say both examples are called on the main thread_
sharedObservable.observeOn(ConcurrentDispatchQueueScheduler(qos: .background))
sharedObservable will be executed on the main threadsharedObservable.subscribeOn(ConcurrentDispatchQueueScheduler(qos: .background))
sharedObservable will be executed on the background threadYes, I know the solution won't be as simple as I portrayed it. I wanted to make sure you get the point where I think the problem is, and that you will think of the best solution possible.
I'm thinking about this for ages. Yes, I agree with what almost all you've said. Ideally we would better support circular dependencies, but right now I don't see a simple way. I personally don't have this problem since I've solved it on a higher level of system design, but it's not the most satisfying feeling.
Even if we decided to do async locks everywhere that would introduce additional problems like, if you are pumping something really fast on some background thread, your UI thread could be infinitely blocked because it could inadvertently become worked for that async lock just because it came first. The predictability of the system would decrease because the source of your event could be far gone by the time processing errors out, etc. 馃槱
You said Using .subscribeOn with a concurrent scheduler after the subject is probably the worst
thing you can do and it probably doesn't do what you want. I agree that using observeOn makes more sense (again as I said before, the code example is a stupid representation of the problem with which I managed to reproduce the deadlock). But, why is it the worst thing I can do? I want to make sure I don't miss something obvious.
The thing to note here is that the subscribe logic is critical here.
If you just use observeOn then all subscription and initial emitting will happen during the normal subscription on the main thread like you've said. The initial element will be then switched to some other worked thread in the background, but the subscription caller won't be blocked and wait on some lock. Nothing here will cause issues.
If you are using subscribeOn then in this case you are firing two subscribe requests in a really short time span, sending them off to two different backgrounds threads that are both racing towards circular setup of locks. This is probably a setup for disaster :)
Hello guys!
I believe I stumbled on the same issue. :)
We use RxFeedback & MVVM but circular dependency appear deeper in service/repo layer which are part of separate framework used in more than one project.
@PetarMarijanovic how do you protect the codebase from circular dependency? I guess if you work alone on the project, chances are decreased that circular dependency will appear, but that's not the case I guess.
@kzaher thanks for the comprehensive explanation ;)
Hey @IvanKalaica :)
As @kzaher proposed, we've taken a step back and managed to get around the issue by defining clear dependency rules between objects. These rules don't allow circular dependencies by nature, and we try not to violate them.
And I am not alone, there are 8 iOS devs currently working on the project. I'm just the Android guy who helped out a bit :)
Most helpful comment
Hey @IvanKalaica :)
As @kzaher proposed, we've taken a step back and managed to get around the issue by defining clear dependency rules between objects. These rules don't allow circular dependencies by nature, and we try not to violate them.
And I am not alone, there are 8 iOS devs currently working on the project. I'm just the Android guy who helped out a bit :)