Realm-cocoa: IncorrectThreadException when using objectForPrimaryKey

Created on 19 Dec 2016 · 12 Comments · Source: realm/realm-cocoa

Goals

I want to be able to call objectForPrimaryKey: on a background thread and not have the app crash. This occurs very infrequently. I left my app running for an hour with a background update every 3 minutes which does quite a few Realm transactions.

Expected Results

I expected the app to not crash when using objectForPrimaryKey:.

Actual Results

It crashed. Here's the relevant part of the stack trace:

#0  0x0000000110690c3e in _dispatch_once(long*, void () block_pointer) [inlined] at /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator10.0.sdk/usr/include/dispatch/once.h:74
#1  0x0000000110690c3e in ::__cxa_throw(void *, std::type_info *, void (*)(void *)) at /Users/benny/Repositories/MS/HockeySDK-iOS/Classes/BITCrashCXXExceptionHandler.mm:80
#2  0x00000001102be4c3 in realm::Realm::verify_thread() const at /Users/schapman/Workspace/project/Pods/Realm/Realm/ObjectStore/src/shared_realm.cpp:423
#3  0x000000011026676e in ::-[RLMRealm verifyThread]() at /Users/schapman/Workspace/project/Pods/Realm/Realm/RLMRealm.mm:144
#4  0x000000011014e453 in RLMVerifyRealmRead(RLMRealm*) at /Users/schapman/Workspace/project/Pods/Realm/Realm/RLMObjectStore.mm:59
#5  0x000000011014e4d7 in ::RLMGetObject(RLMRealm *, NSString *, id) at /Users/schapman/Workspace/project/Pods/Realm/Realm/RLMObjectStore.mm:485
#6  0x000000011013a981 in ::+[RLMObject objectForPrimaryKey:](id) at /Users/schapman/Workspace/project/Pods/Realm/Realm/RLMObject.mm:138
#7  0x000000010fe3c137 in __64-[MyClass myMethod]_block_invoke.55 at .../MyClass.m:93

The offending line is: MyRealmClass* object = [MyRealmClass objectForPrimaryKey:@"primary_key"];

I'm not accessing the realm directly. It appears that the RLMRealm's defaultRealm call is returning the incorrect realm for the given thread.

I did some initial investigation while the debugger was attached when this happened:

Within the context of the RLMRealm object, note that self and the result of RLMRealm.defaultRealm are identical; however, m_thread_id and pthread_self() give different thread IDs.

(lldb) po self
<RLMRealm: 0x6080002ba100>
(lldb) po RLMRealm.defaultRealm
<RLMRealm: 0x6080002ba100>

(lldb) print (__thread_id)m_thread_id
(std::__1::__thread_id) $27 = {
  __id_ = 0x00007000002a0000
}
(lldb) print (__thread_id)pthread_self()
(std::__1::__thread_id) $28 = {
  __id_ = 0x00007000005b2000
}
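For readers unfamiliar with how such a check produces the exception, here is a minimal C++ sketch of a thread-confinement check. It is not Realm's code: ConfinedHandle and throws_on_other_thread are hypothetical names, and the sketch only assumes that the handle records the creating thread's ID and compares it on each access, as the m_thread_id comparison above suggests.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a thread-confined handle; not Realm's actual class.
class ConfinedHandle {
public:
    ConfinedHandle() : m_thread_id(std::this_thread::get_id()) {}

    // Throws if the caller is not on the thread that created the handle.
    void verify_thread() const {
        if (m_thread_id != std::this_thread::get_id())
            throw std::logic_error("Handle accessed from incorrect thread.");
    }

private:
    std::thread::id m_thread_id;  // ID of the creating thread
};

// Returns true if verify_thread() throws when called from a second thread.
bool throws_on_other_thread(const ConfinedHandle& handle) {
    bool threw = false;
    std::thread worker([&] {
        try {
            handle.verify_thread();
        } catch (const std::logic_error&) {
            threw = true;
        }
    });
    worker.join();
    return threw;
}
```

A handle created on one thread passes verify_thread() there, while throws_on_other_thread() reports a failure for the same handle. That is the situation in the debugger output above: the handle returned for this thread carries another thread's ID.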

Steps to Reproduce

Not sure how to reproduce. It seems like the defaultRealm call is returning an RLMRealm instance for the wrong thread. Maybe the thread-local caching isn't working as expected? I don't know enough about how the defaultRealm access works to give an informed use case.

Code Sample

I don't have a sample available. My code runs on a dispatch_queue_t that was created with: _syncQueue = dispatch_queue_create("Sync Queue", DISPATCH_QUEUE_CONCURRENT);

I use many calls to this same queue simultaneously to parse HTTP JSON response into Realm objects in parallel.

Version of Realm and Tooling

Realm version:

Realm (2.1.0):
- Realm/Headers (= 2.1.0)
Realm/Headers (2.1.0)

Xcode version: Version 8.1 (8B62)

iOS/OSX version: iOS 10.1 iPad Air 2 simulator

Dependency manager + version: Cocoapods 1.1.1

Questions

Would manually catching this exception and calling RLMClearRealmCache be a suitable work around?

T-Bug-Crash

Source

SandyChapman

Most helpful comment

@jpsim : Since this is using a concurrent GCD queue, it's pretty much guaranteed to be running on different threads when I make several simultaneous dispatches on the queue. That's why I don't store RLMRealm or RLMObject instances and instead use the class method objectForPrimaryKey: to re-retrieve my instance within the context of the thread it's running on. This should be thread-safe.

@tgoyne : the next time I catch this happening, I'll gather the debug info you're asking for. I've validated that the thread ID on the RLMRealm instance doesn't match the current thread (which makes sense as that's why the exception is thrown), but I'll double check the coordinator next time. Hopefully, it's just a matter of leaving my app running for an hour or two like last time I saw this happen.

SandyChapman on 20 Dec 2016

👍2

All 12 comments

Hi @SandyChapman!

Sorry to hear that you're bumping into a potential threading issue with [RLMRealm defaultRealm].

@tgoyne: Do you have any idea how this could potentially be happening?

Thanks!

TimOliver on 19 Dec 2016

Keep in mind that a GCD queue isn't guaranteed to always run its work on the same thread. If you're making that assumption, it could manifest itself as this exception.

It's hard to say without the relevant code being shared...

jpsim on 19 Dec 2016

The cache does store its own copy of the threadid for the Realm, so I suppose it's theoretically possible that there could be a mismatch between the threadid there and the value stored in m_thread_id. You could investigate if this is the case by printing _realm->m_coordinator->m_weak_realm_notifiers and checking that the entry for your local Realm instance has the same threadid as the instance itself.

If that does turn out to be the problem, I'm still not really sure how it would happen; the cache management code is all pretty simple (and there's not much of it).

tgoyne on 19 Dec 2016

SandyChapman on 20 Dec 2016

👍2

Thanks @SandyChapman! If you could collect that debug data for us, that would be really helpful. Please let us know when you're able to collect it. :)

Happy holidays!

TimOliver on 24 Dec 2016

@TimOliver : I left my app running for a while on Friday with no crash happening. I'll turn up the frequency of the REST calls which resulted in this crash and try again when I'm back at work in the new year. Will follow up with any progress I make.

SandyChapman on 24 Dec 2016

@TimOliver @tgoyne
Our QA just reproduced this bug. Unfortunately, since it happened outside of a debug session, I have no further details except the crash log. At least it shows there's some reproducibility to the problem.

Incident Identifier: D2450BE4-5302-418E-9C45-B045EF809254
CrashReporter Key:   4D6BB64F-9973-4866-8605-B7C44B3B7E16
Hardware Model:      iPad5,3
Process:         My App [274]
Path:            /var/containers/Bundle/Application/B87D9196-0284-49BF-9E30-28EF80D15F66/My App.app/My App
Identifier:      my.app
Version:         1.1.0 (1.13.19)
Code Type:       ARM-64
Parent Process:  ??? [1]

Date/Time:       2016-12-30T17:45:27Z
Launch Time:     2016-12-27T17:28:30Z
OS Version:      iPhone OS 10.2 (14C92)
Report Version:  104

Exception Type:  SIGABRT
Exception Codes: #0 at 0x19170f014
Crashed Thread:  7

Application Specific Information:
*** Terminating app due to uncaught exception 'realm::IncorrectThreadException', reason: 'Realm accessed from incorrect thread.'

Last Exception Backtrace:
0   My App                             0x00000001002667e8 realm::Realm::verify_thread() const (shared_realm.cpp:433)
1   My App                             0x00000001001e338c RLMGetObject (RLMObjectStore.mm:59)
2   My App                             0x00000001001da15c +[RLMObject objectForPrimaryKey:] (RLMObject.mm:138)
...

SandyChapman on 30 Dec 2016

@tgoyne, @TimOliver, @jpsim : I've hit this once more with the debugger attached. Here are some additional details:

_realm->m_coordinator->m_weak_realm_notifiers has 9 items. This sounds reasonable as I am running many background threads concurrently.

Some print statements:

(lldb) print (__thread_id)pthread_self()
(std::__1::__thread_id) $0 = {
  __id_ = 0x000070000021d000
}

# Printing element 7's m_thread_id:
Printing description of ((pthread_t)0x70000021d000):
(pthread_t) __id_ = 0x000070000021d000

Printing description of self->_realm.__ptr_->m_thread_id.__id_:
(pthread_t) __id_ = 0x000070000052f000

# Printing element 5's m_thread_id
Printing description of ((pthread_t)0x70000052f000):
(pthread_t) __id_ = 0x000070000052f000

So this seems to confirm that we're getting the wrong Realm for the given thread.

Stack trace:

#0  0x000000010f8199ae in _dispatch_once(long*, void () block_pointer) [inlined] at /Applications/Xcode-Archiv/Xcode80.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator10.0.sdk/usr/include/dispatch/once.h:74
#1  0x000000010f8199ae in ::__cxa_throw(void *, std::type_info *, void (*)(void *)) at /Users/benny/Repositories/MS/HockeySDK-iOS/Classes/BITCrashCXXExceptionHandler.mm:80
#2  0x000000010f43ff73 in realm::Realm::verify_thread() const at ../project/Pods/Realm/Realm/ObjectStore/src/shared_realm.cpp:433
#3  0x000000010f3e805e in ::-[RLMRealm verifyThread]() at ../project/Pods/Realm/Realm/RLMRealm.mm:144
#4  0x000000010f2cc763 in RLMVerifyRealmRead(RLMRealm*) at ../project/Pods/Realm/Realm/RLMObjectStore.mm:59
#5  0x000000010f2cc117 in ::RLMGetObjects(RLMRealm *, NSString *, NSPredicate *) at ../project/Pods/Realm/Realm/RLMObjectStore.mm:464
#6  0x000000010f2b7d44 in ::+[RLMObject allObjects]() at ../project/Pods/Realm/Realm/RLMObject.mm:98
...

Also note that this is now happening in allObjects as well as objectForPrimaryKey:. The pthread_self documentation states:

Thread IDs are guaranteed to be unique only within a process. A
thread ID may be reused after a terminated thread has been joined, or
a detached thread has terminated.

Is it possible that we're just hitting a collision and the static s_realmsPerPath isn't being cleaned up properly when a realm is no longer needed? I have 76 items in the single value in that map, which is certainly more than the 9 notifiers in the m_coordinator.

According to Wolfram Alpha, there's roughly a 4.25% chance of a collision among 76 items drawn from a range of 65536 values (which appears to be the range of the thread IDs). Could this be the problem here?
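The estimate can be checked with the standard birthday-problem product. The C++ sketch below assumes each ID is drawn uniformly and independently from the range, which is a simplification and not how thread IDs are really allocated. collision_probability is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Probability that at least two of `items` values collide when each is drawn
// uniformly and independently from `range` possible values (birthday problem).
double collision_probability(int items, double range) {
    assert(items >= 0 && range > 0);
    double p_unique = 1.0;
    // Multiply the chance that each successive draw avoids all earlier ones.
    for (int i = 0; i < items; ++i)
        p_unique *= (range - i) / range;
    return 1.0 - p_unique;
}
```

Under that assumption, collision_probability(76, 65536) comes out at about 4.3%, in line with the figure quoted above.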

I'm guessing we could add some validation logic to RLMGetThreadLocalCachedRealmForPath that checks whether the Realm retrieved from s_realmsPerPath has the correct thread ID and, if it doesn't, removes it from the cache and returns null. I'm not sure if this is the best solution, but it's at least one that I see as non-intrusive and that I think will work.
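As a rough illustration of that idea, and not of Realm's actual cache, here is a C++ sketch of a lookup that validates the owner before returning a cached entry. ThreadKeyedCache and CachedHandle are hypothetical names, the integer key stands in for whatever per-thread key the real cache uses, and the sketch does no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical model of a per-thread cache; names are illustrative, not Realm's.
struct CachedHandle {
    std::thread::id owner;  // thread the handle was created on
};

class ThreadKeyedCache {
public:
    void store(std::uintptr_t key, std::shared_ptr<CachedHandle> handle) {
        m_entries[key] = std::move(handle);
    }

    // Returns the cached handle only if it belongs to `current`. A stale entry
    // left behind under a reused key is evicted and nullptr is returned.
    std::shared_ptr<CachedHandle> lookup(std::uintptr_t key, std::thread::id current) {
        auto it = m_entries.find(key);
        if (it == m_entries.end())
            return nullptr;
        if (it->second->owner != current) {
            m_entries.erase(it);
            return nullptr;
        }
        return it->second;
    }

    std::size_t size() const { return m_entries.size(); }

private:
    std::unordered_map<std::uintptr_t, std::shared_ptr<CachedHandle>> m_entries;
};
```

A caller that gets nullptr back would open a fresh handle for the current thread and store it under the same key. This only masks stale entries; it does not address why they stay in the cache.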

Let me know if you guys have any other ideas. I'm willing to put the validation fix in if you think it's suitable.

SandyChapman on 4 Jan 2017

There could be an issue with thread ID reuse, because we get the thread ID in two different ways (pthread_mach_thread_np() for the cache and std::this_thread::get_id() for verifying the thread). If the former happens to reuse IDs when the latter doesn't, you'd get the problem you're encountering. I'll investigate whether that is the case.

tgoyne on 4 Jan 2017

Thread ID reuse does appear to be the problem: std::this_thread::get_id() actually just returns the pthread_t and not the thread ID at all, so it will happen to return a reused value at completely different times. Should be simple to fix.

tgoyne on 4 Jan 2017
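The ID reuse diagnosed above can be observed with a short C++ experiment. This is only a sketch: the C++ standard permits the ID of a terminated thread to be reused but does not require it, so the result varies by platform and run. distinct_ids_for_sequential_threads is a hypothetical helper name.

```cpp
#include <cassert>
#include <set>
#include <thread>

// Starts `count` threads one after another, joining each before starting the
// next, and returns how many distinct std::thread::id values were observed.
// A result smaller than `count` means an ID was handed out more than once.
int distinct_ids_for_sequential_threads(int count) {
    assert(count >= 0);
    std::set<std::thread::id> seen;
    for (int i = 0; i < count; ++i) {
        std::thread worker([] {});
        seen.insert(worker.get_id());  // read the ID while still joinable
        worker.join();
    }
    return static_cast<int>(seen.size());
}
```

If the function returns fewer distinct IDs than threads started, any cache or check keyed on those IDs can match an entry that belonged to a thread that has already exited.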

Thanks @tgoyne : when should we expect this to get into a release? I'll try to confirm the fix once it's ready.

SandyChapman on 6 Jan 2017

We'll be making a release next week, but until then you could build from source to incorporate this fix.

jpsim on 6 Jan 2017
