Hello,
We're experiencing a high incidence of SIGSEGV SEGV_MAPERR crashes on recent versions of Realm. We can reliably reproduce these crashes by opening realms on multiple threads and running operations on them. Crashing multiple times (1-5) will usually corrupt the database. Most of our tests have been with encrypted realms; however, I was able to reproduce the same results using an unencrypted realm, with much more effort.
This is a small set of crashes from production. I have hundreds, but they're much the same.
signal 11 (SIGSEGV), code 1, fault addr 0x674db176 in tid 27384 (m.messenger.app)
Revision: '0'
ABI: 'arm'
pid: 27384, tid: 27384, name: m.messenger.app >>> com.messenger.app <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x674db176
r0 674db172 r1 b7989f78 r2 00000861 r3 0000004e
r4 676e6172 r5 b7900f90 r6 00000068 r7 bef3c350
r8 00000001 r9 bef3c37c sl 9ec858c0 fp b7fd5e78
ip b6cb05dc sp bef3c348 lr b3b7152d pc b3b5e6c2 cpsr 800e0030
Stack Trace:
RELADDR FUNCTION FILE:LINE
000846c2 realm::BpTreeNode::get_bptree_leaf(unsigned int) const+82 unwind-c.c:?
00085b0b realm::BpTree<realm::util::Optional<long long> >::get(unsigned int) const+20 unwind-c.c:?
0008da9f realm::TimestampColumn::get(unsigned int) const+14 unwind-c.c:?
000babe9 realm::TimestampNode<realm::Equal>::find_first_local(unsigned int, unsigned int)+64 unwind-c.c:?
000c3215 realm::ParentNode::find_first(unsigned int, unsigned int)+44 unwind-c.c:?
000c940f realm::Query::count(unsigned int, unsigned int, unsigned int) const+166 unwind-c.c:?
0003cd51 Java_io_realm_internal_TableQuery_nativeCount+88 unwind-c.c:?
010092f3 offset 0xd10000 /data/app/com.messenger.app-1/oat/arm/base.odex
-----------------------------------------------------
signal 11 (SIGSEGV), code 1, fault addr 0x4 in tid 6685 (IncomingMessage)
Revision: '0'
ABI: 'arm'
pid: 6606, tid: 6685, name: IncomingMessage >>> com.messenger.app <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x4
r0 b80ccf60 r1 ffff97a0 r2 b80cd26c r3 00000000
r4 b80ccf60 r5 ffff97a0 r6 00000000 r7 00000037
r8 00000001 r9 00000000 sl 000e0000 fp b3d62ab0
ip 0000000c sp a0427cb0 lr b3c89e2b pc b3c40540 cpsr 600f0030
Stack Trace:
RELADDR FUNCTION FILE:LINE
00097540 realm::SlabAlloc::do_translate(unsigned int) const+448 unwind-c.c:?
000e0e29 realm::ArrayStringLong::init_from_mem(realm::MemRef)+178 unwind-c.c:?
000e0f5d realm::StringColumn::StringColumn(realm::Allocator&, unsigned int, bool, unsigned int)+288 unwind-c.c:?
000da4f1 realm::Table::refresh_column_accessors(unsigned int)+932 unwind-c.c:?
000a7f4b realm::Group::do_get_table(unsigned int, bool (*)(realm::Spec const&))+698 unwind-c.c:?
000da40b realm::Table::refresh_column_accessors(unsigned int)+702 unwind-c.c:?
000a7f4b realm::Group::do_get_table(unsigned int, bool (*)(realm::Spec const&))+698 unwind-c.c:?
000a9cd5 realm::Group::do_get_or_add_table(realm::StringData, bool (*)(realm::Spec const&), void (*)(realm::Table&), bool*)+52 unwind-c.c:?
000243ed Java_io_realm_internal_SharedRealm_nativeGetTable+172 unwind-c.c:?
00fff4e9 offset 0xd10000 /data/app/com.messenger.app-1/oat/arm/base.odex
-----------------------------------------------------
signal 11 (SIGSEGV), code 1, fault addr 0xb766dfbf in tid 6722 (IncomingMessage)
Revision: '0'
ABI: 'arm'
pid: 6695, tid: 6722, name: IncomingMessage >>> com.messenger.app <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xb766dfbf
r0 b766dfc0 r1 b766dfbf r2 ffcbf0af r3 0000001c
r4 a027ad10 r5 00000114 r6 00000000 r7 00015b48
r8 00015b48 r9 00000001 sl 00000001 fp 00000000
ip e0000000 sp a027a998 lr b3c03539 pc b6cb24ac cpsr a00d0010
Stack Trace:
RELADDR FUNCTION FILE:LINE
000184ac memmove+444 bionic/libc/arch-arm/denver/bionic/memmove.S:210
000e1535 realm::ArrayBlob::replace(unsigned int, unsigned int, char const*, unsigned int, bool)+640 unwind-c.c:?
000e2745 realm::ArrayStringLong::bptree_leaf_insert(unsigned int, realm::StringData, realm::TreeInsertBase&)+68 unwind-c.c:?
000de4e5 realm::StringColumn::leaf_insert(realm::MemRef, realm::ArrayParent&, unsigned int, realm::Allocator&, unsigned int, realm::BpTreeNode::TreeInsert<realm::StringColumn>&)+1052 unwind-c.c:?
000de897 unsigned int realm::BpTreeNode::bptree_append<realm::StringColumn>(realm::BpTreeNode::TreeInsert<realm::StringColumn>&)+94 unwind-c.c:?
000dea9d realm::StringColumn::bptree_insert(unsigned int, realm::StringData, unsigned int)+240 unwind-c.c:?
000cb343 realm::StringColumn::insert_rows(unsigned int, unsigned int, unsigned int, bool)+54 unwind-c.c:?
000cc28d realm::Table::insert_empty_row(unsigned int, unsigned int)+64 unwind-c.c:?
000269a9 Java_io_realm_internal_Table_nativeAddEmptyRow+164 unwind-c.c:?
010012c7 offset 0xd10000
-----------------------------------------------------
signal 11 (SIGSEGV), code 1, fault addr 0x20 in tid 28460 (m.messenger.app)
Revision: '0'
ABI: 'arm'
pid: 28460, tid: 28460, name: m.messenger.app >>> com.messenger.app <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x20
r0 b79431f0 r1 00000008 r2 b3818008 r3 00000000
r4 b79431f0 r5 b77efef8 r6 00000008 r7 b79431f0
r8 00000020 r9 b79431f0 sl 00000008 fp 00000000
ip b6ca45dc sp bedcfa40 lr b3b9bca3 pc b3b9a0fa cpsr 40070030
Stack Trace:
RELADDR FUNCTION FILE:LINE
000cc0fa realm::Table::get_column_base(unsigned int)+18 unwind-c.c:?
000cdc9f realm::Table::connect_opposite_link_columns(unsigned int, realm::Table&, unsigned int)+10 unwind-c.c:?
000da4dd realm::Table::refresh_column_accessors(unsigned int)+912 unwind-c.c:?
000a7f4b realm::Group::do_get_table(unsigned int, bool (*)(realm::Spec const&))+698 unwind-c.c:?
0005bb9b realm::ObjectSchema::ObjectSchema(realm::Group const&, realm::StringData, unsigned int)+810 unwind-c.c:?
00061641 realm::ObjectStore::schema_from_group(realm::Group const&)+160 unwind-c.c:?
0006a445 realm::Realm::init(std::shared_ptr<realm::_impl::RealmCoordinator>)+268 unwind-c.c:?
0006d9fb realm::_impl::RealmCoordinator::get_realm(realm::Realm::Config)+542 unwind-c.c:?
00068eb3 realm::Realm::get_shared_realm(realm::Realm::Config)+46 unwind-c.c:?
00024e95 Java_io_realm_internal_SharedRealm_nativeGetSharedRealm+224 unwind-c.c:?
00fff521 offset 0xd10000 /data/app/com.messenger.app-1/oat/arm/base.odex
-----------------------------------------------------
signal 11 (SIGSEGV), code 1, fault addr 0x41 in tid 6212 (m.messenger.app)
Revision: '0'
ABI: 'arm'
pid: 6212, tid: 6212, name: m.messenger.app >>> com.messenger.app <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x41
r0 b8b885a4 r1 00000001 r2 00000001 r3 00000000
r4 be9b6410 r5 b8b88598 r6 b3beeef7 r7 00000002
r8 00000001 r9 b88439a0 sl 00000000 fp 9ee2b9d0
ip b3bc76dd sp be9b63d0 lr b3c44863 pc b3b8f9d0 cpsr 200e0030
Stack Trace:
RELADDR FUNCTION FILE:LINE
0001f9d0 realm::BpTree<long long>::get(unsigned int) const+8 unwind-c.c:?
000d485f realm::StringData realm::Table::get<realm::StringData>(unsigned int, unsigned int) const+84 unwind-c.c:?
000d4897 realm::Table::get_string(unsigned int, unsigned int) const+4 unwind-c.c:?
00057701 Java_io_realm_internal_UncheckedRow_nativeGetString+36 unwind-c.c:?
00ff98bb offset 0xd10000 /data/app/com.messenger.app-2/oat/arm/base.odex
-----------------------------------------------------
signal 11 (SIGSEGV), code 1, fault addr 0xfffd7dfc in tid 11074 (RxComputationSc)
Revision: '0'
ABI: 'arm'
pid: 11041, tid: 11074, name: RxComputationSc >>> com.messenger.app <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xfffd7dfc
r0 b842d560 r1 9f3950d8 r2 00000001 r3 00000030
r4 fffd7df8 r5 b842d560 r6 00024000 r7 b814c4e8
r8 9f3950d8 r9 b83dca8c sl 1303d8b0 fp 00000001
ip 0000000c sp 9f3950c0 lr b3bfb147 pc b3bf9e4c cpsr 000f0030
Stack Trace:
RELADDR FUNCTION FILE:LINE
00086e4c realm::Array::init_from_mem(realm::MemRef)+14 unwind-c.c:?
00088143 realm::Array::update_from_parent(unsigned int)+62 unwind-c.c:?
000d48eb realm::Table::update_from_parent(unsigned int)+76 unwind-c.c:?
000a6bcb realm::SharedGroup::commit_and_continue_as_read()+218 unwind-c.c:?
00074af5 realm::_impl::transaction::commit(realm::SharedGroup&, realm::BindingContext*)+8 unwind-c.c:?
000685cd realm::Realm::commit_transaction()+32 unwind-c.c:?
0002302b Java_io_realm_internal_SharedRealm_nativeCommitTransaction+50 unwind-c.c:?
00fd6555 offset 0xd10000 /data/app/com.messenger.app-1/oat/arm/base.odex
-----------------------------------------------------
Encryption: On. Minimum 1 contact, 1 message, with body.length ~= 1000.
At least 1 write thread, and 1 other thread. Global instance N=2 will eventually crash, N=3+ will crash faster.
This code often produces a UTF-16 crash for us as well. We can provide the project if necessary, but this is the relevant code.
public class Contact extends RealmObject {
    @PrimaryKey
    private String id;
    @Index
    @Required
    private String username;
    private String number;
    @Nullable
    private RealmList<Message> mMessageList;
    ...
}

public class Message extends RealmObject {
    @PrimaryKey
    private String id;
    @Index
    private String contactId;
    @Required
    private String type;
    private String body;
    @Required
    private Date dateCreated;
}
Thread mWriteThread = new Thread(new Runnable() {
    @Override
    public void run() {
        final AtomicInteger i = new AtomicInteger(0);
        while (mRunThreads) {
            try (Realm realm = Realm.getInstance(REALM_CONFIG)) {
                realm.executeTransaction(new Realm.Transaction() {
                    @Override
                    public void execute(Realm realm) {
                        Contact contact = realm.where(Contact.class).equalTo("id", CONTACT_ID).findFirst();
                        contact.setUsername("User" + i.incrementAndGet());
                    }
                });
            }
        }
    }
});
Thread mReadThread = new Thread(new Runnable() {
    @Override
    public void run() {
        while (mRunThreads) {
            try (Realm realm = Realm.getInstance(REALM_CONFIG)) {
                realm.where(Contact.class).equalTo("id", CONTACT_ID).findAllSorted("username");
                realm.copyFromRealm(realm.where(Contact.class).equalTo("id", CONTACT_ID).findAllSorted("username"), 0);
            }
        }
    }
});
Reducing global realm instances to N<=2 and reducing as much load on Realm as possible decreases the chances of crash and corruption.
Realm version(s): 2.2.1, 2.2.2, 2.3.0
Realm sync feature enabled: no
Encryption: yes
Android Studio version: 2.2.2
Which Android version and device: Android 6.0.1 (CM13) / CAF 6.0.1, OnePlus One / OnePlus X
This appears to be similar to the issue reported here: https://github.com/realm/realm-core/issues/2383
How can we further debug and fix these issues?
Do you have any way to detect and/or recover from corruption, e.g. by opening the realm read-only and verifying its integrity?
Hi @bios-seiji Thank you for a very detailed bug report 🎉. If you can send the project to [email protected] we would be extremely grateful.
Can you try changing the following to see if it helps stabilize things?
Thread mWriteThread = new Thread(new Runnable() {
    @Override
    public void run() {
        final AtomicInteger i = new AtomicInteger(0);
        while (mRunThreads) {
            try (Realm realm = Realm.getInstance(REALM_CONFIG)) {
                realm.executeTransaction(new Realm.Transaction() {
                    @Override
                    public void execute(Realm realm) {
                        Contact contact = realm.where(Contact.class).equalTo("id", CONTACT_ID).findFirst();
                        contact.setUsername("User" + i.incrementAndGet());
                    }
                });
            }
        }
    }
});
Thread mReadThread = new Thread(new Runnable() {
    @Override
    public void run() {
        while (mRunThreads) {
            try (Realm realm = Realm.getInstance(REALM_CONFIG)) {
                realm.where(Contact.class).equalTo("id", CONTACT_ID).findAllSorted("username");
                realm.copyFromRealm(realm.where(Contact.class).equalTo("id", CONTACT_ID).findAllSorted("username"), 0);
            }
        }
    }
});
to
Thread mWriteThread = new Thread(new Runnable() {
    @Override
    public void run() {
        final AtomicInteger i = new AtomicInteger(0);
        // Open the Realm once per thread and reuse it for every iteration.
        try (Realm realm = Realm.getInstance(REALM_CONFIG)) {
            while (mRunThreads) {
                realm.executeTransaction(new Realm.Transaction() {
                    @Override
                    public void execute(Realm realm) {
                        Contact contact = realm.where(Contact.class).equalTo("id", CONTACT_ID).findFirst();
                        contact.setUsername("User" + i.incrementAndGet());
                    }
                });
            }
        }
    }
});
Thread mReadThread = new Thread(new Runnable() {
    @Override
    public void run() {
        // Open the Realm once per thread and reuse it for every iteration.
        try (Realm realm = Realm.getInstance(REALM_CONFIG)) {
            while (mRunThreads) {
                realm.where(Contact.class).equalTo("id", CONTACT_ID).findAllSorted("username");
                realm.copyFromRealm(realm.where(Contact.class).equalTo("id", CONTACT_ID).findAllSorted("username"), 0);
            }
        }
    }
});
@Zhuinden apologies, that was a typo in my C&P (I corrected the original post). I have already been conducting tests with the change you suggest.
Here's the code we've been using to test load (and generate crashes):
https://github.com/bios-seiji/realm-crash/
@bios-seiji Thanks!
@kneth were you able to reproduce the crashes with the project I provided? Do you have any suggestions for further debugging or mitigation?
Once a user has a corrupted database, is there any way we can detect and triage? Currently, the application will try to run and crash unexpectedly and uncontrollably. At the least we should be able to detect corruption on boot and provide steps for the user to get re-setup.
Thanks
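One possible way to approach the "detect on boot" question, independent of Realm itself: a SIGSEGV in native code kills the process and cannot be caught with a Java try/catch, so the app has to leave a trace on disk before it touches the database. The sketch below is a hypothetical helper (the class name `OpenCrashGuard`, the marker file, and the threshold are all assumptions, not Realm API). It only counts launches that started database work and never reported a clean finish; it does not verify the Realm file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Realm: tracks runs that died while the database was in use. */
final class OpenCrashGuard {
    private final File marker;     // small file holding the count of unfinished attempts
    private final int maxCrashes;  // assumed threshold before the app offers recovery

    OpenCrashGuard(File marker, int maxCrashes) {
        this.marker = marker;
        this.maxCrashes = maxCrashes;
    }

    /** Number of consecutive attempts that were started but never marked clean. */
    int crashCount() throws IOException {
        if (!marker.exists()) {
            return 0;
        }
        byte[] buf = new byte[16];
        int n;
        try (FileInputStream in = new FileInputStream(marker)) {
            n = in.read(buf);
        }
        if (n <= 0) {
            return 1; // empty marker: the process died mid-write, count it as one crash
        }
        try {
            return Integer.parseInt(new String(buf, 0, n, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return 1; // unreadable marker: count it as one crash
        }
    }

    /** Call before opening the database; the attempt counts as failed until markClean(). */
    void markAttempt() throws IOException {
        int next = crashCount() + 1;
        try (FileOutputStream out = new FileOutputStream(marker)) {
            out.write(Integer.toString(next).getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Call once the database has been opened and used without crashing. */
    void markClean() throws IOException {
        if (marker.exists() && !marker.delete()) {
            throw new IOException("could not delete " + marker);
        }
    }

    /** True when enough consecutive attempts failed that recovery should be offered. */
    boolean shouldRecover() throws IOException {
        return crashCount() >= maxCrashes;
    }
}
```

On startup the app would check `shouldRecover()` first and, if it returns true, skip opening the realm and walk the user through re-setup; otherwise it calls `markAttempt()`, opens the realm, and calls `markClean()` after a successful session. The threshold is a judgment call, since a crash during an open realm does not by itself prove the file is corrupt.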
@bios-seiji Sorry for not getting back earlier. The code described above is similar to the code found in #4114, which isn't surprising as #4114 was written with https://github.com/realm/realm-core/issues/2383 in mind. But the crash in #4114 might come from the fact that some threads are starved (22 threads and only a single core in my emulator), so a thread might hold an old version of the database and the device runs out of physical memory (the bus error indicates that).
Do you see the crash if you disable encryption?
By the way, does the test app require API 23 to fail? How many iterations does it typically take to crash?
@ironage You might wish to take a look at the test app while debugging https://github.com/realm/realm-core/issues/2383.
@kneth I can reliably reproduce these crashes on 4 and 8 core devices running as few as 3 or 4 threads (see example cases https://github.com/bios-seiji/realm-crash/blob/master/app/src/main/java/io/binarysolutions/realmtest/MainActivity.java#L29). The devices report ample physical memory available (dumpsys meminfo), and running the app with largeHeap enabled does not seem to help. I also dumped the heap and looked at it in MAP and did not see any obvious leaks. What are you using to monitor when the "device runs out of physical memory"? This was my initial thought also, but I have not seen it.
In our production app, I disabled encryption and was able to reproduce the issue with much more effort. I have not done extensive testing however, since encryption is necessary for us.
I recompiled for Android 5.1 (API 22) and was able to reproduce the issues. I did not get the immediate crash described in my test cases, but it reliably crashed after 1-10k write iterations.
@bios-seiji The investigations have so far led to https://github.com/realm/realm-core/pull/2426. We need to test more.
@kneth That's great. Do you have a testing build that I can run with our production tests to see if I can still trigger corruption?
@bios-seiji Currently I am getting ready to test with your test app (no reason to involve you before I have done that).
I am currently running your app in an x86 emulator. So far, thread 1 is at 34k, thread 2 is at 13k and duration at 1102k.
@bios-seiji If you wish to try my custom build, please send an email to [email protected].
@bios-seiji Your detailed bug report made it possible for us to reproduce a bug in Realm Core. It has now been fixed (see https://github.com/realm/realm-core/pull/2465), and we will release a fix very soon. I am closing the issue, and we are very thankful for your original report.