Our app logs metrics from a wearable (sent via BTLE). Metrics are sent and saved to Realm every second. The Realm is observed to display the metrics back to the user. Other metrics are periodically sent to a backend (and then removed from the Realm).
The app operates for about 5 hours before getting closed due to an OoM. The Realms all seem to be closed correctly (no warnings about open Realms).
I'm expecting uninterrupted periods of use.
Logcat reveals the following at the point that the app is closed:
11-21 05:15:33.808 27702-28621/com.snuza.picoconnect E/REALM_JNI: jni: ThrowingException 4, std::bad_alloc in /Users/cm/Realm/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1285, .
11-21 05:15:33.809 27702-28655/com.snuza.picoconnect E/REALM_JNI: jni: ThrowingException 4, std::bad_alloc in /Users/cm/Realm/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1285, .
11-21 05:15:33.841 27702-28621/com.snuza.picoconnect E/REALM_JNI: Exception has been throw: std::bad_alloc in /Users/cm/Realm/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1285
11-21 05:15:33.841 27702-28655/com.snuza.picoconnect E/REALM_JNI: Exception has been throw: std::bad_alloc in /Users/cm/Realm/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1285
11-21 05:15:33.855 27702-28621/com.snuza.picoconnect E/REALM_JAVA: io.realm.internal.OutOfMemoryError: std::bad_alloc in /Users/cm/Realm/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1285
at io.realm.internal.TableQuery.nativeBatchUpdateQueries(Native Method)
at io.realm.internal.TableQuery.batchUpdateQueries(TableQuery.java:500)
at io.realm.internal.async.QueryUpdateTask.run(QueryUpdateTask.java:87)
at io.realm.internal.async.BgPriorityRunnable.run(BgPriorityRunnable.java:34)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:422)
at java.util.concurrent.FutureTask.run(FutureTask.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
at java.lang.Thread.run(Thread.java:818)
Approximately 5 hours of runtime.
N/A
Realm version(s): 2.2.1
Realm sync feature enabled: no
Android Studio version: 2.3 Canary
Which Android version and device: Android 5.1.1 on the Nexus 5
Is it possible you have any RealmObject/RealmList/RealmResults leaked somewhere? If those are not GCed, the relevant native memory won't be freed.
I will comb through the code for that. As mentioned, I'm using the Observable pattern (via asObservable), so I would only have visibility of them at onNext emissions.
There isn't a case where "live" RealmObjects are emitted; rather, clones of the result RealmObjects are.
Well, how do you create your copy? I expect that you have unclosed Realm instances on the io() scheduler.
Clones are created by way of a _shallow copy_.
In our implementation, the Realm is never closed while the Activity is showing, because we're observing it for RealmObjects added by inserts on the BTLE end (to display metric values on the UI).
Is this the incorrect pattern to apply for observable RealmResults?
@mitchwongho Did you call unsubscribe() on the returned Subscription?
When the Activity (or view) is detached, the RealmResults observable is unsubscribed and the Realm is closed (via doOnUnsubscribe(); doOnCompleted and doOnError also close the Realm). This is not the issue.
However, we're expecting the screen (Activity/View) to be in the foreground for at least 24 hours and at most 5 days, with RealmObjects being inserted at 1/sec. So, the Realm should be open for that duration of operation.
Is this possible with Realm?
Can you share some relevant source code?
Inserting:
try (final Realm realm = Realm.getDefaultInstance()) {
    realm.executeTransaction(realm1 -> {
        // Value not saved yet
        if (statData.getInternalId() == null) {
            statData.setInternalId(RealmPrimaryKey.newInstance().toString());
        }
        // Clone and insert
        final PicoStatDataRealm picoStatDataRealm = PicoStatDataRealm.map(statData);
        realm1.copyToRealm(picoStatDataRealm);
    });
}
Observing:
final Realm realm = Realm.getDefaultInstance();
return realm.where(PicoStatDataRealm.class)
        .equalTo("picoSerialNr", picoSerialNr)
        .greaterThan("timeMillis", DateTime.now().getMillis())
        .findAllSortedAsync("timeMillis", Sort.DESCENDING) // ensures most recent is first
        .asObservable()
        .doOnError(throwable -> realm.close())
        .doOnCompleted(realm::close)
        .doOnUnsubscribe(realm::close)
        .filter(picoStatDataRealms -> !picoStatDataRealms.isEmpty())
        .map(picoStatDataRealms -> picoStatDataRealms.first().getClone());
Considering you use try-with-resources for insertion, that should definitely not be a problem, because it auto-closes.
While I am surprised there is no RealmResults::isLoaded filter on the async results, this in itself doesn't seem to have any problems...
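To make the try-with-resources point concrete: the Java language guarantees that close() runs when the try block exits, even if the transaction body throws, so the insertion code above cannot leak its Realm instance. Here is a minimal, self-contained sketch of that guarantee (a hypothetical Resource class, not Realm):

```java
// Demonstrates that try-with-resources invokes close() before the
// catch clause runs, even when the body throws an exception.
class TryWithResourcesDemo {
    static final StringBuilder log = new StringBuilder();

    static class Resource implements AutoCloseable {
        @Override
        public void close() { log.append("closed;"); }
    }

    static String run() {
        try (Resource r = new Resource()) {
            log.append("work;");
            // Simulates a transaction failing mid-way
            throw new IllegalStateException("transaction failed");
        } catch (IllegalStateException e) {
            log.append("caught;");
        }
        return log.toString();
    }
}
```

Note the ordering: the resource is closed before the catch clause executes, which is exactly why the insertion path is safe regardless of transaction errors.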
Good tip on RealmResults::isLoaded
Maybe https://realm.io/docs/java/latest/#rxjava is worth a read.
Thanks. I've added filter(RealmResults::isLoaded) and will wait to see if there is an improvement.
I've just started noticing the following before the app crashes with the OoM exception:
E/REALM_JNI: jni: ThrowingException 7, std::exception in /Users/cm/Realm/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1285, .
By the way, adding filter(RealmResults::isLoaded) doesn't seem to alleviate the crash.
@mitchwongho That's an interesting observation. The exception is thrown when you do an async query, and exception 7 is a version mismatch. Version here refers to the version that each thread/instance sees (Realm is a Multi-Version Concurrency Control database). What kind of thread do you do the write transaction on?
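To illustrate what "version" means in an MVCC database (a deliberately simplified toy model, nothing like Realm's actual engine): each reader is pinned to the snapshot version that existed when it began, writers produce new versions, and handing query results over between threads pinned to different versions fails, which is analogous to the version-mismatch exception above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy MVCC store: every commit creates a new immutable snapshot version.
// A reader pinned to version N keeps seeing version N's data; handing
// results across mismatched versions throws, analogous to exception 7.
class MvccToy {
    private final Map<Integer, Map<String, String>> snapshots = new HashMap<>();
    private int currentVersion = 0;

    MvccToy() { snapshots.put(0, new HashMap<>()); }

    // A reader pins itself to whatever version is current when it starts.
    int beginRead() { return currentVersion; }

    // A writer copies the latest snapshot, mutates it, and publishes N+1.
    int commit(String key, String value) {
        Map<String, String> next = new HashMap<>(snapshots.get(currentVersion));
        next.put(key, value);
        snapshots.put(++currentVersion, next);
        return currentVersion;
    }

    String read(int pinnedVersion, String key) {
        return snapshots.get(pinnedVersion).get(key);
    }

    // Handover of results only works between matching versions.
    void handover(int fromVersion, int toVersion) {
        if (fromVersion != toVersion) {
            throw new IllegalStateException(
                    "bad version: " + fromVersion + " != " + toVersion);
        }
    }
}
```

In Realm's case the background query thread and the caller's thread can end up on different versions, which is why the async machinery has to retry after a REALM_CHANGED event, as the later logs in this thread show.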
Writes are on the i/o thread: BluetoothGattCallback.onCharacteristicChanged() callback.
Is auto-refresh enabled on your Realm instances?
@kneth well, Schedulers.io() doesn't have a Looper to auto-update with.
Auto-refresh is enabled on AndroidSchedulers.mainThread. Correct me if I'm wrong, but auto-refresh is only applicable to _queries_ and not to _insert_ or _update_ calls, right?
Auto-refresh applies to Realm instances and the versions they retain; it also keeps RealmResults up to date with their Realm instances.
I created a test app last night to try to isolate this aspect of our project (Gist).
It's been running for 13 hours now (94k RealmObjects). It's interesting to see that a simple app like this hits 32 MB of RAM.
After 13h20m:
11-24 11:50:20.564 22867-22867/? D/AndroidRuntime: >>>>>> START com.android.internal.os.RuntimeInit uid 2000 <<<<<<
11-24 11:50:20.569 22867-22867/? D/AndroidRuntime: CheckJNI is OFF
11-24 11:50:20.695 22867-22867/? D/AndroidRuntime: Calling main entry com.android.commands.am.Am
11-24 11:50:20.701 832-1688/? I/ActivityManager: Force stopping com.mitchwongho.test.observablerealms appid=10164 user=0: from pid 22867
11-24 11:50:20.701 832-1688/? I/ActivityManager: Killing 2262:com.mitchwongho.test.observablerealms/u0a164 (adj 0): stop com.mitchwongho.test.observablerealms
11-24 11:50:20.728 832-2677/? I/WindowState: WIN DEATH: Window{3fd739cb u0 com.mitchwongho.test.observablerealms/com.mitchwongho.test.observablerealms.MainActivity}
11-24 11:50:20.759 832-1688/? W/ActivityManager: Force removing ActivityRecord{28fff4d5 u0 com.mitchwongho.test.observablerealms/.MainActivity t17660}: app died, no saved state
11-24 11:50:20.760 832-895/? I/UsageStatsService: User[0] Rolling over usage stats
11-24 11:50:20.760 832-895/? I/UsageStatsService: User[0] Flushing usage stats to disk
11-24 11:50:20.770 832-1249/? W/ActivityManager: Spurious death for ProcessRecord{360ebd91 2262:com.mitchwongho.test.observablerealms/u0a164}, curProc for 2262: null
11-24 11:50:20.777 22867-22867/? D/AndroidRuntime: Shutting down VM
@mitchwongho I think you need to call unsubscribe() on every subscription in subs before line 89 in order to let Realm know the Person objects you subscribed to are not needed anymore; then Realm knows the relevant resources can be freed. If you run the onResume/onPause code in a loop, I guess you can see the leak more easily.
@beeender I was wondering, doesn't compositeSubscription.clear() already do that?
Based on the docs, I would expect it to do that.
@Zhuinden hmmm, let me test.
If I remember correctly, you need to call compositeSubscription.unsubscribe() first. clear() just removes any tracked subscriptions, active or not.
@Zhuinden @cmelchior I just tested: clear() does unsubscribe. (RxJava 1.1.0)
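For reference, the RxJava 1.x documented behavior can be modeled with a simplified stand-in (this is not RxJava's actual code): clear() unsubscribes and removes everything tracked but leaves the container reusable, while unsubscribe() does the same and additionally terminates the container, so anything added afterwards is unsubscribed immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of RxJava 1.x CompositeSubscription semantics,
// to show why clear() is enough to release tracked subscriptions.
class CompositeModel {
    static class Sub {
        private boolean unsubscribed;
        void unsubscribe() { unsubscribed = true; }
        boolean isUnsubscribed() { return unsubscribed; }
    }

    private final List<Sub> subs = new ArrayList<>();
    private boolean terminated;

    void add(Sub s) {
        if (terminated) {
            s.unsubscribe(); // a terminated container rejects new subscriptions
            return;
        }
        subs.add(s);
    }

    void clear() {
        for (Sub s : subs) s.unsubscribe();
        subs.clear();        // container stays usable for further add() calls
    }

    void unsubscribe() {
        clear();
        terminated = true;   // subsequent add() unsubscribes immediately
    }
}
```

So for releasing Realm resources at onPause, either call is sufficient; the difference only matters if the container is reused afterwards.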
@mitchwongho I created a project to simulate your problem, but I cannot see a leak. Would you please try with it? If you want to modify anything to reproduce your problem, just fork and change it, so I can see and get the source to debug. The project is here: https://github.com/beeender/RealmRxTest
Also, can you share your implementation of Person.getClone()?
It's just a simple shallow copy (Gist):
public final Person getClone() {
    final Person clone = new Person();
    clone.id = this.id;
    clone.name = this.name;
    clone.aBoolean = this.aBoolean;
    clone.aByte = this.aByte;
    clone.aShort = this.aShort;
    clone.aInt = this.aInt;
    clone.aFloat = this.aFloat;
    clone.aDouble = this.aDouble;
    clone.timestamp = this.timestamp;
    return clone;
}
Thanks @beeender
@mitchwongho You can use Realm.copyFromRealm() to do the shallow copy. But it doesn't seem to be the problem.
@beeender Thank you. Your test also seems to yield the same behaviour as my test app.
I've managed to take a _heap dump_ just before the OoM. It looks like there are a lot of FinalizerReferences and Longs:


@mitchwongho That is interesting! Your screenshot shows that the objects waiting to be freed in the finalizer actually take a lot of memory. But luckily we are not using the finalizer anymore. See https://github.com/realm/realm-java/pull/3144. That has not been released yet, but if you want, you can try our snapshot release.
How long does it take to get an OOM with my project? On what kind of device? I can try it as well.
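For readers following along: moving away from finalize(), as #3144 does, typically means tracking native resources with phantom references and a reference queue instead. Below is a generic stdlib-only sketch of that pattern (my illustration, not Realm's actual code); the Runnable stands in for freeing a native pointer, and drain() stands in for the work a dedicated daemon thread would do.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashMap;
import java.util.Map;

// Generic phantom-reference cleanup: when a wrapper object becomes
// unreachable, its reference lands on the queue and the registered
// action frees the associated native resource -- no finalize() needed.
class NativeCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<Reference<?>, Runnable> actions = new HashMap<>();

    Reference<Object> register(Object wrapper, Runnable freeNative) {
        PhantomReference<Object> ref = new PhantomReference<>(wrapper, queue);
        actions.put(ref, freeNative); // keep the ref strongly reachable
        return ref;
    }

    // Normally run in a loop on a daemon thread; polled here so callers
    // can drive it deterministically.
    int drain() {
        int freed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Runnable action = actions.remove(ref);
            if (action != null) {
                action.run();
                freed++;
            }
        }
        return freed;
    }
}
```

The advantage over finalize() is that cleanup is not at the mercy of the finalizer thread's scheduling (and its locks), which is exactly the failure mode discussed above.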
@beeender The screenshots above are for my app (it crashes after 3-4 hours but implements the same pattern for different Realm models). I only ran your test app for a couple of hours without reaching an OoM. Here is a heap dump of your test app after an hour:

@mitchwongho OK ... Those objects on the heap are actually expected. They are waiting for their native resources to be deallocated. However, since we previously used the finalizer thread to do the deallocation, it might trigger an OOM if the finalizer doesn't get time to run (there are locks ... )
Would you please try our snapshot release with your application to see if the problem still happens? I think #3144 should fix your problem.
2.3.0-SNAPSHOT is applied to my app. Initial observations (after 1 hour) are that there is a big improvement in memory management (kudos 👏🏽). I'll report back after our 12-hour benchmark and then at 24 hours.
After 8 hours I started getting these:
11-28 23:24:48.819 25334-26300/com.snuza.picoconnect E/REALM_JNI: jni: ThrowingException 7, std::exception in /tmp/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1282, .
11-28 23:24:48.819 25334-26300/com.snuza.picoconnect E/REALM_JNI: Exception has been throw: std::exception in /tmp/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1282
Memory usage is looking tame.
It looks like this then becomes the same issue as #3802 :(
Is it possible for you to share your project with us? Or if you can make https://github.com/beeender/RealmRxTest close to your existing logic, that would be great. I will try to reproduce it on my side. Thanks a lot!
@beeender The app has been running now for 14 hours. It doesn't appear that the Exception 7 caused any broader issues. It looks like it occurs when a certain event happens in the SyncAdapter (perhaps when attempting to delete _RealmObjects_). I will continue to monitor and report back when I can reproduce.
@beeender The Exception 7 issue may be caused by an asynchronous _query-delete_ bug on my side. I will continue to monitor. As an update, the app has successfully passed the 24-hour operation milestone without crashing, and memory usage appears to be under control.
@beeender I do still seem to experience the Exception 7 issue. Could you tell me whether those JNI exceptions get propagated to the Java level? I don't seem to be catching any exceptions/throwables.
Exception 7 is originally thrown by our native code, and then in JNI we wrap it as a Java exception and throw it. Do you have a full log of the exception?
Can you capture some logs for us? Enable Realm native logging by calling RealmLog.setLevel(LogLevel.ALL); after Realm.init(), then capture the logcat output with -v thread -v time.
I'm trying to pinpoint where the exception occurs. My log looks like this:
11-30 12:01:56.349 5198-5225/com.snuza.picoconnect D/BluetoothGatt: close()
11-30 12:01:56.349 5198-5225/com.snuza.picoconnect D/BluetoothGatt: unregisterApp() - mClientIf=6
11-30 12:02:01.207 5198-5224/com.snuza.picoconnect D/BluetoothGatt: onClientConnectionState() - status=0 clientIf=5 device=44:D6:E1:00:02:65
11-30 12:02:01.208 5198-5224/com.snuza.picoconnect D/Log: C64100342D GATT - STATE_DISCONNECTED
11-30 12:02:01.208 5198-5224/com.snuza.picoconnect D/BluetoothGatt: close()
11-30 12:02:01.208 5198-5224/com.snuza.picoconnect D/BluetoothGatt: unregisterApp() - mClientIf=5
11-30 12:02:05.268 5198-5374/com.snuza.picoconnect E/REALM_JNI: jni: ThrowingException 7, std::exception in /tmp/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1282, .
11-30 12:02:05.268 5198-5374/com.snuza.picoconnect E/REALM_JNI: Exception has been throw: std::exception in /tmp/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1282
11-30 12:02:06.184 5198-5390/com.snuza.picoconnect E/REALM_JNI: jni: ThrowingException 7, std::exception in /tmp/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1282, .
11-30 12:02:06.184 5198-5390/com.snuza.picoconnect E/REALM_JNI: Exception has been throw: std::exception in /tmp/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1282
11-30 12:02:49.898 5198-6634/com.snuza.picoconnect D/Log: setAppServerConnectionState ::set 1
11-30 12:02:49.898 5198-6634/com.snuza.picoconnect D/Log: setAppServerConnectionState() Pico OFFLINE
11-30 12:02:49.898 5198-6634/com.snuza.picoconnect W/Log: PicoControlService::createRealtimeTxWebSocket {deviceId=58170e031800009d3611c691,threadId=17417} CLOSED PICO CONTROL
11-30 12:20:31.084 5198-5198/com.snuza.picoconnect D/SyncService: onCreate()
11-30 12:20:31.097 5198-2066/com.snuza.picoconnect D/Log: SyncAdapter::onPerformSync - START
With Realm logging set to ALL:
11-30 12:26:55.992 9791-9791/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeIsClosed -1571913520
11-30 12:26:55.992 9791-9791/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeIsClosed -1571913520
11-30 12:26:55.992 9791-10049/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeCloseSharedRealm -1362919088
11-30 12:26:55.992 9791-11372/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeIsClosed -1571910936
11-30 12:26:55.992 9791-11372/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeCommitTransaction -1571910936
11-30 12:26:55.999 9791-10035/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeCloseConfig -1360611280
11-30 12:26:55.999 9791-10035/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_TableQuery_nativeBatchUpdateQueries
11-30 12:26:55.999 9791-10035/com.snuza.picoconnect E/REALM_JNI: jni: ThrowingException 7, std::exception in /tmp/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1282, .
11-30 12:26:55.999 9791-10035/com.snuza.picoconnect E/REALM_JNI: Exception has been throw: std::exception in /tmp/realm-java/realm/realm-library/src/main/cpp/io_realm_internal_TableQuery.cpp line 1282
11-30 12:26:55.999 9791-10035/com.snuza.picoconnect D/REALM_JAVA: Query update task could not complete due to a BadVersionException. Retry is scheduled by a REALM_CHANGED event.
11-30 12:26:55.999 9791-10035/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeCloseSharedRealm -1362919152
11-30 12:26:56.047 9791-11372/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeGetSnapshotVersion -1571910936
11-30 12:26:56.047 9791-11372/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_SharedRealm_nativeCloseSharedRealm -1571910936
11-30 12:26:56.047 9791-9791/com.snuza.picoconnect D/REALM_JAVA: REALM_CHANGED : io.realm.HandlerController@1c289e57
11-30 12:26:56.048 9791-9791/com.snuza.picoconnect V/REALM_JAVA: REALM_CHANGED realm: io.realm.HandlerController@1c289e57 updating async queries, total: 10
11-30 12:26:56.048 9791-9791/com.snuza.picoconnect V/REALM_JNI: --> Java_io_realm_internal_TableQuery_nativeHandoverQuery -1263060352
@mitchwongho Hmmm, OK, this one is thrown on our background thread when updating the async queries, and it is not harmful to your app as far as I can tell (it just prints some log output, and the exception is caught in our code). After this happens, Realm will retry updating the async query later. So, do you see any abnormal behavior after this log is printed?
I'm investigating whether there is a correlation between the Realm exception and an IO timeout event at our application level.
I realise that this discussion is now outside the scope of the initial ticket, so I'm happy for this ticket to be closed, and I look forward to the fix being promoted to the release build. Thank you.
Thanks a lot for the valuable feedback! I will close this issue. #3144 will be included in the next Realm release soon. Please let us know if there is anything more we can help with.