Caused by: com.orientechnologies.orient.core.exception.OPageIsBrokenException: Following files and pages are detected to be broken ['ner_16.pcl' :138;], storage is switched to 'read only' mode. Any modification operations are prohibited. Typically it means hardware error, before filling a bug please check your hardware. To restore database and make it fully operational you may export and import database to and from JSON.
DB name="db_skynet"
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkLowDiskSpaceRequestsAndReadOnlyConditions(OAbstractPaginatedStorage.java:5130)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.createRecord(OAbstractPaginatedStorage.java:1342)
at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage$5.call(ODistributedStorage.java:644)
at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage$5.call(ODistributedStorage.java:638)
at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage$11.call(ODistributedStorage.java:1220)
at com.orientechnologies.orient.core.db.OScenarioThreadLocal.executeAsDistributed(OScenarioThreadLocal.java:70)
at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage.executeRecordOperationInLock(ODistributedStorage.java:1217)
at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage.createRecord(ODistributedStorage.java:637)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeSaveRecord(ODatabaseDocumentTx.java:2216)
at com.orientechnologies.orient.core.tx.OTransactionNoTx.saveNew(OTransactionNoTx.java:241)
at com.orientechnologies.orient.core.tx.OTransactionNoTx.saveRecord(OTransactionNoTx.java:171)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:2782)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:2662)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:103)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.createRecord(ONetworkProtocolBinary.java:2844)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.createRecord(ONetworkProtocolBinary.java:1832)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkProtocolBinary.java:629)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.sessionRequest(ONetworkProtocolBinary.java:398)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.execute(ONetworkProtocolBinary.java:217)
at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:82)
2018-01-19_17:22:44 [Executor task launch worker-1] WARN orientdb.graph.dao.impl.GraphDaoImpl:215: renew and try again! tryCount: 1
2018-01-19_17:22:44 [Executor task launch worker-1] ERROR orientechnologies.orient.client.binary.OChannelBinaryAsynchClient:143: Error during exception deserialization
java.lang.NoSuchMethodException: com.orientechnologies.orient.core.exception.OPageIsBrokenException.<init>(com.orientechnologies.orient.core.exception.OPageIsBrokenException)
Why does this happen, and how can the database be restored?
Hi @xavier66, it may happen because you have a mix of different classes in your classpath; the exception you see typically indicates exactly that.
The method to restore is stated in the exception: "To restore database and make it fully operational you may export and import database to and from JSON".
BTW, could you send me the content of your logs so I can check them?
log.zip
@laa hi, log.zip is the log file of one node.
I also encountered the same problem in version 2.2.30 @laa
@laa it is reproducible on version 2.2.32 too. I did not find the root cause, but it happens every few days. The export and import solution is not viable for a production environment. Is there any progress on this issue?
Same thing happened to me on 2.2.32. I tried export/import and it doesn't work as expected. We cannot rely on it in a prod environment.
Caused by: com.orientechnologies.orient.core.exception.OPageIsBrokenException: Following files and pages are detected to be broken ['entity_8.pcl' :13;], storage is switched to 'read only' mode. Any modification operations are prohibited. To restore database and make it fully operational you may export and import database to and from JSON.
DB name="my-database"
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkLowDiskSpaceRequestsAndReadOnlyConditions(OAbstractPaginatedStorage.java:5128)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:1728)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:541)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:99)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2922)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2884)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.commit(ONetworkProtocolBinary.java:1434)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkProtocolBinary.java:668)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.sessionRequest(ONetworkProtocolBinary.java:398)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.execute(ONetworkProtocolBinary.java:217)
at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:82)
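To keep track of which cluster files and pages are affected across incidents (the two traces above report 'ner_16.pcl' page 138 and 'entity_8.pcl' page 13), the file name and page index can be pulled out of the exception message. This is only a minimal sketch: it assumes the message keeps the `['file' :page;]` layout seen in this thread, and `BrokenPageParser` is a hypothetical helper name, not part of OrientDB.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an OrientDB class): extracts broken file/page pairs
// from the text of an OPageIsBrokenException message.
final class BrokenPageParser {
    // Matches one entry such as 'ner_16.pcl' :138; (format inferred from this thread only).
    private static final Pattern ENTRY =
            Pattern.compile("'([^']+)'\\s*:\\s*(\\d+)\\s*;");

    // Returns file name -> list of page indexes, in the order they appear.
    static Map<String, List<Long>> parse(String message) {
        Map<String, List<Long>> broken = new LinkedHashMap<>();
        int open = message.indexOf('[');
        int close = message.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return broken; // no bracketed list found, nothing to report
        }
        Matcher m = ENTRY.matcher(message.substring(open + 1, close));
        while (m.find()) {
            broken.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                  .add(Long.parseLong(m.group(2)));
        }
        return broken;
    }
}
```

Feeding it the message text from a server log line gives a map that can be logged or compared between incidents, for example to see whether the same cluster keeps breaking.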
The reason for this issue is that the server crashed and was only partially restored, or an old version of the DB is used, and as a result a page was broken. We will port part of the durability fixes applied in 3.0 to 2.2.x to prevent this issue. But if the data on a page is broken, no software update will fix that. For commercial support, I would propose that you send the database to us and we will fix it in one day. For community support, the only means now is to perform an export/import of the database. A database in read-only mode can still be exported to JSON; read-only mode does not prevent such an export. Anyway, let me think; maybe I will find a way to isolate the page in the cluster so you will not access it, which would prevent the issue in production. I will update you today with whatever solution I come up with.
@laa The setup is very simple: it is a single server instance where we are running multiple DBs. The server is not crashing (probably just some threads), as the other DBs are still usable, but one of them goes into read-only mode every few days. Today db1 is in read-only mode, tomorrow db2, another day db3 ...
Hi guys, I have created a tool which isolates broken pages inside the cluster. I am testing it now and will provide it tomorrow. @devsprint I will get back to you a bit later to understand the reason for the DB failures.
@laa Is there a potential date for releasing version 3.x?
Any updates on the tool or on a potential fix for the signaled issue?
@devsprint it is already released. @acuciureanu sorry for the delay, I will try to finish it today.
@laa thanks for letting me know. It seems it is just 2 days old, but I did not manage to find the binaries to download.
Hi, it will be released in the new version of 2.2.x soon. But here is a build which already contains this tool.
https://drive.google.com/file/d/1cuZL4qZ6OKHLSgelryOWI8wiqjnsT7AA/view?usp=sharing
You need to do the following:
repair database --fix-cluster
repair database --fix-links
Please send feedback on how it works.
@devsprint if you experience any issues with broken pages after that, it would be great if you sent me the server log covering the period since the execution of this repair.
@laa Thanks a lot! I'll try this out. Cheers.
Guys, any updates on this issue? Did you try the repair tool?
@laa Thanks for getting back to us. I was unable to reproduce the reported problem in order to use the repair tool. I'll let you know once I use it.
Guys any updates on this?
Any updates? We're seeing this issue repeatedly in 2.2.31.
Hi @benoror did you try the tool which I provided?
@laa I managed to use the repair instructions with the build that you provided. However, it looks like it did not repair anything for me.
orientdb {db = "test-db"} > repair database --fix-cluster
Repairing database...
Repair database complete (0 errors)
orientdb {db = "test-db"} > repair database --fix-links
Repairing database...
- Removing broken links...
-- Done! Fixed links: 0, modified documents: 0
Repair database complete (0 errors)
@laa We imported a backup of our production db (17hrs+ for ~5GB) to test the tool, but got a result similar to that of @acuciureanu:
(repair tool took ~15min)
Any alternative we should consider? Cheers!
@laa I experience the same issues, and could not start the db with the tool build that you provided.
I've run into this issue with both 2.2.35 and 3.0.6. Is there anything we can do to prevent this instead of just having to repeatedly fix the database?
We are facing the same issue after upgrading from Orient 2.1.19 to 2.2.37. Not sure what's the best way forward, since the tool mentioned above did not work for anyone. How did you guys end up fixing this issue, if ever?
We ended up migrating to MongoDB
Yup, a long night trying to resolve this with no luck. @benoror I reckon we will end up on a similar path to yours
I upgraded the specs of the VM we were running it on, gave it another CPU.
Can't say for sure if it's fixed but I haven't had the issue yet since then
@creisle I tried upgrading the specs too, but it didn't work for us...
Hi guys,
This durability issue is fixed in 3.0.13, but to take advantage of the fix you need to either create the database from scratch and use that, or export the data to JSON and then import it back. Please let me know if that fixes the given issue. Unfortunately, to fix this issue 100% we need to rewrite the architecture underneath, so I do suggest you migrate to version 3.0.13.
@laa We ended up rolling back to the older version, Orient 2.1.19. For some reason, when Orient 2.2.37 starts, one of the pcl files gets corrupted and we end up in the situation mentioned above.
Funnily enough, the same backup is used by 2.1.19 and it works fine.
@laa we also found this being logged repeatedly after the upgrade:
Exception `2D2C834E` in storage `plocal:/var/lib/orientdb/ProdDB`: 2.2.37 (build a7541e7ceeabf592dd9a7b2928b6c023cbc73193, branch 2.2.x)
com.orientechnologies.orient.core.sql.OCommandSQLParsingException: Error parsing query:
select from null
^
Encountered " <FROM> "from "" at line 1, column 8.
Was expecting one of:
<NULL> ...
"true" ...
"false" ...
"{" ...
<NULL> ...
"{" ...
"true" ...
"false" ...
DB name="ProdDB"
at com.orientechnologies.orient.core.sql.parser.OStatementCache.throwParsingException(OStatementCache.java:146)
at com.orientechnologies.orient.core.sql.parser.OStatementCache.parse(OStatementCache.java:138)
at com.orientechnologies.orient.core.sql.parser.OStatementCache.get(OStatementCache.java:88)
at com.orientechnologies.orient.core.sql.parser.OStatementCache.get(OStatementCache.java:70)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLAbstract.preParse(OCommandExecutorSQLAbstract.java:235)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.parse(OCommandExecutorSQLSelect.java:259)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.parse(OCommandExecutorSQLSelect.java:93)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.parse(OCommandExecutorSQLDelegate.java:53)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.parse(OCommandExecutorSQLDelegate.java:34)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:3302)
at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:78)
at com.orientechnologies.orient.core.sql.query.OSQLAsynchQuery.run(OSQLAsynchQuery.java:74)
at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:102)
at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.command(ONetworkProtocolBinary.java:1567)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkProtocolBinary.java:665)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.sessionRequest(ONetworkProtocolBinary.java:399)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.execute(ONetworkProtocolBinary.java:218)
at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:82)
Exception `5A3A1A00` in storage `plocal:/var/lib/orientdb/ProdDB`: 2.2.37 (build a7541e7ceeabf592dd9a7b2928b6c023cbc73193, branch 2.2.x)
Does this ring any bells?
Hi @laa, thank you very much for looking into this serious issue.
My company is also running into it, and we were wondering what the cause of the problem is but couldn't find anything in the changelog for version 3.0.13 or any linked PR for the fix. We are preparing to migrate to OrientDB 3.0, but as our production cluster is going to be on 2.2.37 in the meantime, we'd like to try to mitigate and minimize the impact of the problem as much as possible.
After all, it's really difficult for us to focus on migrating to OrientDB 3.0 when our existing database is on fire on a regular basis. Any help or suggestions you are able to provide would be very much appreciated.
We are also experiencing this problem (using OrientDB 2.2.37).
Is this something that happens only after a server crash, or can it happen spontaneously as well?
This is critical for us.
We've definitely had it happen spontaneously, whenever the database was under sustained heavy write load. As a temporary hack, we've rate-limited writes to the database, which has made the problem seemingly go away. My guess is that the bug is due to an overflowing of some write-side buffer, corrupting the database that way.
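For anyone wanting to try the same workaround, here is a minimal sketch of a client-side write throttle using a token bucket. It is not the code from the comment above, whose implementation is not described; `WriteThrottle`, its parameters, and the rate and burst values are all hypothetical and would need tuning against your own load.

```java
// Hypothetical client-side helper (not part of OrientDB): a token-bucket
// throttle that caps how many writes per second reach the database.
final class WriteThrottle {
    private final double permitsPerSecond;
    private final double burst;
    private double available;
    private long lastNanos;

    WriteThrottle(double permitsPerSecond, int burst, long startNanos) {
        if (permitsPerSecond <= 0 || burst <= 0) {
            throw new IllegalArgumentException("rate and burst must be positive");
        }
        this.permitsPerSecond = permitsPerSecond;
        this.burst = burst;
        this.available = burst;   // start with a full bucket
        this.lastNanos = startNanos;
    }

    // Non-blocking: refills the bucket for the elapsed time, then takes one permit if possible.
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = Math.max(0L, nowNanos - lastNanos) / 1e9;
        available = Math.min(burst, available + elapsedSeconds * permitsPerSecond);
        lastNanos = Math.max(lastNanos, nowNanos);
        if (available >= 1.0) {
            available -= 1.0;
            return true;
        }
        return false;
    }

    // Blocking variant: polls until a permit is free.
    void acquire() throws InterruptedException {
        while (!tryAcquire(System.nanoTime())) {
            Thread.sleep(1);
        }
    }
}
```

A writer would create one shared instance, for example `new WriteThrottle(200.0, 50, System.nanoTime())`, and call `acquire()` before each save or commit. This only slows the client down; it does not address whatever causes the page corruption on the server.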
Should be fixed in the latest 3.0.x.
That's great news, thank you @laa! My team spent a lot of time hunting this bug and was unable to come up with a small reproducer test case. I am very curious to know what the underlying issue was, if there are any details you are able to share.
Still running into the issue with 3.0.21. A JVM crash leaves the table in a broken state.