Orientdb: OrientDB 3.0.0RC1 has a big performance drop in contrast to 2.2.13

Created on 6 Mar 2018 · 27 Comments · Source: orientechnologies/orientdb

OS: Windows Server 2012 R2
Java SDK: 1.8
OrientDB version: 2.2.13 and 3.0.0RC1

After upgrading the DB server from 2.2.13 to OrientDB 3.0.0RC1, the results below show:

  1. Update time consumption increased by 20%.
  2. Read and insert rates decreased by a factor of 10 to 20, which is a big regression.


heapsize:2.2 GB; heapmaxsize:7.1 GB; heapFreesize:1.4 GB; numCPUs:8;cpuFrequencyInHz:2400000000;physical Memory:total: 32767Mb free: 26550Mb;swap Memory:total: 37631Mb free: 8349Mb;

MAM-TEST 8 CPU/32GB RAM 450 GB HD with Win2012R2


All 27 comments

One device has 60 properties; inserting 1 device means inserting 63 vertexes and the related edges.
The same number of operations applies to updating.
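
To put the workload in perspective, here is a minimal sketch of the arithmetic, assuming the figure of 63 vertex inserts per device quoted above. The class name, method name, and device counts are made up for illustration, and edges are not counted.

```java
// Rough workload arithmetic for the benchmark described above.
// VERTICES_PER_DEVICE is the figure quoted in the thread; everything
// else (names, device counts) is illustrative only.
final class WorkloadSize {
    static final long VERTICES_PER_DEVICE = 63;

    /** Number of vertex inserts implied by inserting the given number of devices. */
    static long vertexInserts(long devices) {
        return devices * VERTICES_PER_DEVICE;
    }

    public static void main(String[] args) {
        long[] deviceCounts = {1_000, 10_000, 100_000, 1_000_000};
        for (long devices : deviceCounts) {
            System.out.println(devices + " devices -> "
                    + vertexInserts(devices) + " vertex inserts (edges not counted)");
        }
    }
}
```

So a per-device timing in this benchmark really covers dozens of vertex operations, not one.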

Hi, could you try on RC2 first and send us feedback?

Hi @hp1975

We need some additional information on how you are doing the operations. Is it via SQL or via API? Remote or plocal?

Having some sample code would help a lot

Thanks

Luigi

@luigidellaquila anyway, we need to try RC2 first before we proceed.

I've seen a similar drop in performance during transactions with the remote protocol because, with v3.0, the entire TX is flushed to the server before a query is executed. Could this be the case? @hp1975 could you please share some code or a test case for it?

We ran the test via SQL and created the graph factory with:
graphFactory = new OrientGraphFactory("remote:orientdb-primary:3424/isc", userName, userPassword).setupPool(initCapacity, poolsize * 2);

You can find the example code for our test in the attachment:
PropertyPerformanceTest.txt

My suspicions were correct. With v3.0, every time you execute a query, the open transaction is flushed to the server (to have the right consistency level).

@tglman does it make any sense to introduce something like OTransaction.flushOnCommand(boolean) so the user can disable this behavior? Or, even better, to play with the consistency level already in OTransaction:

  enum ISOLATION_LEVEL {
    READ_COMMITTED, REPEATABLE_READ
  }

?

@lvca is there any solution for my case? And what mistake did I make in my usage?

@laa wait a moment for another round of testing under RC2.

3.0.0RC2 is even worse.

@hp1975 @jia57196 is it possible to compare embedded to embedded on your server? That way we can check which component has the problem. Also, please do not use 2.2.13 but the latest 2.2.x. 2.2.13 has a bug where data is only partially flushed, so by using 2.2.13 you actually risk losing your data.

@hp1975 @jia57196 you could also provide us with a full benchmark, but if that is not possible, I am bound to ask you to perform several different runs.

Hi @hp1975 @jia57196 this issue is really important to us, but without additional data we cannot help :-(

I provided the test results and the source code used for the testing. What additional data do you want?
What you can do is help yourself and your team.

We have actually decided to move to another kind of database if such bad performance cannot be solved.

@hp1975 the problem seems to lie in using SQL + TRANSACTIONS + REMOTE protocol. This combination could result in worse performance because OrientDB now has a higher level of consistency and requires the client to flush the transaction to the server for each command.

The solution looks easy to me, but I'm waiting for @tglman to validate it. If the user chooses a lower consistency level, the TX is not flushed. That's it. And you should see better performance than 2.2.x.
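
The cost of flushing on every command can be illustrated with a toy model. This is only a sketch and not the OrientDB client: RemoteTxModel and all of its methods are hypothetical, and it counts simulated round trips only, ignoring how much data each flush carries.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT the OrientDB client. It counts simulated
// network round trips for a remote transaction under two policies.
final class RemoteTxModel {
    private final boolean flushBeforeQuery; // true ~ the v3.0 behavior described above
    private final List<String> pending = new ArrayList<>();
    private int roundTrips = 0;

    RemoteTxModel(boolean flushBeforeQuery) {
        this.flushBeforeQuery = flushBeforeQuery;
    }

    /** Buffer a write on the client; no network traffic yet. */
    void write(String op) {
        pending.add(op);
    }

    /** Run a query; optionally push buffered writes to the server first. */
    void query(String sql) {
        if (flushBeforeQuery && !pending.isEmpty()) {
            pending.clear();
            roundTrips++; // extra round trip for the flush
        }
        roundTrips++; // the query itself
    }

    /** Commit: one round trip carrying whatever is still buffered. */
    void commit() {
        pending.clear();
        roundTrips++;
    }

    int roundTrips() {
        return roundTrips;
    }

    /** One write followed by one query per device, then a single commit. */
    static int roundTripsFor(boolean flushBeforeQuery, int devices) {
        RemoteTxModel tx = new RemoteTxModel(flushBeforeQuery);
        for (int i = 0; i < devices; i++) {
            tx.write("UPDATE device " + i);
            tx.query("SELECT device " + i);
        }
        tx.commit();
        return tx.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println("batch only:         " + roundTripsFor(false, 1000));
        System.out.println("flush before query: " + roundTripsFor(true, 1000));
    }
}
```

In this model an interleaved write/query workload needs roughly twice as many round trips when the transaction is flushed before each query. Real numbers would depend on the payload size and the network.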

Hi all,

In the academic literature there is no isolation level that excludes reading your own writes (and it would make no sense anyway). What OrientDB did in 2.x remote transactions was only batch the changes and push them all together to the server, while the same API in embedded worked as a proper transaction. Now embedded and remote have the same behavior. What we can do is introduce an API for batch write operations that does not interact with the server and can behave the same way in embedded. This can be done simply by introducing two new methods, beginBatch() and flushBatch(), that can be used as a "replacement" for begin() and commit().
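
A minimal sketch of what such a batch API could look like, purely as an assumption: beginBatch() and flushBatch() are only proposed here and are not an existing OrientDB API. The class BatchWriterSketch and the in-memory map standing in for the server are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: beginBatch()/flushBatch() are only a proposal in
// this thread. An in-memory map stands in for the server.
final class BatchWriterSketch {
    private final Map<String, String> server = new HashMap<>();
    private Map<String, String> batch; // null when no batch is open
    private int serverCalls = 0;

    void beginBatch() {
        batch = new LinkedHashMap<>();
    }

    /** Buffer the write if a batch is open, otherwise send it immediately. */
    void save(String key, String value) {
        if (batch != null) {
            batch.put(key, value);
        } else {
            server.put(key, value);
            serverCalls++;
        }
    }

    /** Reads always go to the server, so buffered writes are not visible. */
    String read(String key) {
        serverCalls++;
        return server.get(key);
    }

    /** Push all buffered writes in a single call and close the batch. */
    void flushBatch() {
        if (batch == null) {
            throw new IllegalStateException("no open batch");
        }
        server.putAll(batch);
        batch = null;
        serverCalls++;
    }

    int serverCalls() {
        return serverCalls;
    }

    public static void main(String[] args) {
        BatchWriterSketch writer = new BatchWriterSketch();
        writer.beginBatch();
        writer.save("device-1", "on");
        System.out.println("before flush: " + writer.read("device-1"));
        writer.flushBatch();
        System.out.println("after flush:  " + writer.read("device-1"));
    }
}
```

The point of the sketch is the semantics: inside a batch you do not read your own writes, which is why it would be a separate API rather than an isolation level of a transaction.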

As a temporary workaround, we can provide a setting that disables the transactions and makes them work as in 2.x, so migration will be smoother.

Regards

I like both ideas. Can you do the 2nd quickly, so @hp1975 can measure if that was the problem?

Hi @hp1975,

After double-checking your code I saw that the transaction flush is not the cause of the problem. The transaction flush happens only if you use the new query API and the new query executor, but you are still using the 2.x code style, so none of this is triggered.

In the past we had some issues when the client and the server were on different versions: data was inserted correctly, but there were some network issues that caused the client to reconnect at every operation, slowing operations down a lot. Could you double-check whether the server and the client are using exactly the same OrientDB version?

Regards

For each test round we used the corresponding OrientDB library.
We did not mix versions between server and client; we are very careful about this.
All tests were done on a single server, so there is no network delay or bandwidth problem.

I see. @hp1975, I will try to reproduce your benchmark on Windows in a couple of days to check the cause of the issue.

Hi @hp1975,
I took your code and adapted it to remove your custom parts. I ran it on 3.0.0 and the latest 2.2.x; here are the results:
3.0.0:

Devices:1000;insert Time(sec/1000device):     0.497;read Time(sec/1000device):     0.005;update Time(sec/1000device):     0.507
Devices:10000;insert Time(sec/1000device):     0.234;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.376
Devices:50000;insert Time(sec/1000device):     0.203;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.304
Devices:100000;insert Time(sec/1000device):     0.197;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.312
Devices:500000;insert Time(sec/1000device):     0.202;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.324
Devices:1000000;insert Time(sec/1000device):     0.199;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.341

2.2.x:

Devices:1000;insert Time(sec/1000device):     0.381;read Time(sec/1000device):     0.007;update Time(sec/1000device):     1.095
Devices:10000;insert Time(sec/1000device):     0.226;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.633
Devices:50000;insert Time(sec/1000device):     0.189;read Time(sec/1000device):     0.001;update Time(sec/1000device):     0.565
Devices:100000;insert Time(sec/1000device):     0.193;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.556
Devices:500000;insert Time(sec/1000device):     0.199;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.464
Devices:1000000;insert Time(sec/1000device):     0.204;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.554

I do not see huge differences between the two versions. That said, my environment is Linux (this should not make any difference) and I ran the test on localhost (this may make a difference). Did you run your test on localhost?
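
For anyone who wants to compare the two listings line by line, here is a small sketch that parses one result line in the format printed above and computes ratios between two runs. The class ResultLine and its methods are made up for illustration; only the line format and the sample values come from the listings.

```java
import java.util.Locale;

// Sketch: parses one result line in the format printed above and compares
// two runs. Class and method names are illustrative only.
final class ResultLine {
    final long devices;
    final double insertSec; // seconds per 1000 devices
    final double readSec;
    final double updateSec;

    private ResultLine(long devices, double insertSec, double readSec, double updateSec) {
        this.devices = devices;
        this.insertSec = insertSec;
        this.readSec = readSec;
        this.updateSec = updateSec;
    }

    /** Parses "Devices:N;insert Time(...): x;read Time(...): y;update Time(...): z". */
    static ResultLine parse(String line) {
        String[] fields = line.split(";");
        if (fields.length != 4) {
            throw new IllegalArgumentException("unexpected line: " + line);
        }
        return new ResultLine(
                Long.parseLong(valueOf(fields[0])),
                Double.parseDouble(valueOf(fields[1])),
                Double.parseDouble(valueOf(fields[2])),
                Double.parseDouble(valueOf(fields[3])));
    }

    /** Text after the last ':' in a field, trimmed. */
    private static String valueOf(String field) {
        return field.substring(field.lastIndexOf(':') + 1).trim();
    }

    public static void main(String[] args) {
        // The 1,000,000-device lines from the two listings above.
        ResultLine v30 = parse("Devices:1000000;insert Time(sec/1000device):     0.199;"
                + "read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.341");
        ResultLine v22 = parse("Devices:1000000;insert Time(sec/1000device):     0.204;"
                + "read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.554");
        System.out.println(String.format(Locale.ROOT,
                "insert ratio (3.0.0 / 2.2.x): %.2f", v30.insertSec / v22.insertSec));
        System.out.println(String.format(Locale.ROOT,
                "update ratio (3.0.0 / 2.2.x): %.2f", v30.updateSec / v22.updateSec));
    }
}
```

A ratio below 1 means the 3.0.0 run was faster for that operation in this particular localhost run on Linux.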

It would be great if we could go through the environment setup with you. Do you have a support contract?
I also noticed some misuse of the OrientDB API here and there, which would be better to correct in your existing application as well.

Regards

As I said at the beginning of this testing, one device has more than 50 properties, which are linked to it by edges. An insertion inserts the device plus those 50 properties.
I agree that if you insert just one device, the difference looks minor. But pay attention to those properties, which are inserted and updated as well.

I ran those tests on a local machine, Windows MAM-TEST 8 CPU/32GB RAM 450 GB HD with Win2012R2;
client and server were on the same machine.

@tglman you can reach me at haipeng.[email protected] on Skype.

@tglman please add me to the chat too.

@tglman hi, I roughly ran the same test against 3.0.0RC2 on my local CentOS 7 and noticed that the performance is better than on Windows. How come? Let me finish those test cases and upload the results.

@hp1975 could you provide the server startup log from the Windows server?

Hi @hp1975,

I double-checked the code again, and it was actually not correct, for a couple of reasons. First, the command was missing the execute call, so in a few cases the query was not running at all. Second, in 2.2.x SQL over remote does not support transactions, so begin/commit has no effect (in 2.2.x).
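
The first point can be illustrated with a toy model, assuming a command API where building a command and running it are separate steps. This is not the OrientDB API: LazyCommandDemo, Db, and Command are hypothetical stand-ins, and the SQL strings are just examples.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not the OrientDB API: a command object that only describes
// work and does nothing until execute() is called.
final class LazyCommandDemo {
    static final class Db {
        final List<String> executed = new ArrayList<>();

        /** Builds a command; by itself this sends nothing to the database. */
        Command command(String sql) {
            return new Command(this, sql);
        }
    }

    static final class Command {
        private final Db db;
        private final String sql;

        Command(Db db, String sql) {
            this.db = db;
            this.sql = sql;
        }

        /** Actually runs the statement. */
        void execute() {
            db.executed.add(sql);
        }
    }

    public static void main(String[] args) {
        Db db = new Db();
        db.command("UPDATE Device SET status = 'on'");            // built, never run
        db.command("UPDATE Device SET status = 'off'").execute(); // runs
        System.out.println("statements executed: " + db.executed.size());
    }
}
```

A benchmark that forgets the execute step on some statements ends up timing object construction rather than database work, which can make one version look much faster or slower than it really is.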

Regards
