Litedb: Alarming Performance when using LiteDB for simple record saving

Created on 6 Oct 2016  路  21Comments  路  Source: mbdavid/LiteDB

I had a requirement for a certain application where a large number of messages will be received and they need to be stored. so I chose LiteDB and used it with default configuration as mentioned in the documentation. Here is the class that I used:

    public class HolderMessage
    {
        public long Id { get; set; }
        public GridMessageType MessageType { get; set; }
        public HolderMessageState State { get; set; }
        public DateTime AddedOn { get; set; }
        public byte[] Message { get; set; } 
     }

In some other class

        public Database(FileInfo file)
        {
            _db = new LiteDB.LiteDatabase(file.FullName);

            _inbox = _db.GetCollection<HolderMessage>("Inbox");
            _outbox = _db.GetCollection<HolderMessage>("Outbox");


            Action<LiteDB.LiteCollection<HolderMessage>> PrepareIndex = (arg) =>
            {
                arg.EnsureIndex(x => x.Id, new LiteDB.IndexOptions { EmptyStringToNull = true, Unique = true, TrimWhitespace = true });
                arg.EnsureIndex(x => x.AddedOn, new LiteDB.IndexOptions { EmptyStringToNull = true, Unique = false, TrimWhitespace = true, IgnoreCase = true });
                arg.EnsureIndex(x => x.MessageType, new LiteDB.IndexOptions { EmptyStringToNull = true, Unique = false, TrimWhitespace = true });
            };

            PrepareIndex(_inbox);
            PrepareIndex(_outbox);
        }

yet at some other location I'm saving the messages

        public void Push(Message msg)
        {
            var mail = new HolderMessage
            {
                Id = ++_counter,
                AddedOn = DateTime.Now,
                MessageType = msg.Kind,   // this is enum
                Message = Common.Utility.Serialize(msg),   // this returns a byte array
                State = HolderMessageState.UnProcessed  // this is an enum
            };

            _queue.Insert(mail);
        }

and when we call Push() during testing, it gave the shockingly poor performance ever

            var watch = new System.Diagnostics.Stopwatch();
            watch.Start();
            for(var i = 1; i<10000;i++)
                storage.Inbox.Push(CreaterMessage(new List<string> { $"Message:{i}" }));
            watch.Stop();

so what exactly is wrong? is the configuration a problem or something else? because as the code is, it gives 20 records per second speed. i.e _10,000/ watch.TotalSeconds_

question

Most helpful comment

@MercedeX You're saying you need a database able to perform 10000 operations per second. If a disk based DBMS seems to have more than that, it uses caching, either internally or via the OS. Since you insist that this violates your consistency model (else bulk insertion would be possible), all changes must be flushed to disk immediately. Before that, obviously, the database has to read at least its header for each operation. Never mind that it probably has to read a few more pages.

So all of the 10000 operations require at least 1 disk read and write naturally, on a regular HDD instead of an SSD. Regular hard disks physically do not support this. For instance, a 15k rpm SAS drive has only around 200 IOPS. So if you find _any_ offline DBMS that can actually do what you want, let me know.

All 21 comments

You have a bunch of indexes, so it's of course a bit slower than if you were just inserting data. But 20/s is still way too slow and not normal. Well, maybe unless you're using a very old hard drive - an SSD is preferable for databases, obviously.

I see two issues with your test. The first is that you're not using prepared data. Try to first create 10000 HolderMessages, then start the stopwatch and insert them. This is standard procedure to make sure we measure the right thing.

The second issue is that you're inserting items one-by-one. Try bulk insertion, it should be _a lot_ faster. In production, you would collect a bunch of messages (e.g. for 0.1 or 1.0 seconds) and then insert all of them together.

I cannot have the bulk insert here. This is the exact use case. The messages will be received from the network. It could be one or two messages or it could be 10,000 or more at a time so there is no known number nor any predictability about getting these messages. For bulk you have to know in advance how many messages would there be. If I receive only 3 messages how would I know when to complete the bulk or should I keep waiting for the messages to complete my bulk.

I can try reducing the indices as you mentioned but I doubt if that would make any clear difference. In case you are wondering, the calculated size in bytes of HolderMessage for these tests is around 45 bytes only. The real message could be up to 0.8 Kb. The thing that really confuses me is that if it cannot handle 45 bytes adequately, how would it perform in real where it has to handle almost 1 Kb of messages.

Here is the update. I removed all the indices except the first one on Id. It did alter the performance. It is now at 45 records per second only. Any thoughts?

Regarding the bulk insertion, is it not possible to aggregate (buffer) messages within a certain time frame (e.g. 0.1 seconds)? It is not required to know how many messages arrive and would introduce only a small delay. This would be a simple way and is vastly preferred over individual insertion.

Could you try and create a second test that stores something dead simple like an integer ID with no other data? That would show us the maximum performance of your test machine. Try both individual and bulk insertion.

Anyway, those are the first ideas that come to my mind, but I'm not deeply involved in LiteDB so maybe I'm missing something.

made those changes and here are the results:
106 integers /second

But tell me why Sql server surpasses the figure of 6000+ records /second on the same machine if SSD is the problem

First off, you are not only measuring the insert, you are also measuring the Common.Utility.Serialize(msg). Try putting start and stop of the stopwatch right before and after insert.

Also, you can try to measure with a small test program that does nothing else just insert 10.000 records, once individually and once in bulk then compare the numbers.

You can store the messages in a queue and set a timer that regularly goes over that queue and makes the bulk insert into the db. You would have to do it like that with any db system or even if you were simply writing into a file to get good performance.

Maybe sql server does that for you internally.

you missed the first line of my previous message. Here is it:

made those changes and here are the results:
106 integers /second

this number does not contain any serialize stuff at all. Everything removed from HolderMessage except an int.
I have told you this is a high performance app. it cannot use timers to introduce further delay.

No, sql server does not work like that. It uses data page (the shortest possible unit of storage is page i.e 4 KB)

If you get only 106 ints per second on a SSD, something is misconfigured with your system.

Can you run this please?

using LiteDB;
class Foo {
    [BsonId]
    public int Id { get; set; }
}

class Program {
    static void Main (string[] args) {
        {
            var path = "test" + DateTime.UtcNow.Ticks + ".lite";
            var data = Enumerable
                .Range (1, 10000)
                .Select (i => new Foo () { Id = i })
                .ToArray ();
            var sw = System.Diagnostics.Stopwatch.StartNew ();
            using (var db = new LiteDatabase (path)) {
                var col = db.GetCollection<Foo> ("table");
                col.Insert (data);
            }
            sw.Stop ();
            Console.WriteLine ("Bulk: Inserted " + data.Length + " ints in " + sw.Elapsed);
        }
        {
            var path = "test" + DateTime.UtcNow.Ticks + ".lite";
            var sw = System.Diagnostics.Stopwatch.StartNew ();
            using (var db = new LiteDatabase (path)) {
                var col = db.GetCollection<Foo> ("table");
                for (int i = 1; i < 1000; i++) {
                    var foo = new Foo () { Id = i };
                    col.Insert (foo);
                }
            }
            sw.Stop ();
            Console.WriteLine ("Single inserts, shared connection: Inserted " + 1000 + " ints in " + sw.Elapsed);
        }
        {
            var path = "test" + DateTime.UtcNow.Ticks + ".lite";
            var sw = System.Diagnostics.Stopwatch.StartNew ();
            for (int i = 1; i < 1000; i++) {
                var foo = new Foo () { Id = i };
                using (var db = new LiteDatabase (path)) {
                    var col = db.GetCollection<Foo> ("table");
                    col.Insert (foo);
                }
            }
            sw.Stop ();
            Console.WriteLine ("New connection per int: Inserted " + 1000 + " ints in " + sw.Elapsed);
        }
    }
}

I get

Bulk: Inserted 10000 ints in 00:00:00.1629453
Single inserts, shared connection: Inserted 1000 ints in 00:00:01.3935261
New connection per int: Inserted 1000 ints in 00:00:01.3815246

(post edited to add two more tests)

I'm a bit confused why the shared connection was slightly slower. Switching the order of tests did not change this oddity. Anyway, around 900 ints per second on an SSD even using single inserts, and a _lot_ more in bulk mode.

You must remember that SQL Server is an online database system. LiteDB is file based, has no caching etc. If you want insanely fast maximum insert speed, you should probably use something else. --- Or just collect the incoming data to be written for a small time frame before inserting them in bulk, then LiteDB will easily be sufficient.

One clarification. I do not use SSD. I use SD SCSI device

Your program returned :
Inserted 10000 ints in 00:00:00.2082185

Alright, then bulk insertion does work for you, too. I've edited my above post and included some further explanations. Btw, I hope @mbdavid corrects me if I'm spouting nonsense :o

Bulk insertion is not my need. I need regular inserts to be fast and they fail miserably
Your modified program returned the following:

Bulk: Inserted 10000 ints in 00:00:00.1942937
Single inserts, shared connection: Inserted 1000 ints in 00:00:08.8392658
New connection per int: Inserted 1000 ints in 00:00:07.8547325
Press any key to continue . . .

If you are really sure that this is what you need, then you likely want to use an online instead of a file based DBMS. If I recall correctly, something similar is planned for LiteDB v3, but until then you're probably out of luck.

but why would a database not perform adequately at regular operations and insist on bulk operations? That is something I do not understand. I cannot use Sql Server because the app I'm working on is a small one that will run as a part of bigger app. All I wanted to use LiteDB for, was persistent cache which now I feel was a wrong choice.

My conclusion is

liteDB may be an ambitious effort and maybe a tool of interest for hobbyists but it is nowhere near production ready & cannot be used in serious development

tried sqlite?

@MercedeX You're saying you need a database able to perform 10000 operations per second. If a disk based DBMS seems to have more than that, it uses caching, either internally or via the OS. Since you insist that this violates your consistency model (else bulk insertion would be possible), all changes must be flushed to disk immediately. Before that, obviously, the database has to read at least its header for each operation. Never mind that it probably has to read a few more pages.

So all of the 10000 operations require at least 1 disk read and write naturally, on a regular HDD instead of an SSD. Regular hard disks physically do not support this. For instance, a 15k rpm SAS drive has only around 200 IOPS. So if you find _any_ offline DBMS that can actually do what you want, let me know.

Hi guys. I was running out of Matthew hurricane... but now I'm back at home. @nerai, thanks for this great explanation. English is not my natural language and I have a bit difficult to explain how LiteDB works.

First, about my tests about performance: it's was using v3 (dev branch), Still under development and works diferent from v1 and v2. LiteDB work in a file based system, so it's depends a lot of disk. My laptop has only a SSD, I do not have another disk to test. Performance tests are dificult to made and there is not single way to do. In my example I tested database operations, like search in index, organize documents in pages, create indexes, change documents. As I told, more tests must be done and this just a first example.

To understand way you example is too slow, you need understand how litedb works. In v2 all operations open/close datafile. And, if you are with journal enabled (by default), open/write/close journal file too. It's a lot of disk operation per execution. So, a single insert needs:

  • open datafile in a shared mode
  • read pages (at minimum 4 pages: header page, collections page, first data page and first index page)
  • do database operations (works with pages in memory)
  • open journal file in exclusive mode
  • write all changed pages to journal file
  • write all changed pages to datafile
  • close datafile
  • close journal file
  • delete journal file

When bulk insert are used, all this operations are re-used in a single operation avoiding all disk operations.

My recomendations for your case is:

  • Disable journal file
  • As @nerai says, works with a buffer, uses LiteDB to strore in a delay time (and in async way).
  • Use MemoryStream disk implementation and persist over time all data (using async too).
  • Try others databases file-based, like embedded RazorDB, RaptorDB, Sqlite, ...

Doing more tests in v3, I got this results without bulk insert (1.000 docs).

  • Insert time (journal): 745 ms
  • Insert time (no journal): 432 ms
  • Insert time using new instance (journal): 6608 ms
  • Insert time using new instance (no journal): 5078 ms

Using new instance is use like v2 (when each operation open/close file).


It's not fare compare LiteDB with SQL Server. SQL server is an online database, hold pages in memory cache, writes in async and has more than 20 year of development from Microsoft :)

And yes, LiteDB is hobby tool that I made and share with all comminity for free. Use at your own risk (like any other open source).

Did some experimenting with Sqlite. Used the original program (see the first post in this thread). Here are the results:

Default Setting: 84 records / sec
Journal off: 260 records /sec

So Sqlite beats LiteDB by 4x approx. I don't know how to turn off journal in LiteDB. @mbdavid I'm seriously considering using Memory Stream for database storage and dumping that in a file silently in separate thread. I think that will increase the speed a lot.

A few clarifications:

  • I never said I need 10000 records /second speed. Where did you get that idea?
  • It is true that Sql Server cannot be beaten but then it is too huge for my needs
  • Yes, I understand that IO takes time so I have reasonable expectations not stupid ones
  • The effort for making LiteDB is great, do not think that I'm belittling someone's work.

If I had given up on it, I would not be executing the programs @nerai wrote in comments. @nerai did you realize how quickly you got the results of your program with all the changes? I'm still betting on it.

You could open a question on Stackoverflow and ask how your scenario should be solved. I'd bet I can tell you the most up voted answer beforehand, tho: Buffer the input, then write in bulk. This is what everyone here has suggested, and it is the only solution that works well with offline databases. If you consider storing input to memory at any point in your data flow, then that is essentially the same.

Maybe you have a misconception about bulk insertion. You do _not_ have to know the number of messages in advance. I'll try to explain it:

Your input acceptor thread waits for input. Once input arrives, it starts a timer for, say, 0.1 seconds. The input is added to a queue. Once (whichever happens first) either a) the timer expires or b) the queue is large enough (e.g. 1000 items), the queue is written to the database. This inserts between 1 and 1000 items, and it does never matter how many exactly. Result:

  • Input is stored within at most 0.1 seconds. At the same time, queue size never exceeds 1000 items. So from a _consistency_ point of view, this is alright. It's actually perfect consistency if you wait with the confirmation until data is written, which will take at most 0.1 seconds.
  • On the other hand, the database only has to handle either a) 10 inserts per second, which it can do easily even on a HDD or b) insert many items in bulk, which it can do easily as well. So DBMS and disk are never exceeding their capabilities and will deliver _good performance_.

Hi @MercedeX,

I made a simple example using SQLite vs LiteDB v3. Source avaiable here: https://github.com/mbdavid/LiteDB-Perf

This example is a simple insert and bulk examples using journal enabled and disabled. There are 3 fields/columns only and I add, in both examples, time to all process (include document generation). There are results in my machine:

Total records: 10000

Journal Enabled
SQLite - Insert: 74253 ms - 135 records/second
LiteDB - Insert: 7960 ms - 1256 records/second
SQLite - Bulk  : 266 ms - 37587 records/second
LiteDB - Bulk  : 455 ms - 21942 records/second

Journal Disabled
SQLite - Insert: 6320 ms - 1582 records/second
LiteDB - Insert: 4769 ms - 2097 records/second
SQLite - Bulk  : 198 ms - 50317 records/second
LiteDB - Bulk  : 415 ms - 24057 records/second

In some cases LiteDB is amost 10x faster than SQLite, in others, SQLite is 2x faster than LiteDB.

Again, I'm using LiteDB v3 in a SSD disk, i5 2nd generation with 8Gb RAM on Windows 10. I'm using new LiteEngine class only that not support LINQ/Mapper. Only final version will contains LiteDatabase class with LiteCollection, POCO mapper na d LINQ support.

@MercedeX

Im using LiteDb for the internal Database of my Multiplayer Indie Game, I needed an embedded .net database for store my game information, creatures, items, etc in a easy and quick way (object mappers and dbRef are glorious) I know I will need to swap to an standard Db for real and hard operations, but it covers all my basic needs to store the basic data of my game, I'm thankful to @mbdavid to develop this project and open it to the community, giving support for it and all of these for free, I think you need to organize your thoughts and think again, @mbdavid is not developing a DB for you, maybe he will implement new capabilities that can help with your own needs but you can not really think that a single person can beat a full db with a lot of years, developers, financing and companies behind it.

Just try keep it positive, if LiteDb is not suited for you, then move on to other solution, but the developers are not the real problem here.

By the way, things always happened, maybe they are happening behind your eyes, but software is not magic, maybe Sql Server gives a lot of high performance but mostly thanks to cache and highly improved techniques to store data, you can try to store 100.000 new records to a Sql server, the Db server can return to you that it inserted correctly all the records, but if you power off immediately the server I麓m 100% sure that a huge part of the records will be corrupted or not stored, you can try it by yourself with a virtual machine ;).

So you can try to perform some of these improved techniques on your own with LiteDb (like bulk store with a memory buffer) if you are worry cause the user had the confirmation of his message was stored and maybe something can happens so the message will not be stored for real, cause a save error or whatever, you can try to delay the confirmation response from the server until the data is bulk stored, this is the realistic approach (and maybe the best even without the bulk-cache way).

If you need to be sure 100% that the message is stored correctly then first of all you will need to add relay servers, replicate things, etc, and even with all of that nothing is for sure, Amazon S3 Standard is designed for 99.99% reliability without redundancy (https://aws.amazon.com/s3/reduced-redundancy/).
Can you imagine if you loss 1000 records from each 100.000 messages? and we are talking about Amazon.

@buronix: Only if you went through the pain of reading whole thread before your preaching, you might have noticed that I'm willing to use it, despite the criticism of my team while at the same time trying every option, remedy suggested by the LiteDB community.

and if it interests anyone, it is going to be a part of a grid computation framework for a fortune 500 company in 2017.

@MercedeX you are right, I tried to read the entire thread but I read some parts in the "bad and quick diagonal way", I'm sorry if I disturbed you, it was never my intention.

Tons of luck with your application.

Was this page helpful?
0 / 5 - 0 ratings