LiteDB: Index value maximum length

Created on 27 Mar 2020 · 8 comments · Source: mbdavid/LiteDB

_Love_ LiteDB. But why did the maximum index value length shorten in v5 to 255 characters? This means we can't index file paths. We're now jumping through hoops like storing hashes of file names, which slows down our code.
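
For context, a minimal sketch of the kind of hash workaround described above. The helper name `PathKey.ForPath` is hypothetical, and SHA-256 is only one possible choice. It uses standard .NET calls only, not any LiteDB API.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PathKey
{
    // Hypothetical helper: maps a path of any length to a fixed 64-character hex key,
    // which fits under a 255-character index value limit.
    public static string ForPath(string path)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(path));
            // BitConverter yields "AB-CD-..."; dropping the dashes leaves 64 hex characters.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    public static void Main()
    {
        string longPath = @"C:\data\" + new string('x', 300) + ".txt";
        Console.WriteLine(ForPath(longPath).Length); // 64, regardless of the path length
    }
}
```

A key like this supports exact-match lookups only: the hash does not preserve the ordering of the original paths, so it cannot be used to sort or range-scan by path.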

Please, please reconsider extending this length.

question

ReflexiveCode

👍1

All 8 comments

@mbdavid What was the technical reason for this change, and for such a small limit?
Maybe together we can find a more practical alternative.

I agree with the OP (and have already been caught by this). In practice it would be very useful
if indexes could cope with the Windows maximum path length (260). Something like 520 would then be the
"minimum reasonable" limit (~2x to allow for possible non-ASCII characters).

nightroman on 27 Mar 2020

Actually this is moving from inconvenience to dealbreaker--since we can't index file paths, we can't efficiently sort them. As much as we love LiteDB, we'll have to look for another provider :(.

ReflexiveCode on 28 Mar 2020

If the design decision was based on saving storage space (1 byte for length)
then there is a compromise approach, "7 bit encoded int". 0-127 require 1 byte,
this must cover a lot of strings in the world. 128+ require more bytes, yes,
but 2 bytes will cover probably almost all practical cases. Not to mention 3
bytes. If there is no other reason but saving storage space then please
consider this approach. In the next major version update of course.

Here is the code that might be borrowed, see System.IO classes:

// System.IO.BinaryWriter
protected void Write7BitEncodedInt(int value)
{
    uint num;
    for (num = (uint)value; num >= 128; num >>= 7)
    {
        Write((byte)(num | 0x80));
    }
    Write((byte)num);
}

// System.IO.BinaryReader
protected internal int Read7BitEncodedInt()
{
    int num = 0;
    int num2 = 0;
    byte b;
    do
    {
        if (num2 == 35)
        {
            throw new FormatException(SR.Format_Bad7BitInt32);
        }
        b = ReadByte();
        num |= (b & 0x7F) << num2;
        num2 += 7;
    }
    while ((b & 0x80) != 0);
    return num;
}
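
To show how many bytes this encoding would spend on typical lengths, here is a small self-contained sketch of the same scheme. `VarInt.Encode` and `VarInt.Decode` are illustrative names, not LiteDB or System.IO APIs; they write to and read from a byte array instead of a stream.

```csharp
using System;
using System.Collections.Generic;

public static class VarInt
{
    // Encodes a non-negative value 7 bits at a time, lowest bits first.
    public static byte[] Encode(int value)
    {
        var bytes = new List<byte>();
        uint v = (uint)value;
        while (v >= 0x80)
        {
            bytes.Add((byte)(v | 0x80)); // low 7 bits plus the continuation flag
            v >>= 7;
        }
        bytes.Add((byte)v); // final byte, continuation flag clear
        return bytes.ToArray();
    }

    public static int Decode(byte[] bytes)
    {
        int result = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            if (shift >= 35) throw new FormatException("Too many bytes for a 32-bit value.");
            result |= (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) return result;
        }
        throw new FormatException("Truncated 7-bit encoded integer.");
    }

    public static void Main()
    {
        foreach (int length in new[] { 127, 128, 260, 16383, 16384 })
        {
            byte[] encoded = Encode(length);
            Console.WriteLine(length + " -> " + encoded.Length + " byte(s), decodes to " + Decode(encoded));
        }
    }
}
```

Lengths up to 127 take one byte, lengths up to 16383 take two, and three bytes are only needed beyond that.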

nightroman on 28 Mar 2020

@nightroman @ReflexiveCode This limitation exists because the length of the index value for strings and binary is stored in a byte in v5. We're thinking of ways to increase the limit without changing the datafile - one possibility would be borrowing two bits from the BsonType byte, which only uses values from 0 to 14.
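
A rough sketch of the bit-borrowing idea, under stated assumptions: the layout below (a type byte followed by a length byte, with the two highest bits of the type byte reused as the high bits of the length) is hypothetical and is not taken from the LiteDB source. `IndexKeyHeader` and its members are illustrative names.

```csharp
using System;

public static class IndexKeyHeader
{
    // 8 bits from the length byte plus 2 borrowed bits give 10 bits: 2^10 - 1 = 1023.
    public const int MaxLength = 1023;

    // Packs a type code (0..63) and a length (0..1023) into two bytes.
    // Assumed layout: [LLTTTTTT] [LLLLLLLL], where the top two bits of the
    // first byte carry bits 8 and 9 of the length.
    public static byte[] Pack(int typeCode, int length)
    {
        if (typeCode < 0 || typeCode > 0x3F) throw new ArgumentOutOfRangeException(nameof(typeCode));
        if (length < 0 || length > MaxLength) throw new ArgumentOutOfRangeException(nameof(length));
        byte first = (byte)(typeCode | ((length >> 8) << 6));
        byte second = (byte)(length & 0xFF);
        return new[] { first, second };
    }

    public static void Unpack(byte[] header, out int typeCode, out int length)
    {
        typeCode = header[0] & 0x3F;
        length = ((header[0] >> 6) << 8) | header[1];
    }

    public static void Main()
    {
        byte[] header = Pack(14, 1023);
        int typeCode, length;
        Unpack(header, out typeCode, out length);
        Console.WriteLine(typeCode + ", " + length);
    }
}
```

Under this assumed layout, any entry with a length of 255 or less has both borrowed bits clear, so its two header bytes are the same as a plain type byte followed by a one-byte length. That is the sense in which such a change could avoid altering existing data.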

lbnascimento on 29 Mar 2020

👍1

This issue is vitally important when choosing a database for an application.

What is the index length limit in other databases, for example SQLite and
MongoDB, two potential alternatives? I cannot find any information; maybe they
do not have limits at all, except those implied by other limits, such as the
total record length limit in SQLite.

> We're thinking of ways to increase the limit without changing the datafile -
> one possibility would be borrowing two bits from the BsonType byte, which
> only uses values from 0 to 14.

If this is possible without "hard breaking the format", then it should also be
possible to integrate "7 bit encoded int" in the same "safe" way. If you
increase the limit just "a little bit", it is still a no-go for some cases.
"7 bit encoded int" is a reasonable way to save space without imposing a limit
(but it may not be applicable if you need a fixed number of bytes...).

nightroman on 29 Mar 2020

@nightroman I couldn't find any info regarding SQLite, but MongoDB limits the entire index entry to 1024 bytes and SQL Server limits the index key to 900 and 1700 bytes for clustered and non-clustered indexes respectively.

If the solution I proposed is implemented, it would raise the index key limit to 1023 bytes, which I think is pretty reasonable.

lbnascimento on 29 Mar 2020

👍1

I think 1023 will be a practically reasonable limit.

Just for information and for the sake of knowledge:

> MongoDB limits the entire index entry to 1024 bytes

This is not the case anymore:

> Starting in version 4.2, MongoDB removes the Index Key Limit for featureCompatibilityVersion (fCV) set to "4.2" or greater.

nightroman on 1 Apr 2020

1023 bytes is very reasonable. It's important to consider non-English characters: 1023 is approximately 511 UTF-16 characters; 127 is cutting it pretty short.
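
To make the character-versus-byte point concrete, here is a small sketch that measures sample paths in both encodings. The sample paths and the `PathSize` helper are illustrative only, and which encoding LiteDB actually uses for index keys is not stated in this thread.

```csharp
using System;
using System.Text;

public static class PathSize
{
    public static int Utf8Bytes(string s) { return Encoding.UTF8.GetByteCount(s); }

    // Encoding.Unicode is UTF-16 (little endian) in .NET.
    public static int Utf16Bytes(string s) { return Encoding.Unicode.GetByteCount(s); }

    public static void Main()
    {
        foreach (string path in new[] { @"C:\docs\report.txt", @"C:\документы\отчёт.txt" })
        {
            Console.WriteLine(path.Length + " chars, " + Utf8Bytes(path) + " UTF-8 bytes, "
                + Utf16Bytes(path) + " UTF-16 bytes");
        }
    }
}
```

UTF-16 always costs at least two bytes per character, while UTF-8 costs one byte for ASCII and two or more for most other scripts, so a byte limit translates into quite different character limits depending on the text.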

ReflexiveCode on 1 Apr 2020
