_Love_ LiteDB. But why was the maximum index value length shortened to 255 bytes in v5? This means we can't index file paths. We're now jumping through hoops like storing hashes of file names, which slows down our code.
Please, please reconsider extending this length.
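The hash workaround mentioned above can be sketched as follows. This is a minimal sketch that assumes SHA-256 over the UTF-8 bytes of the path, rendered as a hex string. `PathKey` and `FromPath` are hypothetical names and are not LiteDB API.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derives a fixed-length key from an arbitrarily long
// path so the indexed value stays well under the 255-byte limit.
public static class PathKey
{
    public static string FromPath(string path)
    {
        using (var sha = SHA256.Create())
        {
            // SHA-256 always produces 32 bytes, whatever the input length.
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(path));
            // 32 bytes -> 64 uppercase hex characters ("AB-CD-..." minus dashes).
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

The resulting key is always 64 characters long, but it only supports exact-match lookups. Hash order has nothing to do with path order, so an index on this key cannot be used to sort by path. Every insert and query also pays for an extra hash computation.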
@mbdavid What was the technical reason for this change, and for such a small limit?
Maybe we can work together to find a more practical alternative.
I agree with the OP (and I've already been caught by this). In practice it would
be very useful if indexes could cope with the Windows maximum path length (260
characters), which makes something like 520 bytes the "minimum reasonable" limit
(~2x to allow for non-ASCII characters).
Actually, this is moving from an inconvenience to a dealbreaker: since we can't index file paths, we can't sort them efficiently. As much as we love LiteDB, we'll have to look for another provider :(.
If the design decision was based on saving storage space (1 byte for the length),
then there is a compromise approach: a "7-bit encoded int". Values 0-127 take
1 byte, which must cover a lot of the strings in the world. Values of 128 and up
take more bytes, yes, but 2 bytes (up to 16,383) will probably cover almost all
practical cases, not to mention 3 bytes (up to 2,097,151). If there is no other
reason than saving storage space, then please consider this approach, in the
next major version update of course.
Here is code that could be borrowed from the System.IO classes:
```csharp
// System.IO.BinaryWriter
protected void Write7BitEncodedInt(int value)
{
    // Write 7 bits per byte, low-order bits first; the high bit (0x80)
    // signals that another byte follows.
    uint num;
    for (num = (uint)value; num >= 128; num >>= 7)
    {
        Write((byte)(num | 0x80));
    }
    Write((byte)num);
}

// System.IO.BinaryReader
protected internal int Read7BitEncodedInt()
{
    int num = 0;   // accumulated value
    int num2 = 0;  // bit shift for the next 7-bit group
    byte b;
    do
    {
        // An Int32 needs at most 5 groups (35 bits); more means corrupt data.
        if (num2 == 35)
        {
            // SR.Format_Bad7BitInt32 is a framework-internal resource string.
            throw new FormatException(SR.Format_Bad7BitInt32);
        }
        b = ReadByte();
        num |= (b & 0x7F) << num2;
        num2 += 7;
    }
    while ((b & 0x80) != 0);
    return num;
}
```
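A standalone version of the same encoding makes it easy to check the byte counts described above: lengths up to 127 take 1 byte, up to 16,383 take 2, and up to 2,097,151 take 3. This sketch writes to a `List<byte>` instead of a stream. `VarInt` and its methods are hypothetical names, not part of System.IO or LiteDB.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical standalone re-implementation of the 7-bit encoded int above.
public static class VarInt
{
    public static void Write(List<byte> output, int value)
    {
        uint num = (uint)value;
        // Emit 7 bits at a time, low bits first; 0x80 means "more bytes follow".
        while (num >= 0x80)
        {
            output.Add((byte)(num | 0x80));
            num >>= 7;
        }
        output.Add((byte)num);
    }

    // Reads a value starting at offset and advances offset past it.
    public static int Read(byte[] input, ref int offset)
    {
        int result = 0;
        int shift = 0;
        byte b;
        do
        {
            if (shift == 35)
                throw new FormatException("Too many bytes in a 7-bit encoded Int32.");
            b = input[offset++];
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // Number of bytes the encoded form of value occupies.
    public static int Size(int value)
    {
        var buffer = new List<byte>();
        Write(buffer, value);
        return buffer.Count;
    }
}
```

For example, the value 300 encodes as `0xAC 0x02`. Under this scheme, a 255-byte key would need a 2-byte length prefix instead of today's single byte. The extra cost is therefore one byte per key of 128 bytes or longer, and a 1023-byte key still needs only 2 bytes.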
@nightroman @ReflexiveCode This limitation exists because the length of the index value for strings and binary is stored in a byte in v5. We're thinking of ways to increase the limit without changing the datafile - one possibility would be borrowing two bits from the BsonType byte, which only uses values from 0 to 14.
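The two-bit idea can be sketched as a small pack/unpack pair. This uses a hypothetical layout, not LiteDB's actual on-disk format. The choice of bits 6 and 7 is an assumption: those bits of the type byte carry bits 8 and 9 of the length, and the existing length byte keeps bits 0 to 7. That gives 10 bits, so the maximum length is 2^10 - 1 = 1023 bytes. Today's type values (0 to 14) never set bits 6 or 7, so entries already on disk would decode to the same type and length. `PackedKeyHeader` is a hypothetical name.

```csharp
using System;

// Hypothetical layout for illustration only: bits 6-7 of the type byte hold
// length bits 8-9; the separate length byte holds length bits 0-7.
public static class PackedKeyHeader
{
    public const int MaxLength = (1 << 10) - 1; // 10 bits -> 1023

    public static (byte typeByte, byte lengthByte) Pack(byte bsonType, int length)
    {
        if (bsonType > 0x3F) throw new ArgumentOutOfRangeException(nameof(bsonType));
        if (length < 0 || length > MaxLength) throw new ArgumentOutOfRangeException(nameof(length));
        byte typeByte = (byte)(bsonType | ((length >> 8) << 6)); // high 2 length bits
        byte lengthByte = (byte)(length & 0xFF);                 // low 8 length bits
        return (typeByte, lengthByte);
    }

    public static (byte bsonType, int length) Unpack(byte typeByte, byte lengthByte)
    {
        byte bsonType = (byte)(typeByte & 0x3F);           // strip the borrowed bits
        int length = ((typeByte >> 6) << 8) | lengthByte;  // reassemble 10 bits
        return (bsonType, length);
    }
}
```

A key of up to 255 bytes leaves both borrowed bits at zero. It therefore produces exactly the bytes a v5 file contains today, which is what makes this approach compatible with existing datafiles.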
This issue is vitally important when choosing a database for an application.
What are the index length limits in other databases, for example SQLite and
MongoDB, two potential alternatives? I cannot find any information; maybe they
have no limits at all, except those implied by other limits, such as the total
record length limit in SQLite.
> We're thinking of ways to increase the limit without changing the datafile -
> one possibility would be borrowing two bits from the BsonType byte, which
> only uses values from 0 to 14.
If this is possible without "hard breaking the format", then it should also be
possible to integrate a "7-bit encoded int" in the same "safe" way. If you
increase the limit just "a little bit", it's still a no-go for some cases.
A "7-bit encoded int" is a reasonable way to save space without a hard limit
(though it may not be applicable if you need a fixed number of bytes...).
@nightroman I couldn't find any info regarding SQLite, but MongoDB limits the entire index entry to 1024 bytes and SQL Server limits the index key to 900 and 1700 bytes for clustered and non-clustered indexes respectively.
If the solution I proposed is implemented, it would raise the index key limit to 1023 bytes, which I think is pretty reasonable.
I think 1023 will be a practically reasonable limit.
For the record:
> MongoDB limits the entire index entry to 1024 bytes
Starting in version 4.2, MongoDB removes the Index Key Limit for featureCompatibilityVersion (fCV) set to "4.2" or greater.
1023 bytes is very reasonable. It's important to consider non-English characters: at two bytes per character, 1023 bytes holds about 511 characters, while the current 255 bytes holds only about 127, which is cutting it pretty short.
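How many characters fit in a given byte budget depends on the script. This small sketch assumes index keys are stored as UTF-8 and counts encoded bytes with `Encoding.UTF8.GetByteCount`. `Utf8Size` is a hypothetical helper name.

```csharp
using System.Text;

// Hypothetical helper for checking a string against an index key byte limit.
public static class Utf8Size
{
    // Bytes the string occupies when encoded as UTF-8.
    public static int Bytes(string s) => Encoding.UTF8.GetByteCount(s);

    // True if the UTF-8 form of s fits within limitBytes (e.g. 255 or 1023).
    public static bool Fits(string s, int limitBytes) => Bytes(s) <= limitBytes;
}
```

In UTF-8, ASCII takes 1 byte per character, Cyrillic and Greek take 2, and most CJK characters take 3. A 260-character Windows path is therefore 260, 520 or 780 bytes respectively, so a 2x allowance is an estimate rather than a guarantee. A 1023-byte limit holds about 1023 ASCII, 511 Cyrillic or 341 CJK characters.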