Neo: StorageKey - why use WriteBytesWithGrouping()?

Created on 3 Jun 2019 · 16 comments · Source: neo-project/neo

I'm doing an audit between neo-python and neo-cli and encountered an issue with one of the storage keys. Specifically, block 131179 of TestNet reports the following new storage item:

{"state":"Added",
        "key":"2170e2e4a128ba6b287b14c1ad4be8d03bbf04f031313131313131313131313131313131003131313131313131313100000000000006",
        "value":"000f32323232323232323232323232323200"}]},

The key can be broken down into:

  • Scripthash: 2170e2e4a128ba6b287b14c1ad4be8d03bbf04f0
  • key_value: 313131313131313131313131313131310031313131313131313131
  • padding: 00000000000006 (six zero bytes padding the last group to 16 bytes, plus a final 06 recording the padding length)

If you look closely at the key_value you'll see there is a 00 inserted after the first 16 bytes. The actual key (as can be seen in the screenshot) does not have this 00 byte.
[Screenshot 2019-06-03 at 16:12:36]
This deviation is caused by WriteBytesWithGrouping() in the code below

https://github.com/neo-project/neo/blob/55ffca977a2b336d16bc986a40c770c4f5a1e589/neo/Ledger/StorageKey.cs#L43-L46
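
For context, here is roughly what that helper does (my paraphrase of the linked 2.x code, not a verbatim copy; the group size is 16 bytes):

```c#
using System.IO;

static class GroupingSketch
{
    const int GroupSize = 16;

    // Each full 16-byte group is followed by a 0x00 separator byte; the
    // final partial group is zero-padded to 16 bytes, and one last byte
    // records how many padding bytes were added.
    public static void WriteBytesWithGrouping(BinaryWriter writer, byte[] value)
    {
        int index = 0;
        int remain = value.Length;
        while (remain >= GroupSize)
        {
            writer.Write(value, index, GroupSize);
            writer.Write((byte)0);          // the "extra" 0x00 discussed above
            index += GroupSize;
            remain -= GroupSize;
        }
        if (remain > 0)
            writer.Write(value, index, remain);
        int padding = GroupSize - remain;
        for (int i = 0; i < padding; i++)
            writer.Write((byte)0);          // zero padding
        writer.Write((byte)padding);        // padding count (the trailing 06)
    }
}
```

Read against the dump above: the real key is 26 bytes of 0x31, so the first full group picks up a 0x00 separator, and the remaining 10 bytes get six bytes of zero padding plus the count byte 06.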

My question is: why are we using this method for serialization instead of a normal write?

It gives unexpected behaviour when you use it (as done here in the StatesDumper plugin):

```c#
state["key"] = trackable.Key.ToArray().ToHexString();
```
I believe the key should be the real key, not some modified form.

Label: question

All 16 comments

I think the same as you; it should be a regular write.

It is for "Find()". See #200

Thanks 😥

I don't see this changing anymore for 2.x, but should we maybe change the approach for 3.0 to remove the confusing behaviour of ToArray() for StorageKey?

This would mean updating Storage_Put/Delete/Get instead of using WriteBytesWithGrouping.
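
A minimal sketch of that alternative, with hypothetical names (this is not the actual syscall code): build the physical key by plain concatenation, so ToArray() can return the real key and Find becomes an ordinary prefix seek.

```c#
using System;

static class FlatKeySketch
{
    // Hypothetical layout: 20-byte script hash followed by the raw key
    // bytes. No grouping and no length byte; the storage engine already
    // knows where each of its keys ends.
    public static byte[] MakeStorageKey(byte[] scriptHash, byte[] key)
    {
        var buffer = new byte[scriptHash.Length + key.Length];
        Buffer.BlockCopy(scriptHash, 0, buffer, 0, scriptHash.Length);
        Buffer.BlockCopy(key, 0, buffer, scriptHash.Length, key.Length);
        return buffer;
    }
}
```

Storage_Put/Delete/Get would build their keys this way, and Storage.Find would pass the same concatenation as its iterator prefix; per the comments below, this is essentially what neo-python and neo-go do.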

I always wanted to know this! hahaha :)
Well, is this a storage-specific thing, or is it necessarily part of the protocol? I mean, could other clients skip implementing it and still get the same results with Find?

It's a client thing. Neo-python doesn't use it and Find works fine, because we fixed Put/Delete/Get instead.

It has to do with the way the length is encoded. I had to deal with this in some of the plugins I wrote. It should be possible to get the length from LevelDB, though, and not need to encode the length into the key. I didn't deep-dive to fix it, but it should be fixable for NEO 3.0. As long as the storage engine can save and report key lengths, which it can, there isn't a good reason to do what is being done.

May I ask to reopen this bug? I think the issue is still relevant for NEO 3, but now, with the #1383 changes, it has a deeper impact on node compatibility. In neo-go we have implemented a simpler key serialization scheme without grouping and it works fine (some conversion is needed to compare our 2.0 state with neo-storage-audit, but that's a minor annoyance affecting just the state change dumps). But if MPT is to be added using this scheme, then we will have to either change our key serialization format or do conversions for the MPT calculations. Neither option makes me happy, as I think this should be trivial to fix in the C# code (and it would simplify the C# code at the same time).

Yes, we should create a standard. Could you share your neo-go group implementation?

We just don't do any grouping; we append the byte representation of the key to the scripthash, like this:
https://github.com/nspcc-dev/neo-go/blob/405fa9c411637a1b77a98132139e177279598332/pkg/core/dao/dao.go#L466

And Find works well with it: if there are keys "ab" and "ac" in the DB, Find returns both when searching for the prefix "a".
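
A toy model of why that works (a sorted in-memory set standing in for LevelDB's default lexicographic key ordering): keys sharing a prefix form one contiguous run, so Find is a seek plus iteration.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

class PrefixScanDemo
{
    // Plain lexicographic byte ordering, matching LevelDB's default comparator.
    class ByteArrayComparer : IComparer<byte[]>
    {
        public int Compare(byte[] x, byte[] y)
        {
            int n = Math.Min(x.Length, y.Length);
            for (int i = 0; i < n; i++)
                if (x[i] != y[i]) return x[i].CompareTo(y[i]);
            return x.Length.CompareTo(y.Length);
        }
    }

    static void Main()
    {
        var db = new SortedSet<byte[]>(new ByteArrayComparer())
        {
            new byte[] { (byte)'a', (byte)'b' }, // "ab"
            new byte[] { (byte)'a', (byte)'c' }, // "ac"
            new byte[] { (byte)'b', (byte)'a' }, // "ba"
        };
        byte[] prefix = { (byte)'a' };

        // Walks the contiguous "a..." run; prints "ab" then "ac".
        foreach (var key in db.Where(k => k.Length >= prefix.Length
                                       && k.Take(prefix.Length).SequenceEqual(prefix)))
            Console.WriteLine(new string(key.Select(b => (char)b).ToArray()));
    }
}
```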

Please check this thread: https://github.com/neo-project/neo/issues/199. The problem for us is WriteVarBytes, which could produce undesired results. I think the store should return a fixed size, without using WriteVarBytes, but it was designed for working with streams.
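
To spell out the WriteVarBytes concern (my illustration, using a simplified one-byte length prefix): length-prefixed keys no longer share byte prefixes, which is exactly what breaks prefix searches.

```c#
using System;
using System.IO;
using System.Linq;

class VarBytesProblem
{
    // Simplified stand-in for a var-length encoding: one length byte,
    // then the data (the real WriteVarBytes prefix is variable-width).
    static byte[] LengthPrefixed(byte[] value)
    {
        var ms = new MemoryStream();
        ms.WriteByte((byte)value.Length);
        ms.Write(value, 0, value.Length);
        return ms.ToArray();
    }

    static void Main()
    {
        var ab  = LengthPrefixed(new byte[] { (byte)'a', (byte)'b' });            // 02 61 62
        var abc = LengthPrefixed(new byte[] { (byte)'a', (byte)'b', (byte)'c' }); // 03 61 62 63

        // False: the leading length byte differs, so the encoded "ab" is not
        // a byte prefix of the encoded "abc" and a prefix seek misses it.
        Console.WriteLine(abc.Take(ab.Length).SequenceEqual(ab));
    }
}
```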

> but it was designed for working with streams.

Exactly!

> Please check this thread #199 the problem for us is WriteVarBytes

I've seen that, but we do not use WriteVarBytes for keys, and I still can't see anything preventing the C# code from doing the same thing.

> but it was designed for working with streams.

Well, if it's just to make StorageKey ISerializable, then yes, it does make some sense. But then the question is: do we really need ISerializable here?

Update: I see that DataCache needs it; still, there should be some way.

I mean, from a pure DB design perspective (especially given that after #1383 every implementation will have to deal with this format one way or another), this transformation isn't necessary to do the job. So making it a standard required for a proper MPT implementation doesn't seem like a good solution to me; there should be some room for more efficient implementations.

OK, so we have the stream-based ISerializable and a DataCache using it both for keys and values. All good, but we have a variable-length StorageKey that can't use our nice WriteVarBytes because of Storage.Find. And the StorageKey is only ever used as a key for the DB.

What I'm thinking about is that every stream has an end, one way or another. So I can't see anything preventing us from creating a ReadTillEOF helper to be used for StorageKey. Writing was never really a problem (a plain Write does it well); the only problem is reading. But as we never read a StorageKey from the network, only from our own DB, simply reading till EOF (or even until EndOfStreamException) shouldn't be an issue.
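
A sketch of what such a helper could look like (the name ReadTillEOF is from the comment above; none of this is existing neo code):

```c#
using System.IO;

static class ReadTillEofSketch
{
    // Reads whatever remains in the underlying stream. Fine for
    // StorageKey: we only deserialize it from our own DB, where a
    // stream holds exactly one key, never from the network.
    public static byte[] ReadTillEOF(this BinaryReader reader)
    {
        using (var ms = new MemoryStream())
        {
            reader.BaseStream.CopyTo(ms);
            return ms.ToArray();
        }
    }
}
```

Deserialization of a StorageKey would then read the 20-byte script hash and take everything that remains as the key.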

Pros:

  • ~5 lines for ReadTillEOF would let us remove tens of lines of otherwise unused ReadBytesWithGrouping/WriteBytesWithGrouping code
  • shorter keys!
  • less processing required
  • simplifies MPT calculation

Cons:

  • sorry, I don't see any

What do you think?

> What I'm thinking about is that every stream has an end, one way or another.

A network stream doesn't have an end.

Luckily, we're not dealing with networking in this particular case. I do understand that it makes StorageKey somewhat impure with respect to the general properties expected from an ISerializable thing (like being able to put two StorageKey values into a stream and then read them back, which a till-EOF encoding can't do), but at the same time it's a very special case where our normal encoding principles (like VarBytes) just don't work, and one way or another we have to do something special here. I think the till-EOF solution is better in that it's simpler and solves the problem.

Also, again, it's one thing for an implementation to use whatever encoding scheme it wants, and completely another to standardize something at the protocol level. So what I'm really concerned about is the MPT calculation (which is soon to become part of the protocol); I think it would be nice to have a simpler encoding used for it, even if the BytesWithGrouping scheme is to remain in the C# implementation.

#1566

