Neo: StorageKey - why use WriteBytesWithGrouping()?

Created on 3 Jun 2019 · 16 comments · Source: neo-project/neo

I'm doing an audit between neo-python and neo-cli and encountered an issue with one of the storage keys. Specifically, block 131179 of TestNet reports the following new storage item:

{"state":"Added",
        "key":"2170e2e4a128ba6b287b14c1ad4be8d03bbf04f031313131313131313131313131313131003131313131313131313100000000000006",
        "value":"000f32323232323232323232323232323200"}]},

The key can be broken down into:

  • Scripthash: 2170e2e4a128ba6b287b14c1ad4be8d03bbf04f0
  • key_value: 313131313131313131313131313131310031313131313131313131
  • padding: 00000000000006 (six zero bytes padding the last group to 16 bytes, plus a final 06 recording the padding length)

If you look closely at the key_value you'll see there is a 00 inserted after the first 16 bytes. The actual key (as can be seen in the screenshot) does not have this 00 byte.
[Screenshot 2019-06-03 at 16:12:36]
This deviation is caused by WriteBytesWithGrouping() in the code below

https://github.com/neo-project/neo/blob/55ffca977a2b336d16bc986a40c770c4f5a1e589/neo/Ledger/StorageKey.cs#L43-L46
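
For context, here is roughly what that helper does (my paraphrase of the linked 2.x code, not a verbatim copy; the group size is 16 bytes):

```c#
using System.IO;

static class GroupingSketch
{
    const int GroupSize = 16;

    // Each full 16-byte group is followed by a 0x00 separator byte; the
    // final partial group is zero-padded to 16 bytes, and one last byte
    // records how many padding bytes were added.
    public static void WriteBytesWithGrouping(BinaryWriter writer, byte[] value)
    {
        int index = 0;
        int remain = value.Length;
        while (remain >= GroupSize)
        {
            writer.Write(value, index, GroupSize);
            writer.Write((byte)0);          // the "extra" 0x00 discussed above
            index += GroupSize;
            remain -= GroupSize;
        }
        if (remain > 0)
            writer.Write(value, index, remain);
        int padding = GroupSize - remain;
        for (int i = 0; i < padding; i++)
            writer.Write((byte)0);          // zero padding
        writer.Write((byte)padding);        // padding count (the trailing 06)
    }
}
```

Read against the dump above: the real key is 26 bytes of 0x31, so the first full group picks up a 0x00 separator, and the remaining 10 bytes get six bytes of zero padding plus the count byte 06.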

My question is: why are we using this method for serialization instead of a normal write?

It gives unexpected behaviour when you use it (as done here in the StatesDumper plugin):

```c#
state["key"] = trackable.Key.ToArray().ToHexString();
```
I believe the key should be the real key, not some modified form.

Label: question

All 16 comments

I think the same as you; it should be a regular write.

It is for "Find()". See #200

Thanks 😥

I don't see this changing anymore for 2.x, but should we maybe change the approach for 3.0 to remove the confusing behaviour of ToArray() for StorageKey?

This would mean updating Storage_Put/Delete/Get instead of using WriteBytesWithGrouping.
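
A minimal sketch of that alternative, with hypothetical names (this is not the actual syscall code): build the physical key by plain concatenation, so ToArray() can return the real key and Find becomes an ordinary prefix seek.

```c#
using System;

static class FlatKeySketch
{
    // Hypothetical layout: 20-byte script hash followed by the raw key
    // bytes. No grouping and no length byte; the storage engine already
    // knows where each of its keys ends.
    public static byte[] MakeStorageKey(byte[] scriptHash, byte[] key)
    {
        var buffer = new byte[scriptHash.Length + key.Length];
        Buffer.BlockCopy(scriptHash, 0, buffer, 0, scriptHash.Length);
        Buffer.BlockCopy(key, 0, buffer, scriptHash.Length, key.Length);
        return buffer;
    }
}
```

Storage_Put/Delete/Get would build their keys this way, and Storage.Find would pass the same concatenation as its iterator prefix; per the comments below, this is essentially what neo-python and neo-go do.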

I always wanted to know this! hahaha :)
Well, is this a storage-specific thing, or is it necessarily part of the protocol? I mean, could other clients skip implementing it and still get the same results with Find?

It's a client thing. Neo-python doesn't use it and Find works fine, because we fixed Put/Delete/Get instead.

It has to do with the way the length is encoded. I had to deal with this in some of the plugins I wrote. It should be possible to get the length from LevelDB, though, and not need to encode the length into the key. I didn't deep-dive to fix it, but it should be fixable for NEO 3.0. As long as the storage engine can save and report key lengths, which it can, there isn't a good reason to do what is being done.

May I ask to reopen this bug? I think the issue is still relevant for NEO 3, but now, with the #1383 changes, it has a deeper impact on node compatibility. In neo-go we have implemented a simpler key serialization scheme without grouping and it works fine (some conversion is needed to compare our 2.0 state with neo-storage-audit, but that's a minor annoyance affecting just the state change dumps). But if MPT is to be added using this scheme, then we will have to either change our key serialization format or do conversions for the MPT calculations. Neither option makes me happy, as I think this should be trivial to fix in the C# code (and it would simplify the C# code at the same time).

Yes, we should create a standard. Could you share your neo-go group implementation?

We just don't do any grouping; we append the byte representation of the key to the scripthash, like this:
https://github.com/nspcc-dev/neo-go/blob/405fa9c411637a1b77a98132139e177279598332/pkg/core/dao/dao.go#L466

And Find works well with it: if there are keys "ab" and "ac" in the DB, Find returns both when searching for the prefix "a".
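
A toy model of why that works (a sorted in-memory set standing in for LevelDB's default lexicographic key ordering): keys sharing a prefix form one contiguous run, so Find is a seek plus iteration.

```c#
using System;
using System.Collections.Generic;
using System.Linq;

class PrefixScanDemo
{
    // Plain lexicographic byte ordering, matching LevelDB's default comparator.
    class ByteArrayComparer : IComparer<byte[]>
    {
        public int Compare(byte[] x, byte[] y)
        {
            int n = Math.Min(x.Length, y.Length);
            for (int i = 0; i < n; i++)
                if (x[i] != y[i]) return x[i].CompareTo(y[i]);
            return x.Length.CompareTo(y.Length);
        }
    }

    static void Main()
    {
        var db = new SortedSet<byte[]>(new ByteArrayComparer())
        {
            new byte[] { (byte)'a', (byte)'b' }, // "ab"
            new byte[] { (byte)'a', (byte)'c' }, // "ac"
            new byte[] { (byte)'b', (byte)'a' }, // "ba"
        };
        byte[] prefix = { (byte)'a' };

        // Walks the contiguous "a..." run; prints "ab" then "ac".
        foreach (var key in db.Where(k => k.Length >= prefix.Length
                                       && k.Take(prefix.Length).SequenceEqual(prefix)))
            Console.WriteLine(new string(key.Select(b => (char)b).ToArray()));
    }
}
```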

Please check this thread: https://github.com/neo-project/neo/issues/199. The problem for us is WriteVarBytes, which could produce undesired results. I think the store should return a fixed size, without using WriteVarBytes, but it was designed for working with streams.
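
To spell out the WriteVarBytes concern (my illustration, using a simplified one-byte length prefix): length-prefixed keys no longer share byte prefixes, which is exactly what breaks prefix searches.

```c#
using System;
using System.IO;
using System.Linq;

class VarBytesProblem
{
    // Simplified stand-in for a var-length encoding: one length byte,
    // then the data (the real WriteVarBytes prefix is variable-width).
    static byte[] LengthPrefixed(byte[] value)
    {
        var ms = new MemoryStream();
        ms.WriteByte((byte)value.Length);
        ms.Write(value, 0, value.Length);
        return ms.ToArray();
    }

    static void Main()
    {
        var ab  = LengthPrefixed(new byte[] { (byte)'a', (byte)'b' });            // 02 61 62
        var abc = LengthPrefixed(new byte[] { (byte)'a', (byte)'b', (byte)'c' }); // 03 61 62 63

        // False: the leading length byte differs, so the encoded "ab" is not
        // a byte prefix of the encoded "abc" and a prefix seek misses it.
        Console.WriteLine(abc.Take(ab.Length).SequenceEqual(ab));
    }
}
```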

> but it was designed for working with streams.

Exactly!

> Please check this thread #199 the problem for us is WriteVarBytes

I've seen that, but we do not use WriteVarBytes for keys, and I still can't see anything preventing the C# code from doing the same thing.

> but it was designed for working with streams.

Well, if it's just to make StorageKey ISerializable, then yes, it does make some sense. But then the question is: do we really need ISerializable here?

Update: I see that DataCache needs it; still, there should be some way.

I mean, from a pure DB design perspective (especially given that after #1383 every implementation will have to deal with this format one way or another), this transformation isn't necessary to do the job. So making it a standard required for a proper MPT implementation doesn't seem like a good solution to me; there should be some room for more efficient implementations.

OK, so we have the stream-based ISerializable and a DataCache using it both for keys and values. All good, but we have a variable-length StorageKey that can't use our nice WriteVarBytes because of Storage.Find. And the StorageKey is only ever used as a key for the DB.

What I'm thinking about is that every stream has an end, one way or another. So I can't see anything preventing us from creating a ReadTillEOF helper to be used for StorageKey. Writing was never really a problem (a plain Write does it well); the only problem is reading. But as we never read a StorageKey from the network, only from our own DB, simply reading till EOF (or even until EndOfStreamException) shouldn't be an issue.
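
A sketch of what such a helper could look like (the name ReadTillEOF is from the comment above; none of this is existing neo code):

```c#
using System.IO;

static class ReadTillEofSketch
{
    // Reads whatever remains in the underlying stream. Fine for
    // StorageKey: we only deserialize it from our own DB, where a
    // stream holds exactly one key, never from the network.
    public static byte[] ReadTillEOF(this BinaryReader reader)
    {
        using (var ms = new MemoryStream())
        {
            reader.BaseStream.CopyTo(ms);
            return ms.ToArray();
        }
    }
}
```

Deserialization of a StorageKey would then read the 20-byte script hash and take everything that remains as the key.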

Pros:

  • ~5 lines for ReadTillEOF would let us remove tens of lines of otherwise unused ReadBytesWithGrouping/WriteBytesWithGrouping code
  • shorter keys!
  • less processing required
  • simplifies MPT calculation

Cons:

  • sorry, I don't see any

What do you think?

> What I'm thinking about is that every stream has an end, one way or another.

A network stream doesn't have an end.

Luckily, we're not dealing with networking in this particular case. I do understand that it makes StorageKey somewhat impure with respect to the general properties expected from an ISerializable thing (like being able to put two StorageKey values into a stream and then read them back, which a till-EOF encoding can't do), but at the same time it's a very special case where our normal encoding principles (like VarBytes) just don't work, and one way or another we have to do something special here. I think the till-EOF solution is better in that it's simpler and solves the problem.

Also, again, it's one thing for an implementation to use whatever encoding scheme it wants, and completely another to standardize something at the protocol level. So what I'm really concerned about is the MPT calculation (which is soon to become part of the protocol); I think it would be nice to have a simpler encoding used for it, even if the BytesWithGrouping scheme is to remain in the C# implementation.

#1566

