ADR-028 is very important because current address format is not future safe if we want to expand to more different account and pubkey schemes .
Discussion about moving ADR-028 is in #5694
Changing address format is a breaking change for many apps.
List and document the dev and client issues with changing the address format.
[EDIT]
This things should work because we don't change addresses for existing keys.
So just to be clear, I am not aware of any proposal to _change_ current addresses. What I have proposed is that new addresses be a different length so that 20 byte addresses are grandfathered as legacy and all non-20 byte addresses are new.
What will break then is the assumption that addresses are always 20 bytes. First and foremost there is a lot of SDK code relying on this assumption - just do a find usages on the address length constant.
For clients, again let's be clear on what will actually break and as far as I know it is simply that new addresses will not be 20 bytes. What you're outlining @robert-zaremba are not things that will break AFAIK (except things affected by ADR 034 but that's a separate concern). Nobody has proposed rewriting secp256k1 addresses to some other format.
How about ed25519 addresses?
BTW - I have reformulated my _initial list_ above
Ed25519 has never been officially enabled so as far as the SDK is concerned that address format is still up discussion
Maybe other SDK users have a custom accounts and would be good to hear their concerns.
The goal of this issue is to investigate all problems we may face both with existing software and potential adoption path.
Maybe other SDK users have a custom accounts and would be good to hear their concerns.
The goal of this issue is to investigate all problems we may face both with existing software and potential adoption path.
Maybe, but I think that's mostly outside of our scope of concern. The whole time we've had this discussion, everyone has emphasized that the only things we need to treat as legacy are secp256k1 and amino multisigs. If people have forked the SDK and added other keys, they should have known what they were getting into because it's really not _that_ easy.
So currently within the SDK, the sdk.AddrLen constant is used only when creating or parsing KVStore keys in the following modules:
I see 3 ways to refactor this code to accommodate a new address format:
Allow variable length addresses but set a MaxAddrLen constant. Then all store keys which are currently use fixed length address arrays would instead use arrays that are MaxAddrLen + 1 bytes long with the first byte being a length prefix.
So if we had 40 byte addresses, store key functions would encode existing 20 byte addresses in a 41 byte array with the first byte being the byte 20, the next 20 bytes being the address, and the rest of the array being filled with zeros. Store key functions could do this using a helper function and the refactor would likely be relatively painless.
We could allow variable length addresses with no MaxAddrLen required and use a surrogate uint64 key (ex. account_number) instead of actual addresses in store keys. Then we'd always need 8 bytes per address in store keys, instead of 20 or 41 or whatever. The downside is that this would require converting to and from the surrogate key when forming addresses. It wouldn't be crazy hard, but it's probably a significantly bigger refactor than 1.
Set a larger fixed AddrLen and re-encode all 20 byte addresses in the new AddrLen with some sort of prefix. We could set things up so that all methods still accept 20 byte bech32 strings, but then the re-encoding happens behind the scenes. But this would be weird for query responses and probably a bunch of other things. Clients would probably hate this and we probably shouldn't even think about it.
Thanks @aaronc, that's helpful. Some immediate questions:
a. When you say (2) is efficient, you mean space-efficient, right? Is it that bad to have 41-byte keys? Maybe having slightly larger store-key sizes wins over doing an additional store read (to retrieve the address mapped to an account_number) in most keeper functions.
b. x/auth's state is a map address->account, where account contains the account_number. Afaik we don't have a map account_number->address or account_number->account. (2) would require such a map in x/auth's state, right?
c. It seems that variable-length addresses (without MaxAddrLen and padding) is not considered. Is that because it requires the introduction of some sort ORM to handle indexing, and that would be too big a refactor?
Also, I'm a bit confused why there are two issues #5694 and this one.
Thanks @aaronc, that's helpful. Some immediate questions:
a. When you say
(2)is efficient, you mean space-efficient, right? Is it that bad to have 41-byte keys? Maybe having slightly larger store-key sizes wins over doing an additional store read (to retrieve the address mapped to anaccount_number) in most keeper functions.
Right
b. x/auth's state is a map
address->account, where account contains theaccount_number. Afaik we don't have a mapaccount_number->addressoraccount_number->account.(2)would require such a map in x/auth's state, right?
Right
c. It seems that variable-length addresses (without MaxAddrLen and padding) is not considered. Is that because it requires the introduction of some sort ORM to handle indexing, and that would be too big a refactor?
I'm confused. That's (2). Am I missing something?
Also, I'm a bit confused why there are two issues #5694 and this one.
🤷
I'm confused. That's (2). Am I missing something?
I was referring to variable-length addresses inside the store key. Same as (1), but without MaxAddrLen and padding. That won't use account numbers at all.
I was referring to variable-length addresses inside the store key. Same as
(1), but without MaxAddrLen and padding. That won't use account numbers at all.
Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.
Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.
Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.
Overall I'm more in favor in (1). I don't think (2) is really more efficient overall, since we need an additional map account_number->address, and frequent additional reads to that storage.
(2)I don't think this is a good solution. It will make addresses not compatible between different chains. Moreover, it's hard to say if we will win any performance: both storage (because we need to store additional mapping) or performance (additional lookups). We would need to make tests for that.
(1.1) - modification of (1)I think we will end up with capped addresses, 32bytes. For legacy addresses (20 bytes) we don't need to store additional byte for length. We just fill the leading 12 bytes with zero. The only problem is with generating new addresses: If a new address will have zeros in 12 leading bytes (which will probably never happen, probability is 1 : 8e28) we will generate a new address.
Let's firstly finalize the algorithm in adr-028 - with that we will have more information for migration.
Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.
Maybe I'm missing something, but afaiu you would just need a length prefix:
store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in(1), but without filling in with zeroes.
Hmm, not quite - you may encounter problems with prefix stores with that approach. Try to think a bit about how prefix stores get constructed. Without fixed length parts, you generally need some lexical separator
Overall I'm more in favor in
(1). I don't think(2)is really more efficient overall, since we need an additional mapaccount_number->address, and frequent additional reads to that storage.
I'm pretty sure that can be optimized away with a memory cache.
Re:
(2)I don't think this is a good solution. It will make addresses not compatible between different chains. Moreover, it's hard to say if we will win any performance: both storage (because we need to store additional mapping) or performance (additional lookups). We would need to make tests for that.
Hmmm yeah (2) doesn't affect address format at all. We're talking about store keys. Maybe @amaurymartiny can explain or we can discuss later.
(1.1)- modification of(1)I think we will end up with capped addresses, 32bytes. For legacy addresses (20 bytes) we don't need to store additional byte for length. We just fill the leading 12 bytes with
zero. The only problem is with generating new addresses: If a new address will have zeros in 12 leading bytes (which will probably never happen, probability is1 : 8e28) we will generate a new address.Let's firstly finalize the algorithm in adr-028 - with that we will have more information for migration.
I don't see why we would make that compromise to save 1 byte of storage.
Also the format of adr 028 is somewhat orthogonal to how the SDK deals with variable length addresses. That problem needs to be solved to enable adr 028 to move in a different direction.
Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.
Maybe I'm missing something, but afaiu you would just need a length prefix:
store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in(1), but without filling in with zeroes.Hmm, not quite - you may encounter problems with prefix stores with that approach. Try to think a bit about how prefix stores get constructed. Without fixed length parts, you generally need some lexical separator
Actually this could maybe work @amaurymartiny as long as components before is also length prefixed or fixed length. I think it needs to be thought through case by case depending on the other components. But yeah maybe we can avoid the padding in which case max len = 255 and that should be plenty. Let's think about it a bit.
So, I think length prefixing will work for multi-part keys without padding as long as all parts except the last one are prefixed, (i.e. [part1 len][part1][part2 len][part2][part3]. Does that sound correct? And theoretically we could prefix with a varint to not have a 255 byte limit, but 255 should be plenty.
The case where length prefixing is problematic is if you care about keys being lexically sorted in index scans. In those cases, fixed length bytes or strings with lexical separators are needed. I tried using length suffixing in the orm to deal with this, but I think that also has its pitfalls. But anyway, we don't need lexical sorting for addresses at all.
Notes from the today meeting: https://hackmd.io/@robert-zaremba/S1hsgDq2w