Cosmos-sdk: ADR-028 extensible address format: breaking changes investigation

Created on 27 Nov 2020 · 23Comments · Source: cosmos/cosmos-sdk

Summary

ADR-028 is very important because current address format is not future safe if we want to expand to more different account and pubkey schemes .

Discussion about moving ADR-028 is in #5694

Problem Definition

Changing address format is a breaking change for many apps.

Proposal

List and document the dev and client issues with changing the address format.

crypto keys

Source

robert-zaremba

👍1

All 23 comments

[EDIT]

This things should work because we don't change addresses for existing keys.

SDKs and any library which are generating a pubkey for secp256k1
all wallets which generate a pubkey (mobile, browser, hardware) for secp256k1 keys
block explorers and indexers which relay on current pubkey -> address relationship for secp256k1 keys

robert-zaremba on 27 Nov 2020

So just to be clear, I am not aware of any proposal to _change_ current addresses. What I have proposed is that new addresses be a different length so that 20 byte addresses are grandfathered as legacy and all non-20 byte addresses are new.

What will break then is the assumption that addresses are always 20 bytes. First and foremost there is a lot of SDK code relying on this assumption - just do a find usages on the address length constant.

For clients, again let's be clear on what will actually break and as far as I know it is simply that new addresses will not be 20 bytes. What you're outlining @robert-zaremba are not things that will break AFAIK (except things affected by ADR 034 but that's a separate concern). Nobody has proposed rewriting secp256k1 addresses to some other format.

aaronc on 27 Nov 2020

How about ed25519 addresses?

robert-zaremba on 27 Nov 2020

BTW - I have reformulated my _initial list_ above

robert-zaremba on 27 Nov 2020

Ed25519 has never been officially enabled so as far as the SDK is concerned that address format is still up discussion

aaronc on 27 Nov 2020

👍1

Maybe other SDK users have a custom accounts and would be good to hear their concerns.

The goal of this issue is to investigate all problems we may face both with existing software and potential adoption path.

robert-zaremba on 27 Nov 2020

Maybe other SDK users have a custom accounts and would be good to hear their concerns.

The goal of this issue is to investigate all problems we may face both with existing software and potential adoption path.

Maybe, but I think that's mostly outside of our scope of concern. The whole time we've had this discussion, everyone has emphasized that the only things we need to treat as legacy are secp256k1 and amino multisigs. If people have forked the SDK and added other keys, they should have known what they were getting into because it's really not _that_ easy.

aaronc on 1 Dec 2020

So currently within the SDK, the sdk.AddrLen constant is used only when creating or parsing KVStore keys in the following modules:

bank
distribution
gov
slashing
staking

I see 3 ways to refactor this code to accommodate a new address format:

(1) The mostly painless way

Allow variable length addresses but set a MaxAddrLen constant. Then all store keys which are currently use fixed length address arrays would instead use arrays that are MaxAddrLen + 1 bytes long with the first byte being a length prefix.

So if we had 40 byte addresses, store key functions would encode existing 20 byte addresses in a 41 byte array with the first byte being the byte 20, the next 20 bytes being the address, and the rest of the array being filled with zeros. Store key functions could do this using a helper function and the refactor would likely be relatively painless.

(2) The more efficient way

We could allow variable length addresses with no MaxAddrLen required and use a surrogate uint64 key (ex. account_number) instead of actual addresses in store keys. Then we'd always need 8 bytes per address in store keys, instead of 20 or 41 or whatever. The downside is that this would require converting to and from the surrogate key when forming addresses. It wouldn't be crazy hard, but it's probably a significantly bigger refactor than 1.

(3) The probably bad way

Set a larger fixed AddrLen and re-encode all 20 byte addresses in the new AddrLen with some sort of prefix. We could set things up so that all methods still accept 20 byte bech32 strings, but then the re-encoding happens behind the scenes. But this would be weird for query responses and probably a bunch of other things. Clients would probably hate this and we probably shouldn't even think about it.

aaronc on 1 Dec 2020

👍1

Thanks @aaronc, that's helpful. Some immediate questions:

a. When you say (2) is efficient, you mean space-efficient, right? Is it that bad to have 41-byte keys? Maybe having slightly larger store-key sizes wins over doing an additional store read (to retrieve the address mapped to an account_number) in most keeper functions.

b. x/auth's state is a map address->account, where account contains the account_number. Afaik we don't have a map account_number->address or account_number->account. (2) would require such a map in x/auth's state, right?

c. It seems that variable-length addresses (without MaxAddrLen and padding) is not considered. Is that because it requires the introduction of some sort ORM to handle indexing, and that would be too big a refactor?

Also, I'm a bit confused why there are two issues #5694 and this one.

amaurymartiny on 1 Dec 2020

Thanks @aaronc, that's helpful. Some immediate questions:

a. When you say (2) is efficient, you mean space-efficient, right? Is it that bad to have 41-byte keys? Maybe having slightly larger store-key sizes wins over doing an additional store read (to retrieve the address mapped to an account_number) in most keeper functions.

Right

b. x/auth's state is a map address->account, where account contains the account_number. Afaik we don't have a map account_number->address or account_number->account. (2) would require such a map in x/auth's state, right?

Right

c. It seems that variable-length addresses (without MaxAddrLen and padding) is not considered. Is that because it requires the introduction of some sort ORM to handle indexing, and that would be too big a refactor?

I'm confused. That's (2). Am I missing something?

Also, I'm a bit confused why there are two issues #5694 and this one.

🤷

aaronc on 1 Dec 2020

I'm confused. That's (2). Am I missing something?

I was referring to variable-length addresses inside the store key. Same as (1), but without MaxAddrLen and padding. That won't use account numbers at all.

amaurymartiny on 1 Dec 2020

I was referring to variable-length addresses inside the store key. Same as (1), but without MaxAddrLen and padding. That won't use account numbers at all.

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

aaronc on 1 Dec 2020

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

amaurymartiny on 2 Dec 2020

Overall I'm more in favor in (1). I don't think (2) is really more efficient overall, since we need an additional map account_number->address, and frequent additional reads to that storage.

amaurymartiny on 2 Dec 2020

Re: `(2)`

I don't think this is a good solution. It will make addresses not compatible between different chains. Moreover, it's hard to say if we will win any performance: both storage (because we need to store additional mapping) or performance (additional lookups). We would need to make tests for that.

robert-zaremba on 2 Dec 2020

`(1.1)` - modification of `(1)`

I think we will end up with capped addresses, 32bytes. For legacy addresses (20 bytes) we don't need to store additional byte for length. We just fill the leading 12 bytes with zero. The only problem is with generating new addresses: If a new address will have zeros in 12 leading bytes (which will probably never happen, probability is 1 : 8e28) we will generate a new address.

Let's firstly finalize the algorithm in adr-028 - with that we will have more information for migration.

robert-zaremba on 2 Dec 2020

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

Hmm, not quite - you may encounter problems with prefix stores with that approach. Try to think a bit about how prefix stores get constructed. Without fixed length parts, you generally need some lexical separator

aaronc on 2 Dec 2020

Overall I'm more in favor in (1). I don't think (2) is really more efficient overall, since we need an additional map account_number->address, and frequent additional reads to that storage.

I'm pretty sure that can be optimized away with a memory cache.

aaronc on 2 Dec 2020

Re: (2)

I don't think this is a good solution. It will make addresses not compatible between different chains. Moreover, it's hard to say if we will win any performance: both storage (because we need to store additional mapping) or performance (additional lookups). We would need to make tests for that.

Hmmm yeah (2) doesn't affect address format at all. We're talking about store keys. Maybe @amaurymartiny can explain or we can discuss later.

aaronc on 2 Dec 2020

(1.1) - modification of (1)

I think we will end up with capped addresses, 32bytes. For legacy addresses (20 bytes) we don't need to store additional byte for length. We just fill the leading 12 bytes with zero. The only problem is with generating new addresses: If a new address will have zeros in 12 leading bytes (which will probably never happen, probability is 1 : 8e28) we will generate a new address.

Let's firstly finalize the algorithm in adr-028 - with that we will have more information for migration.

I don't see why we would make that compromise to save 1 byte of storage.

Also the format of adr 028 is somewhat orthogonal to how the SDK deals with variable length addresses. That problem needs to be solved to enable adr 028 to move in a different direction.

aaronc on 2 Dec 2020

Oh, well you need some sort of way of splitting components in store keys. It's hard to do that with binary data without fixed length keys. We could serialize addresses as strings and use the null terminator to separate but that's even more storage space. I guess we can give that as a fourth alternative.

Maybe I'm missing something, but afaiu you would just need a length prefix: store-key = [components-before][addr-length-prefix-byte][variable-length-addr][components-after], as presented in (1), but without filling in with zeroes.

Hmm, not quite - you may encounter problems with prefix stores with that approach. Try to think a bit about how prefix stores get constructed. Without fixed length parts, you generally need some lexical separator

Actually this could maybe work @amaurymartiny as long as components before is also length prefixed or fixed length. I think it needs to be thought through case by case depending on the other components. But yeah maybe we can avoid the padding in which case max len = 255 and that should be plenty. Let's think about it a bit.

aaronc on 2 Dec 2020

👍1

So, I think length prefixing will work for multi-part keys without padding as long as all parts except the last one are prefixed, (i.e. [part1 len][part1][part2 len][part2][part3]. Does that sound correct? And theoretically we could prefix with a varint to not have a 255 byte limit, but 255 should be plenty.

The case where length prefixing is problematic is if you care about keys being lexically sorted in index scans. In those cases, fixed length bytes or strings with lexical separators are needed. I tried using length suffixing in the orm to deal with this, but I think that also has its pitfalls. But anyway, we don't need lexical sorting for addresses at all.

aaronc on 2 Dec 2020

Notes from the today meeting: https://hackmd.io/@robert-zaremba/S1hsgDq2w

robert-zaremba on 18 Dec 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Rejected proposal deposits shouldn't be burned

fedekunze · 3Comments

Staking amount and denom does not show on the ledger when creating a validator

adrianbrink · 3Comments

Better error handling for faulty seeds

faboweb · 3Comments

x/staking: A sequence of transactions causes one to delete a non-existent validator

ValarDragon · 3Comments

gaiacli keys show <multisig> should show constituent pubkeys, and weights

ValarDragon · 3Comments

Cosmos-sdk: ADR-028 extensible address format: breaking changes investigation

Summary

Problem Definition

Proposal

All 23 comments

(1) The mostly painless way

(2) The more efficient way

(3) The probably bad way

Re: (2)

(1.1) - modification of (1)

Re: (2)

(1.1) - modification of (1)

Related issues

Re: `(2)`

`(1.1)` - modification of `(1)`

Re: `(2)`

`(1.1)` - modification of `(1)`