Cosmos-sdk: Genesis Protobuf Migration

Created on 2 Apr 2020  路  17Comments  路  Source: cosmos/cosmos-sdk

Context

  • with protobuf, we now have a new JSON format which we want to support
  • streaming genesis is desirable because of large imports and exports from mature zones
  • with the migration to Any, the same structs can support both amino and protobuf JSON

Roadmap

  • [x] #6717 https://github.com/cosmos/cosmos-sdk/pull/6734 #6860 migrate x/genutil to new TxConfig
  • [x] Ensure all module GenesisState structs properly support both amino and proto JSON (this usually will involve migrating interfaces to Any and generating GenesisState from a .proto file)

    • [x] (https://github.com/cosmos/cosmos-sdk/pull/6880) x/auth @blushi

    • [x] (#6835) x/bank @atheeshp

    • [x] (#6835) x/crisis @atheeshp

    • [x] (#6867) x/capability @atheeshp

    • [x] (#6835) x/distribution @atheeshp

    • [x] (#6864) x/evidence @blushi

    • [x] (#6835) x/mint @atheeshp

    • [x] (#6867) x/genutil @atheeshp

    • [x] (#6866) x/gov @sahith-narahari

    • [x] (#6881) x/slashing @atheeshp

    • [x] (#6835) x/staking @atheeshp

    • [x] (#6878) x/ibc @atheeshp

    • [x] (#6866) x/ibc-transfer @sahith-narahari

  • [ ] enable proto JSON in simapp

/cc @alexanderbez @clevinson

encoding genesis pinned

Most helpful comment

@aaronc Our exported genstate for Columbus-3 (December last year) was around 500M. Assuming filesize growth is linear to account number growth that would imply a Columbus-4 genstate fillesize of > 1GB. The first block took a good 2 minutes for the fastest validators to process, and a lot longer for most others.

While I understand that the Hub would not have similar issues, think the SDK development should reflect needs of client chains that see sizable use.

All 17 comments

GenTX decoding from genesis is very critical as well as it is required to run a node. DeliverGenTxs is still amino.

will part of this also depend on tendermint genesis work or is it orthogonal? I am starting to delve into the tendermint side of JSON encoding

will part of this also depend on tendermint genesis work or is it orthogonal? I am starting to delve into the tendermint side of JSON encoding

@marbar3778 Can you share some details/link about this?

With #6147, we now have an easy pathway to support both Amino and Proto JSON, so I've updated this issue to include a task in this issue for adding flags to start and export to choose which JSON format to use.

Also I've thought about how to approach streaming JSON based on some things @alpe has shared and just want to document the approach I have in mind. (I don't think we need to get to this quite yet.)

Here are the constraints I see:

  • streaming JSON should be opt-in on a module by module basis and isn't blocking for v0.39 (unless there are resource issues doing the hub export)
  • looking at our current genesis code, the order of operations is pretty important
  • in order to support streaming properly, we need to be able to choose which elements in an object get parsed first

I think we can solve this with the following approach:

  • allow either a genesis.json file as now, or a genesis/ folder
  • arrays which are large and intended to be streamed can like in their own JSON files, i.e. staking.delegations could live in genesis/staking/delegations.json
  • root.json in the genesis/ folder contains all definitions that are not in the separate files. Anything which is in a separate file should be omitted from root.json

The API for this might look something like:

type ObjectReader interface {
    // reads the named field from the object into the ptr
    ReadObject(name string, ptr interface{}) error
    // opens a stream reader for the array at the named field
    // this will open a separate file from the disk if it exists to support massive arrays
    // (i.e. `delegations.json` for the field `delegations`)
    StreamArray(name string) ArrayStreamer
}

type ArrayStreamer interface {
    ReadNext(ptr interface{}) error
    HaveMore() bool
}

Using gogo jsonpb, there is already an UnmarshalNext method we can leverage: https://godoc.org/github.com/gogo/protobuf/jsonpb#Unmarshaler.UnmarshalNext

Can you share some details/link about this?

we are discussing and started a small doc here: https://docs.google.com/document/d/1RzF9qvr-a6oYbmKdhHJ_lYLIUbxozTFfsJCwJzAqXkY/edit#heading=h.n4jqb5mvv9mq

and have a issue here: https://github.com/tendermint/tendermint/issues/4828

cc @aaronc

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

should we get started with the .proto files? or is it blocked by something else?

There is still a question in my mind of whether we need streaming genesis for stargate. For ibc it鈥檚 likely not needed yet (because there isn鈥檛 a huge amount of live data on the hub). So migrating the current GenesisState to proto files is probably okay. However, the streaming genesis is l believe the future proof way to go for large import/exports. This is likely something we should discuss in our SDK call Friday.

@anilCSE and @sahith-narahari are going to export data from Cosmos Hub 3 so that we have a sense of how large the genesis import/export is. Then we can assess the best path forward.

What are the performance benefits from streaming JSON?

We've explored storing genesis state as a binary (we have 1.7M accounts and state is getting bloated) and only present genesis.json for readability purposes.

Terra will release cosmwasm before Stargate, so the state size is going much lager. In this aspect, we need streaming feature to avoid out of memory situation.

As @dokwon mentioned, It can be level db store like binary format.

@anilCSE and @sahith-narahari are going to export data from Cosmos Hub 3 so that we have a sense of how large the genesis import/export is. Then we can assess the best path forward.

Hub's exported state is ~52M in size, we might not need streaming genesis for stargate. We can address that later may be.

Thanks @anilCSE.

So sounds like it's more of an issue for Terra possibly. @YunSuk-Yeo do you know the actual size of the exported Terra genesis.json file?

@aaronc Our exported genstate for Columbus-3 (December last year) was around 500M. Assuming filesize growth is linear to account number growth that would imply a Columbus-4 genstate fillesize of > 1GB. The first block took a good 2 minutes for the fastest validators to process, and a lot longer for most others.

While I understand that the Hub would not have similar issues, think the SDK development should reflect needs of client chains that see sizable use.

@dokwon @YunSuk-Yeo unfortunately we won't have the time to complete streaming genesis for the 0.40 release - there is just too much on our backlog. I have, however, created an issue specifically for this (#6936) and tagged it for the 0.41 release.

Was this page helpful?
0 / 5 - 0 ratings