This issue is set up as a starting point to address inconveniences clients have experienced with the test distribution and/or format.
Main pain points identified so far:
1) LFS deals with the large files
1.1) not nice to use in CI, and a new dependency
1.2) clones without LFS active are troublesome
1.3) requires authentication.
- Shouldn't happen, but relatively new issue. May be forced due to rate limits / settings.
2) The sizes of the files are very large
2.1) Need to read the header to filter tests; do not want to read the full file into memory
2.2) Cannot load the full suite of tests in memory
3) I need format X, because feature Y of the current format is a bad experience for me.
4) Configuration is missing, do not like it in the specs repo.
du -ah tests | grep -v "/$" | sort -rh
2.8G tests
1.4G tests/operations
654M tests/epoch_processing
576M tests/sanity
420M tests/sanity/blocks
419M tests/sanity/blocks/sanity_blocks_mainnet.yaml
374M tests/operations/attestation
372M tests/operations/attestation/attestation_mainnet.yaml
327M tests/operations/attester_slashing
326M tests/operations/attester_slashing/attester_slashing_mainnet.yaml
250M tests/epoch_processing/justification_and_finalization
249M tests/operations/deposit
249M tests/epoch_processing/justification_and_finalization/justification_and_finalization_mainnet.yaml
248M tests/operations/deposit/deposit_mainnet.yaml
185M tests/ssz_static/core
185M tests/ssz_static
171M tests/operations/proposer_slashing/proposer_slashing_mainnet.yaml
171M tests/operations/proposer_slashing
158M tests/ssz_static/core/ssz_mainnet_random.yaml
156M tests/sanity/slots
156M tests/operations/voluntary_exit
155M tests/sanity/slots/sanity_slots_mainnet.yaml
155M tests/operations/voluntary_exit/voluntary_exit_mainnet.yaml
125M tests/epoch_processing/final_updates/final_updates_mainnet.yaml
125M tests/epoch_processing/final_updates
94M tests/operations/block_header
94M tests/epoch_processing/slashings
94M tests/epoch_processing/registry_updates
94M tests/epoch_processing/crosslinks
93M tests/operations/block_header/block_header_mainnet.yaml
93M tests/epoch_processing/slashings/slashings_mainnet.yaml
93M tests/epoch_processing/registry_updates/registry_updates_mainnet.yaml
93M tests/epoch_processing/crosslinks/crosslinks_mainnet.yaml
9.9M tests/ssz_static/core/ssz_minimal_lengthy.yaml
6.6M tests/ssz_static/core/ssz_minimal_random.yaml
6.3M tests/ssz_static/core/ssz_minimal_random_chaos.yaml
3.8M tests/ssz_static/core/ssz_minimal_one.yaml
1.9M tests/sanity/blocks/sanity_blocks_minimal.yaml
1.7M tests/operations/transfer/transfer_minimal.yaml
1.7M tests/operations/transfer
1.5M tests/operations/attestation/attestation_minimal.yaml
1.2M tests/operations/attester_slashing/attester_slashing_minimal.yaml
968K tests/epoch_processing/justification_and_finalization/justification_and_finalization_minimal.yaml
896K tests/operations/deposit/deposit_minimal.yaml
608K tests/operations/proposer_slashing/proposer_slashing_minimal.yaml
548K tests/operations/voluntary_exit/voluntary_exit_minimal.yaml
544K tests/sanity/slots/sanity_slots_minimal.yaml
520K tests/genesis
444K tests/epoch_processing/final_updates/final_updates_minimal.yaml
444K tests/epoch_processing/crosslinks/crosslinks_minimal.yaml
400K tests/shuffling
396K tests/shuffling/core
332K tests/operations/block_header/block_header_minimal.yaml
328K tests/epoch_processing/registry_updates/registry_updates_minimal.yaml
324K tests/epoch_processing/slashings/slashings_minimal.yaml
276K tests/genesis/validity
272K tests/genesis/validity/genesis_validity_minimal.yaml
240K tests/genesis/initialization
236K tests/genesis/initialization/genesis_initialization_minimal.yaml
212K tests/ssz_static/core/ssz_minimal_zero.yaml
204K tests/ssz_static/core/ssz_minimal_max.yaml
200K tests/ssz_static/core/ssz_minimal_nil.yaml
196K tests/shuffling/core/shuffling_minimal.yaml
196K tests/shuffling/core/shuffling_full.yaml
92K tests/bls
28K tests/ssz_generic
24K tests/ssz_generic/uint
24K tests/bls/sign_msg
20K tests/bls/sign_msg/sign_msg.yaml
20K tests/bls/aggregate_sigs
16K tests/bls/msg_hash_g2_uncompressed
16K tests/bls/aggregate_sigs/aggregate_sigs.yaml
12K tests/bls/msg_hash_g2_uncompressed/g2_uncompressed.yaml
12K tests/bls/msg_hash_g2_compressed
8.0K tests/ssz_generic/uint/uint_wrong_length.yaml
8.0K tests/ssz_generic/uint/uint_random.yaml
8.0K tests/bls/priv_to_pub
8.0K tests/bls/msg_hash_g2_compressed/g2_compressed.yaml
8.0K tests/bls/aggregate_pubkeys
4.0K tests/ssz_generic/uint/uint_bounds.yaml
4.0K tests/bls/priv_to_pub/priv_to_pub.yaml
4.0K tests/bls/aggregate_pubkeys/aggregate_pubkeys.yaml
As the listing shows, the mainnet files are the biggest source of trouble. Without them, the largest individual file would be about 10 MB (SSZ static), or just 2 MB for a single state transition suite. Compare this to the 419 MB mainnet block processing test suite.
These are the current solutions; not pretty, but functional:
1) LFS deals with the large files
1.1) We assume you already run your own Docker image in CI; Git LFS should be easy to add to the image (it is even available for Alpine Linux).
1.2) Non-LFS clones are exactly that: we cannot keep these large files in regular Git, and some of them are too large to even consider diffing.
1.3) Authentication shouldn't be required for a public repo, and it worked in a CI test setting before. This is a relatively new issue, and it may be forced due to rate limits / settings. Use the gzipped tar in CI for now instead.
2) The sizes of the files are very large
2.1) The first X lines can be read, cut at the test_cases: line, and parsed. Not pretty, but neither is the alternative of duplicate data / separate headers. If the files were small, it wouldn't be an issue (a legacy of the early choice for YAML).
2.2) Loading a file fully into memory before processing is bad: even when parsed into states, the mainnet state objects are still big, too big to keep a few hundred of them in memory. Also consider that there are pre and post states. It is really too much data to deal with at once.
3) As with cross-client communication, a format that works for everyone is difficult. The status quo is to keep it:
- Simple to implement
- As readable as possible
- Generic enough that every client can deal with it in some way or another, even if not everyone likes it.
4) Configuration is always an issue:
- Many teams / languages
- Constants in the spec that may not be as easy to override.
- Since there are far fewer spec changes after the freeze, keeping up manually is effectively more efficient, although it is "dumb" work.
- A client needs to deal with their language choice + other configuration anyway.
- Because of this, stronger enforcement of the config may not be worth it currently; it would certainly break the workflow of some teams.
- Loading a yaml from the specs repo, and checking automatically if it matches the client config, may be a good temporary solution.
- Or a script to convert the yaml file into whatever format is preferred.
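For point 2.1, the "read the first lines and cut at test_cases:" workaround can be done by streaming the file, so the large test case body is never loaded. A minimal Python sketch; read_header and parse_flat_header are hypothetical helpers, and the header keys used in the example (config, runner, handler) are assumed rather than taken from a format spec:

```python
def read_header(path, marker="test_cases:"):
    """Return the raw text before the top-level `test_cases:` key.

    The file is streamed line by line, so the (possibly hundreds of MB of)
    test cases after the marker are never read into memory.
    """
    header_lines = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.startswith(marker):  # top-level key, so no indentation
                break
            header_lines.append(line)
    return "".join(header_lines)


def parse_flat_header(text):
    """Minimal parser for flat `key: value` header lines (illustration only).

    A real client would hand the header text to its own YAML parser instead.
    """
    fields = {}
    for line in text.splitlines():
        # Skip blank lines, comments and indented (nested) lines.
        if not line.strip() or line.startswith(("#", " ", "\t")):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields
```

A test runner could then skip a file with e.g. `if parse_flat_header(read_header(p)).get("config") != "minimal": continue` before touching its test cases.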
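For point 4, the automatic check of a client config against the yaml from the specs repo could be as small as the sketch below. It assumes the preset file is a flat list of `NAME: value` lines with optional `#` comments, and compares values as strings, so the client has to render its values the same way (e.g. decimal integers); parse_flat_config and config_mismatches are hypothetical names:

```python
def parse_flat_config(text):
    """Parse `NAME: value` lines of a flat config file (assumed format)."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def config_mismatches(spec_config, client_config):
    """Return {name: (spec_value, client_value)} for every differing constant.

    A constant missing from the client config is reported with client_value None.
    """
    problems = {}
    for name, spec_value in spec_config.items():
        client_value = client_config.get(name)
        if client_value is None:
            problems[name] = (spec_value, None)
        elif str(client_value) != spec_value:
            problems[name] = (spec_value, client_value)
    return problems
```

An empty result means the client config matches; anything else can fail the CI run with a readable list of constants to update.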
List of ideas, in no particular order, to think about:
Please answer in a DM on Discord or Telegram:
I will then publicly share the anonymized, aggregated findings (time TBD), and hopefully find a better solution than the status quo.
Please consider answering the following questions (answers may be brief/long):
tests/ in release instead of LFS clone?
Testing workload and format is a lot to deal with, sharing thoughts + taking survey to make some progress.
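Survey results: I surveyed most teams, thank you for the quick and extensive responses!
Some teams prefer to stick to yaml for readability and/or because "it works". Readability would be less of a concern if there were an SSZ test viewer.
Plan: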
Split tests into a deeply structured file tree, to filter without memory overhead, at the cost of a bit of disk reading.
File path structure:
tests/<config name>/<fork or phase name>/<test runner name>/<test handler name>/<test suite name>/<test case>/<output part>
<config name>/ -- Configs are at the top level. Some clients want to run minimal first, and it is useful for sanity checks during development too.
As a top level dir, it is not duplicated, and the used config can be copied right into this directory as reference.
<fork or phase name>/ -- This would be: "phase0", "transferparty", "phase1", etc. Each introduces new tests, but does not copy tests that do not change.
If you want to test phase 1, you run the phase 0 tests with the configuration that includes the phase 1 changes. This is out of scope for now, however.
<test runner name>/ -- The well known bls/shuffling/ssz_static/operations/epoch_processing/etc. Handlers can change the format, but there is a general target to test.
<test handler name>/ -- Specialization within category. All suites in here will have the same test case format.
<test suite name>/ -- Suites are split up, so suite size does not affect memory bounds, and particular tests are fast to find and load.
<test case>/ -- Cases are split up too. This enables diffing of parts of the test case, tracking changes per part, while still using LFS. Also enables different formats for some parts.
<output part> -- E.g. "pre.yaml", "deposit.yaml", "post.yaml".
- Diffing pre.yaml and post.yaml gives you all the information needed for testing, which is good for the readability of the change. Then compare the diff to anything that changes the pre state, e.g. "deposit.yaml".
- Allows for custom format for some parts of the test. E.g. something encoded in SSZ.
- "pre.ssz", "deposit.ssz", "post.ssz" etc. is the next step: place a copy, but in binary format, right next to legacy yaml.
Clients can then shift to SSZ inputs for efficiency, while we implement an SSZ viewer.
And when that alleviates the readability concern, we can drop the yaml files for state encoding.
This also means that some clients can drop the YAML -> JSON/other -> SSZ work-arounds they had
to implement to support the uint64 YAML, hex, etc. that are not idiomatic in their language.
- We keep yaml for metadata, and non-SSZ things. (E.g. shuffling and BLS tests)
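During the transition, a client can prefer the binary copy of a part when it is present and fall back to the YAML one. A sketch assuming the `<part>.ssz` / `<part>.yaml` naming described above, with decoders supplied by the client (decode_ssz and decode_yaml are hypothetical callables, not a real library API):

```python
from pathlib import Path


def load_part(case_dir, part, decode_ssz, decode_yaml):
    """Load one part (e.g. "pre", "deposit", "post") of a test case.

    Prefers `<part>.ssz` when present and falls back to `<part>.yaml`.
    """
    case_dir = Path(case_dir)
    ssz_path = case_dir / f"{part}.ssz"
    if ssz_path.exists():
        return decode_ssz(ssz_path.read_bytes())
    yaml_path = case_dir / f"{part}.yaml"
    if yaml_path.exists():
        return decode_yaml(yaml_path.read_text(encoding="utf-8"))
    raise FileNotFoundError(f"no {part}.ssz or {part}.yaml in {case_dir}")
```

Passing the decoders in keeps the loader independent of whichever SSZ and YAML libraries a client already uses.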
The test case formats themselves do not change; the properties are just loaded from multiple files, instead of being sub-properties of one file.
This reduces memory requirements, and makes test case filtering much better.
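Because all filtering information lives in the directory names, selecting tests becomes a glob over the tree rather than parsing file contents. A minimal Python sketch, assuming exactly the directory levels of the path structure above (find_test_cases and CaseRef are hypothetical names):

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class CaseRef(NamedTuple):
    """One test case directory, split into its path components."""
    config: str
    fork: str
    runner: str
    handler: str
    suite: str
    case: str
    path: Path


def find_test_cases(root, config="*", fork="*", runner="*", handler="*") -> Iterator[CaseRef]:
    """Yield test case dirs under `root` matching the given filters.

    Filters are glob patterns ("*" matches anything). Only directory names
    are inspected; no test file is opened.
    """
    root = Path(root)
    # <config>/<fork>/<runner>/<handler>/<suite>/<case>
    pattern = f"{config}/{fork}/{runner}/{handler}/*/*"
    for case_dir in sorted(root.glob(pattern)):
        if case_dir.is_dir():
            yield CaseRef(*case_dir.relative_to(root).parts, path=case_dir)
```

For example, `find_test_cases("tests", config="minimal", runner="operations")` yields only the minimal operations cases, and each case's parts are then loaded one at a time.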
I support the LevelDB idea too, but versioning is important, and we do not get that with leveldb.
Instead, I recommend that clients read tests from the gzipped tarball in their CI setting, or write their own tooling to push the file structure into their own leveldb.
Also, please cache your tarball: it helps performance, and saves us all good amounts of bandwidth (although the costs may not be prohibitive).
And instead of zipping the tests/ dir, I think we can zip the individual <config name> directories, so people can keep them separate easily if they want to.
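A sketch of the per-config packaging with Python's standard tarfile module; the archive naming (`<config name>.tar.gz`) and the pack_configs / read_member helpers are assumptions. read_member shows how CI can read a single test file straight out of the gzipped tarball without extracting it to disk (gzip is not randomly seekable, so the archive is still decompressed up to that member):

```python
import tarfile
from pathlib import Path


def pack_configs(tests_dir, out_dir):
    """Write one <config name>.tar.gz per top-level config directory."""
    tests_dir, out_dir = Path(tests_dir), Path(out_dir)
    written = []
    for config_dir in sorted(p for p in tests_dir.iterdir() if p.is_dir()):
        archive = out_dir / f"{config_dir.name}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            # arcname keeps member paths relative, e.g. "minimal/phase0/..."
            tar.add(config_dir, arcname=config_dir.name)
        written.append(archive)
    return written


def read_member(archive, member_path):
    """Read one file from a .tar.gz without extracting the archive to disk."""
    with tarfile.open(archive, "r:gz") as tar:
        f = tar.extractfile(member_path)  # raises KeyError if not present
        if f is None:
            raise KeyError(f"{member_path} is not a regular file")
        return f.read()
```

Keeping one archive per config lets a client download and cache only minimal, for example.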
Open questions:
- bls_required / bls_ignored file (alike to .gitkeep markers); a simple file system check works:
    if bls_required.exists(): enable_bls()
    elif bls_ignored.exists(): disable_bls()
    else: default_bls_preference()
- post.hash file, for optimistic equality checking of the resulting post state.
- meta.yaml, which lists such properties.
- bls as file level, but the extra level for each such property is a bit much, and requires a filesystem check just like the empty file marker. And it does not work for data such as the post state root.
- On the other hand, duplication is error prone too, and a recursive ls or tree call works just as well to generate your own index.
Test viewer:
I have a POC based on the SSZ collapsible tree-view I implemented for https://simpleserialize.com (awesome site by Chainsafe, check it out).
The basic proposed functionality is:
Browse to somedomain.io/v0.8.1/tests/phase0/operations/deposits/common/success_top_up/ and get 3 tree views next to each other, annotated with SSZ types: pre, post and the deposit.
However, their JS types are not fully usable for 0.8.1 yet, so it cannot load the SSZ types that changed.
Diffing pre.yaml and post.yaml, while ingesting pre.ssz and post.ssz, should work well enough for debugging for now though.
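Until the viewer exists, a plain text diff of the two YAML parts is a cheap debugging aid. A sketch with Python's standard difflib; diff_parts is a hypothetical helper and assumes the pre.yaml / post.yaml file names from the proposed structure:

```python
import difflib
from pathlib import Path


def diff_parts(case_dir, before="pre.yaml", after="post.yaml"):
    """Return a unified diff of two parts of a test case ("" if identical)."""
    case_dir = Path(case_dir)
    a = (case_dir / before).read_text(encoding="utf-8").splitlines(keepends=True)
    b = (case_dir / after).read_text(encoding="utf-8").splitlines(keepends=True)
    return "".join(difflib.unified_diff(a, b, fromfile=before, tofile=after))
```

Comparing that diff with the operation part (e.g. deposit.yaml) shows which fields the operation is expected to touch.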
Also, there is too much tooling to implement, so I will prioritize testing and the testnetwork live verification tool,
over deprecating the yaml. But maybe others like to help with the test viewer? (Chainsafe? @GregTheGreek ?)
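Back to the open questions above: the marker-file check for BLS and the optimistic post.hash comparison could look like the sketch below. Assumptions: the markers are empty files named bls_required / bls_ignored in the test case directory, and post.hash holds a hex string (optionally 0x-prefixed) that the client compares to a hash it computes over its own post state; the actual file format was still an open question, and both helpers are hypothetical:

```python
from pathlib import Path


def bls_setting(case_dir, default=True):
    """Decide whether to verify BLS signatures for one test case."""
    case_dir = Path(case_dir)
    if (case_dir / "bls_required").exists():
        return True   # the test case requires BLS verification
    if (case_dir / "bls_ignored").exists():
        return False  # the test case does not rely on BLS verification
    return default    # no marker: use the client's own preference


def quick_post_check(case_dir, computed_hash):
    """Optimistically compare a computed post-state hash to post.hash.

    Returns True/False when post.hash exists, or None when the client has to
    fall back to loading and comparing the full post state.
    """
    hash_file = Path(case_dir) / "post.hash"
    if not hash_file.exists():
        return None
    text = hash_file.read_text(encoding="utf-8").strip()
    if text.startswith("0x"):
        text = text[2:]
    return bytes.fromhex(text) == computed_hash
```

On a False result, the client would still load the full post state to see what differs.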
Config:
I understand the concern about not changing the config. What I do want to standardize is how we handle forks in configs.
Overwriting "constants" is impractical and confusing, and requires configs to live next to each other as full copies during forks.
Instead, not many constants change anyway, and forks are considered to be backwards compatible in some basic form (we still need to sync old data, with old code or not).
Also, we would like to add new fork constants in advance, to test functionality in test settings, without configuration overhead.
I propose to (minimally) prefix constants that changed in a fork, and prepare prefixes for constants when we know they are going to change.
Long term, we can rotate forks out once we no longer need to sync them (essentially deprecating them), and remove the prefixes from constants that are considered stable.
Prepared change example: P0_MAX_TRANSFERS = 0 (P0 = phase 0), then fork to MAX_TRANSFERS = 16.
Rotation example: PROPOSER_REWARD_QUOTIENT = 8, then fork to XY_PROPOSER_REWARD_QUOTIENT = 16. Then deprecate the old constant as P0_PROPOSER_REWARD_QUOTIENT, and start calling XY_PROPOSER_REWARD_QUOTIENT just PROPOSER_REWARD_QUOTIENT.
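A sketch of how client code could resolve a prefixed constant, assuming the config is loaded into a flat dict and each fork knows its own prefix (get_constant is a hypothetical helper; "P0" and "XY" follow the examples above): look up the fork-prefixed name first, then fall back to the plain name. The same rule gives the right values both before and after a rotation:

```python
def get_constant(config, name, fork_prefix=None):
    """Look up `name` for a fork: `<prefix>_<name>` first, then plain `name`."""
    if fork_prefix is not None:
        prefixed = f"{fork_prefix}_{name}"
        if prefixed in config:
            return config[prefixed]
    return config[name]


# Prepared change: phase 0 code asks with prefix "P0", later forks without it.
prepared = {"P0_MAX_TRANSFERS": 0, "MAX_TRANSFERS": 16}

# Rotation, before and after renaming the constants.
before_rotation = {"PROPOSER_REWARD_QUOTIENT": 8, "XY_PROPOSER_REWARD_QUOTIENT": 16}
after_rotation = {"P0_PROPOSER_REWARD_QUOTIENT": 8, "PROPOSER_REWARD_QUOTIENT": 16}
```

As long as each fork always passes its own prefix, a rotation only renames entries in the single config; the lookup code does not change.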
Pro: single config, easy config management (no forks), and code can do the switching however it likes.
Con: prefixes (although clearer) are less pretty.
Simple decision, and gets us to transfer testing without complicated configuration or management changes. Sounds good?
For forks, the timeline idea still holds: simple key-value mapping to declare the slots for fork names. We can consider other fork activation later, when we have a good example.
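The key-value timeline only needs a "last activation slot not after the current slot" lookup. A minimal sketch, assuming the timeline maps fork name to activation slot (fork_at_slot is a hypothetical helper, and the slots in any example timeline are made up):

```python
from bisect import bisect_right


def fork_at_slot(timeline, slot):
    """Return the name of the fork active at `slot`.

    `timeline` maps fork name -> activation slot, e.g. {"phase0": 0, ...}.
    """
    ordered = sorted(timeline.items(), key=lambda item: item[1])
    activation_slots = [activation for _, activation in ordered]
    # Index of the last fork whose activation slot is <= slot.
    index = bisect_right(activation_slots, slot) - 1
    if index < 0:
        raise ValueError(f"no fork is active at slot {slot}")
    return ordered[index][0]
```

The fork name found this way can then select the constant prefix to use for that slot.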
TLDR: Deeper test structure for lower constraints and better filtering. Super small config change to enable forks (and transfer tests without hacks).
> I support the LevelDB idea too, but versioning is important, and we do not get that with leveldb.
You could commit leveldb files to git?
> You could commit leveldb files to git?
@mpetrunic We could, but part of the versioning reasoning is to see which tests are new and/or have been changed. And to roll back a test easily if necessary (as client, or as test maintainer). With raw leveldb chunks you do not get that. As an implementer, you can always write a little script to put the files in leveldb, and use the filepath as key. If that works better for you, please go ahead.
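A sketch of such a script. LevelDB is not in the Python standard library, so this swaps in the stdlib dbm module as a stand-in key-value store; the part that carries over to LevelDB is the keying scheme (relative file path as key, raw file bytes as value). The ingest helper is hypothetical:

```python
import dbm
from pathlib import Path


def ingest(tests_dir, db_path):
    """Store every file under tests_dir in a key-value DB, keyed by relative path."""
    tests_dir = Path(tests_dir)
    count = 0
    with dbm.open(str(db_path), "n") as db:  # "n": always create a new, empty DB
        for file in sorted(tests_dir.rglob("*")):
            if file.is_file():
                key = file.relative_to(tests_dir).as_posix()  # e.g. "minimal/phase0/..."
                db[key.encode()] = file.read_bytes()
                count += 1
    return count
```

Because the keys are the same paths as in the versioned file tree, a new release can simply be re-ingested, and Git still shows which tests changed.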
> Also, there is too much tooling to implement, so I will prioritize testing and the testnetwork live verification tool,
> over deprecating the yaml. But maybe others like to help with the test viewer? (Chainsafe? @GregTheGreek ?)
Let me know what you need
@GregTheGreek Awesome, thanks. Will implement first things first though (the structure change in test generation). But yes, could use some help in:
If I get to implement the testing change this week, we could get this viewer going some time next week maybe. Let's discuss in discord chat some time.
This is the look of the current POC I implemented yesterday (beacon state type not updated, so using the change thing 3x to get a feel for layout):

It loads yaml data from a public (CORS enabled) github media endpoint. But could switch it to fetching ssz files, and parsing them with ssz-js.
I could put the otherwise static site in a simple firebase hosting wrapper, to deal with routing of dynamic urls. And then we could link to versioned tests by url using the viewing site, instead of the raw ssz on github.
Closing 👍
