Vyper: Meta Issue: Tracker for codebase cleanup

Created on 19 Feb 2019 · 32Comments · Source: vyperlang/vyper

This is a tracking issue that I'm using to dump my thoughts on things that can be cleaned up in this codebase.

General

[x] convert to circle-ci
[x] make codebase pep8 compliant
[x] add mypy
[x] use tox
[ ] use enum instead of strings for typ values.
[ ] use docstrings instead of comments for class definitions in vyper.types.types
[x] Use functools.wraps for decorators
[ ] Use proper semantically meaningful exception classes instead of the base Exception
[ ] Create Opcodes Enum for easier referencing opcodes.
[ ] Use of floating point values seems like a foot gun. Maybe use decimal.Decimal?
[ ] Use f-strings
[ ] Use of magic string in location == "memory" should be an Enum.
[ ] Rename sha3 to keccak across codebase.

Specifics

[ ] vyper.type.types.NodeType.__eq__

change NodeType.__eq__ to require subclasses to implement and remove eq hook.

[ ] vyper.types.types.parse_type

Takes multiple arguments like sigs, custom_units, custom_structs and
constants. Current read of the code suggests there are data structures which
have been defined in the code and thus, this gives the parser the ability to
properly parse the AST into these types. If so, I believe this should be
formalized into something akin to the WASM Store to wrap these data
structures up into a single wrapper for easier and less error prone
portability.

[ ] vyper.types.types.StructType

Takes a mapping of field_name -> field_type for members argument. This
relies on dictionary ordering (which is admittedly now reliable) but also is a
more difficuult data structure to reliably type hint. Consider using an
iterable of 2-tuples: (('field_a', TypeA), ('field_b', TypeB)). Benefits are
easier type safety through type hinting. Cons are additional validation step
to ensure field uniqueness.

[ ] vyper.types.types.get_size_of_type

Relies on prior knowledge about types. This should probably be an API of the
type classes.

[ ] vyper.types.types.has_dynamic_data

Relies on prior knowledge about types to determine. This should probably just
be an API of the type classes.

[ ] vyper.functions.signature.signature

Uses non-canonical argz and kwargz with z suffixes. Convert to normal *args, **kwargs

Inner decorator function isn't using functools.wraps.

More usage of magic strings where an Enum is probably better.

[ ] vyper.opcodes

Replace use of list with NamedTuple.

[ ] vyper.compiler.gas_estimate

Uses runtime assert statement which is easily disabled. Change to explicit exception raising.

[x] vyper.compiler.mk_full_signature

Loop uses enumerate to generate indices idx used to modify abi at that
indices. Instead, use zip to pair functions with their abi definitions.

[ ] vyper.compiler.get_source_map

Builds dict using mutation. Vendor a version of to_dict decorator to remove mutation pattern.

[x] vyper.compiler.compile_codes

Uses mutable data structure as default for argument

Use of magic strings for output_type should be constrained to enums.

[x] vyper.compiler.compile_code

Uses mutable data structure as default for argument

[ ] vyper.compiler.get_asm

constant string mutation (which actual ends up as full object copies) adds
overhead. Converting to a wrapped generator pattern that joins all of the
produced strings at the end should be more performant and remove mutation.

[ ] vyper.compile_lll

Use of next_symbol global counter appears to be used to create a unique
marker in the code such as _sym_0, _sym_1. Replace with
itertools.count(). Replace with uuid.uuid4(). Replace with generator
function that wraps an itertools.count() style counter. Embed within an
object with a defined lifecycle that gets passed around during compilation.

[ ] vyper.compile_lll.is_symbol

Replace use of magic string with something more concrete like a custom class.
Potentially the custom class could handle the mechanism for generating unique
symbols.

[ ] vyper.compile_lll.instruction

instruction is a class and thus should be named using common convention
Instruction.

[x] vyper.compile_lll.apply_line_numbers

Needs to use functools.wraps for inner function.

[ ] vyper.compile_lll.compile_to_assembly

Convert runtime assert statements to raise proper exceptions.

[ ] vyper.compile_lll.compile_to_assembly

Detection of whether something is or is not an opcode is done via
isinstance(value, str) and value in opcodes. If we had an Opcodes Enum
this could be as simple as checking if Opcodes(value) raised an exception or
not.

[ ] vyper.compile_lll.compile_to_assembly

Use of inlined opcode string values like "PUSH" would be better as Enum or constant values.

[ ] vyper.compile_lll.compile_to_assembly

Individual blocks in the if/elif/.../else block could be partitioned off into smaller standalone functions.

[ ] vyper.compile_lll.compile_to_assembly

The section that handles clamping values has an if/elif which is missing a
final else clause allowing for undefined behavior if none of the conditions
evaluates truthy

[ ] vyper.compile_lll.assembly_to_evm

Use formal data structure for line_number_map instead of dictionary.

[ ] vyper.compile_lll.assembly_to_evm

Inside of the if isinstance(item, list), the local value line_number_map is
overwritten which appears to be a potentially destructive mistake as it will
destroy any state that was previously built up locally.

[ ] vyper.optimizer.get_int_at

Uses inline logic for converting a value to a signed integer. Should be extracted into utility that is tested.

[ ] vyper.util.is_instances

Confusingly named. Very visually similar to isinstance. Maybe rename to a all_is_instances

[ ] vyper.parser.constants.Constants.unroll_constant

Use of fail flag is antipattern. Restructure to allow inlined exception raising.

[x] vyper.parser.context.Context.start_blockscope and others

The various function pairs like start_blockscope/end_blockscope,
set_in_assertion, set_in_for_loop/remove_in_for_loop should be converted to
use context manager pattern so that exiting the scope cannot be accidentally
forgotten.

[ ] vyper.parser.global_ctx.GlobalContext.get_item_name_and_attributes

Returns None for item name in final block which suggests there is either a
case where something does not have a name or that there is return value
checking occuring somewhere.

[ ] vyper.parser.stmt.Statements._check_valid_assign

Does not have a final else block allowing function to exit if none of the conditions evaluate truthy.

[ ] vyper.parser.stmt.Statements.parse_for

Uses implicit else for final clause type. Should be converted to explicitely
check that the statement is formed as expected and add a final else block
that raises an invariant exception.

Source

pipermerriam

🎉3

Most helpful comment

Branch can be deleted and this can be closed. If I find myself back here working on stuff I'll re-open or make a new issue.

pipermerriam on 24 Jul 2019

❤1 👍1

All 32 comments

Look into enabling doctest

pipermerriam on 19 Feb 2019

Hey @pipermerriam ,

I did a quick scan - looks good. Here are some quick additional notes:

convert to circle-ci

tox

These would be great travis is super slow (and I don't have access).

Create Opcodes Enum for easier referencing opcodes.

Would require all LLL statmements tho use the Opcodes Enum or have to have lookup method attached to it (all LLL function values are currently strings). Writing with strings make it easier to read / parse mentally - so I vote we keep strings in the LLL.

Takes multiple arguments like sigs, custom_units, custom_structs and
constants. Current read of the code suggests there are data structures which
have been defined in the code and thus, this gives the parser the ability to
properly parse the AST into these types. If so, I believe this should be
formalized into something akin to the WASM Store to wrap these data
structures up into a single wrapper for easier and less error prone
portability.

Alot of those have been refactored to be contained on GlobalContext, and mostly can be refactored to pass global_ctx through instead, so check for that.
However there are some base layer functions that will always require one to not pass them through, for example in some scenarios (building function signatures for interfaces don't have custom_units) you don't have global context populated, and in other cases you do.

Additionally, a lot of code in parser.py and parser_utils.py can be split of into sub modules (this has been my goal since I have started and have been trying to do so when possible). parser.py should contain parsing steps of basically Module / Function & Context.

vyper.types.types.get_size_of_type

Relies on prior knowledge about types. This should probably be an API of the
type classes.

vyper.types.types.has_dynamic_data
Relies on prior knowledge about types to determine. This should probably just
be an API of the type classes.

What is meant by API of type classes, could you elaborate? I think having something like BaseType.get_size(unit='bytes'|''bits') could work well.

Additional items you can add:

location should use Enum too currently uses magic strings as well.
_Context_ should probably renamed to LocalContext to be more explicit.
GlobalContext items on Context should be removed, and ctx._global_ctx should become ctx.global_ctx instead.

jacqueswww on 20 Feb 2019

@pipermerriam I am currently working on a PR then I will be helping with these issues, since they potentially touch a lot of code. I'd prefer we get it all done as quickly as possible, to avoid any big branch diverges :smile:

jacqueswww on 20 Feb 2019

@jacqueswww lets deal with that kind of thing on a case-by-case basis. Any of these that I deal will be done in an isolated way that touches the smallest possible section of code so if they do cause any conflicts I believe they should be reasonably easy to resolve and I'm happy to either work with you to rebase your branches or to rebase them for you in the event something messy occurs.

pipermerriam on 20 Feb 2019

@pipermerriam works for me :+1: let's see how it goes, I checked the open PRs now, and I think we'll just deal with them as they come along.

jacqueswww on 20 Feb 2019

replace much of what is in vyper.utils with eth-utils utility functions.
rename sha3 to keccak across codebase.

pipermerriam on 20 Feb 2019

replace much of what is in vyper.utils with eth-utils utility functions.

I'd prefer not to pull eth-utils for 1 function, as eth-utils pulls in 4 other dependencies of only one traces back to a crypto lib for a keccak implementaiton.

jacqueswww on 20 Feb 2019

@jacqueswww Minor nitpicky correction that it's 3 dependencies since cytoolz/toolz are functionally equivalent the way they are specified.

The eth-utils dependencies are:

eth-hash
- thin wrapper around what you are already doing with pycryptodome locally and eth-utils contains a keccak function which is a rough equivalent to the existing vyper.utils.sha3 helper.
eth-typing
- Literally only contains common type hinting primatives, many of which would get duplicated in this repo at the point where type hints are added (assuming type hints get added).
toolz/cytoolz
- Very stable functional utility libraries that we use across the board across pretty much every python codebase.

Can you elaborate on what issues or problems you see with any of these or maybe expand on your reasoning for not wanting these to end up as dependencies.

pipermerriam on 21 Feb 2019

@pipermerriam The first reason that would be that I tend to keep to stdlib as close as possible, more of a philosophical thing. It isn't that above libraries are not stable - but more about importing unnecessary dependencies (think _left-pad_ saga).

The more pragmatic reasons are:

Vyper has to be packaged in many different ways. On of which is browser js/wasm blob, and there is additionally maintenance in maintaining this build (and others) with additional dependencies.
The best solution to conflict dependency hell is not having dependencies. This is especially true if Vyper is to be included into other code bases, currently it's very hard to have Vyper conflict. Perfect example is someone using twig/populus with a specific versions.
In the past research team actually had direct conflicts with py-ethereum #735. py-evm is off course much better split up - but I easily see conflicts still happening (and has happened to myself a couple of times e.g. vypers' eth-typing conflicts with the one I have setup for my current test suite).

jacqueswww on 21 Feb 2019

To add another point, we have a requirement to allow users to use the compiler with any version we have uploaded, either through a redeployment (docker, snapcraft, browser blob) or through version pining the library itself. This is so someone could test with any version of the compiler without running into dependency issues so they can validate code written with different versions.

Something to think about.

fubuloubu on 21 Feb 2019

Here's my best stab at convincing you that the policy on dependencies should be more permissible, though I do fully agree that new dependencies should be evaluated thoroughly. This isn't an argument to say that eth-utils should be allowed (we can debate that one separately), but an argument that we should not have a policy of no dependencies.

The best solution to conflict dependency hell is not having dependencies.

This feels a little like sticking our heads in the sand. We can have dependencies without having dependency hell. I don't think this quite qualifies as a false dichotomy but I disagree that not having dependencies is the "best solution". Even Django has dependencies these days. So does requests.

Choosing to not have dependencies also means choosing to duplicate functionality that exists in other libraries where it is well tested, documented, and most importantly maintained as public APIs. The end result will be vyper accruing more and more vendored utility code, all of which will receive less maintenance, attention, tests, and bugfixes than it's equivalent from the libraries exposing it as public APIs. I'll take the occasional dependency issue over this maintenance nightmare any day. It's worth noting that eth-utils exists for exactly this reason. Most everything in that library is there because it existed in multiple codebases and ended up suffering from things forgetting to apply bug fixes from one repository to all of the others.

I am absolutely on board with being very conservative with the dependencies we add, but I think a blanket policy is a bad idea.

In the past research team actually had direct conflicts with py-ethereum

This seems to be roughly the same as the previous point, but to address it directly, we aren't in the pyethereum days and haven't been for a long time. The current suite of python+ethereum tools is maintained and supported by a team of 8 full time developers and an even larger group of active third party contributors. I get that there is still scar tissue from the old days, but times have changed.

As for the issue about needing to vyper to be packaged in different ways, I'd like to objectively determine if adding these packages actually reduces our options there. Who did the js/wasm blob creation? Lets find out how we can have our cake and eat it too, since I suspect that we can have both (js/wasm portability and dependencies).

pipermerriam on 21 Feb 2019

The requirement @fubuloubu mentions is important, but I also don't believe it's an argument to not have any dependencies, but rather one more thing to think about with respect to what we add and how we specify them. I am confident we can approach this in a way that allows us some freedom of movement when it comes to adding a dependency that has been thoroughly reviewed and evaluated to fit whatever our necessary criteria are for such things.

pipermerriam on 21 Feb 2019

👍1

@pipermerriam I don't think that argument is that we shouldn't have dependencies. But that we shouldn't have dependencies that a.) will cause users of Vyper pip hell b.) pull in needless dependencies that we don't need, and that's why I am not keen on eth-utils. eth-typing will cause trouble in the future because every single py-evm related package will depened on it. I would more than willing to go with eth-utils if it had no dependencies itself.

TBH I don't understand why we have to be DRY about strict typing structs (eth-typing). What is wrong with having a Union or type alias in the codebase that uses them. mypy is basically a fancy linter and should be part of the test requirements if really necessary. It just feels to me that to have a linter dictate the dependency graph of shipping code, is going to hurt. Basically I am of the opion that a linting step should not dictate dependency graphs. Replace mypy with flake8, and you'd be super confused why you can't install vyper because of flake8 (not talking tests, just the compiler), which will happen especially if you have a lot of developers working on the separate repos.

I do the wasm builds, and the currently build requires us to basically export the setup requirements and make sure they can build themselves, mostly it's not a problem but anything that does C compilation can be tricky, but all manageable - and if it becomes popular some better scripts can happen here. https://github.com/jacqueswww/vyper-in-browser/tree/master/pyodide

Edit: Ha, I see this uses pysha3 because pycryptodome didn't compile. This perfect example of why having a very strict dependency policy saves you.

Edit2: So from my side I am on board with pulling in dependencies, but they have to be very strictly evaluated for the value they bring, for this specific case I really don't feel the upside merits the extra dependency weight. Me raising all of the above is how one has to be, to be conservative about dependencies - this comes from my personal experience (not only python specific, do you remember yum hell?). The cost of adding a dependency into your requirements / setup.py is unfairly low - to the potential downside it can cost in the long run.

jacqueswww on 21 Feb 2019

Off dependency topic: I really like the suggested use case for ContextManager, and I will do PR for them soon :smile_cat:

jacqueswww on 21 Feb 2019

👍1

A potential argument for eth-typing: we should really type annotate our codebase. Will we require eth types to do so?

fubuloubu on 21 Feb 2019

@fubuloubu I would argue not, everything in eth-typing can be defined on their own terms, eth-typing is a bunch of aliases. The dependency problem with it all is that you do an _import_ statement to import the types - thereby forever binding your code to eth-typing. This also means you have to know what is defined in the installed package to know what the alias means (for folks who don't have fancy IDEs). I am much much more in favour of each repository like vyper use it's own defined type names, and even better to strive for standard annotations as much as possible.

jacqueswww on 21 Feb 2019

Everything in pycryptodome could be defined on its own too, it's not a complicated library and really we're only using 2-ish functions from it. Should we eschew all dependencies and start maintaining all functions internally?

fubuloubu on 21 Feb 2019

@fubuloubu the difference is we actually use a function there.
before:

vyper ->keccack lib

after:
vyper -> eth_utils -> eth_hash -> keccack lib -> ctoolz/toolz etc. (still not sure why there are 2 equivalent libraries here?) -> eth typing

jacqueswww on 21 Feb 2019

eth-typingis by no means required for type hinting. It's just a collection of common aliases. I'm fine leaving it out.

eth-utils is by no means required and I'm more than fine tabling it.

As of now, I don't have any dependencies that I'm going to make a case for adding and I'm glad that we appear to be on roughly the same page and I'm fine being extra conservative about adding dependencies.

pipermerriam on 21 Feb 2019

👍2

@jacqueswww the toolz/cytoolz auto negotiates install based on the base python architecture since cytoolz doesn't play nice with pypy but is significantly faster than toolz when we're in an environment that supports c-extensions.

pipermerriam on 21 Feb 2019

👍2

@pipermerriam ah I see! it's so it's so that py-evm can run on pypy make sense (and speed boost) :)
Would you perhaps review my PR, https://github.com/ethereum/vyper/pull/1269, I am a bit new to context managers and I'd appreciate your feedback (when you have a chance).

jacqueswww on 21 Feb 2019

@pipermerriam also is it ok if I put tickboxes on the above ticket so we can keep track what has been done?

jacqueswww on 21 Feb 2019

👍1

consider converting inline vyper source files in tests to be standalone vyper files: tests/parser/syntax/test_no_none.py

pipermerriam on 21 Feb 2019

👀1

@pipermerriam that may be a tall order, we have a bunch of programs, both for functional testing and for compilation checks. I am open to this though, would just require some special setup folder or something.

Our larger examples are standalone files.

fubuloubu on 22 Feb 2019

[ ] convert print(..) statements in tests to use logging so that output can be throttled and controlled using standard pytest CLI apis.

pipermerriam on 22 Feb 2019

👍1

[ ] replace the assert_compile_failed with standard with pytest.raises(...) style failure checks.

pipermerriam on 22 Feb 2019

👍1

[ ] convert print(..) statements in tests to use logging so that output can be throttled and controlled using standard pytest CLI apis.

I think we should just remove those? I have never used them, and the test file name / test structure should inform the user of what's being testing in a specific file, or alternatively use a comment - don't see why using stdout doesn't provide the same but w/o print statements. I think it was maybe when all tests was in one big file?

So I'd just move those to top level comments, as they are still useful in terms of knowing more about what is being tested.

jacqueswww on 27 Feb 2019

👍1

Continuing off https://github.com/ethereum/vyper/pull/1351#discussion_r266033074 I think we should a) have a CompilerPanic exception to replace asserts which suggests to the user to file a bug report, and b) catch all exceptions that aren't Vyper exceptions (e.g. ast.NameError, ValueError, etc.) at the top level and turn them into CompilerPanic. This is to help the user experience.

charles-cooper on 15 Mar 2019

🚀1 👍1

Idea for doctest:

Create a few helper templates that encapsulate example Vyper code in a vyper contract and then compiles it.
Add some testing scaffolding that we can show the functionality of the code with in-doc examples.

fubuloubu on 15 Mar 2019

@charles-cooper I will include this in the square root PR, since the PR is turning into a meta PR anyways.

jacqueswww on 17 Mar 2019

@pipermerriam you have this stale branch, which I think relates to this issue. Can we delete that branch (issue is still valid)?

Also would like to walk through this issue and see if anything is still relevant to be cleaned up.

fubuloubu on 24 Jul 2019

Branch can be deleted and this can be closed. If I find myself back here working on stuff I'll re-open or make a new issue.

pipermerriam on 24 Jul 2019

❤1 👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Unhandled error in my code

domrany64 · 3Comments

Literal division returns wrong result

ben-kaufman · 3Comments

Monday 17th December 2018

jacqueswww · 3Comments

Installation error on Arch (eth-pm dependency)

travs · 3Comments

Bytes doesn't accept 0x... literals

ben-kaufman · 4Comments