This is a tracking issue that I'm using to dump my thoughts on things that can be cleaned up in this codebase.
enum instead of strings for typ values.vyper.types.typesfunctools.wraps for decoratorsExceptionOpcodes Enum for easier referencing opcodes.decimal.Decimal?location == "memory" should be an Enum.sha3 to keccak across codebase.vyper.type.types.NodeType.__eq__change NodeType.__eq__ to require subclasses to implement and remove eq hook.
vyper.types.types.parse_typeTakes multiple arguments like sigs, custom_units, custom_structs and
constants. Current read of the code suggests there are data structures which
have been defined in the code and thus, this gives the parser the ability to
properly parse the AST into these types. If so, I believe this should be
formalized into something akin to the WASM Store to wrap these data
structures up into a single wrapper for easier and less error prone
portability.
vyper.types.types.StructTypeTakes a mapping of field_name -> field_type for members argument. This
relies on dictionary ordering (which is admittedly now reliable) but also is a
more difficuult data structure to reliably type hint. Consider using an
iterable of 2-tuples: (('field_a', TypeA), ('field_b', TypeB)). Benefits are
easier type safety through type hinting. Cons are additional validation step
to ensure field uniqueness.
vyper.types.types.get_size_of_typeRelies on prior knowledge about types. This should probably be an API of the
type classes.
vyper.types.types.has_dynamic_dataRelies on prior knowledge about types to determine. This should probably just
be an API of the type classes.
vyper.functions.signature.signatureUses non-canonical argz and kwargz with z suffixes. Convert to normal *args, **kwargs
Inner decorator function isn't using functools.wraps.
More usage of magic strings where an Enum is probably better.
vyper.opcodesReplace use of list with NamedTuple.
vyper.compiler.gas_estimateUses runtime assert statement which is easily disabled. Change to explicit exception raising.
vyper.compiler.mk_full_signatureLoop uses enumerate to generate indices idx used to modify abi at that
indices. Instead, use zip to pair functions with their abi definitions.
vyper.compiler.get_source_mapBuilds dict using mutation. Vendor a version of to_dict decorator to remove mutation pattern.
vyper.compiler.compile_codesUses mutable data structure as default for argument
Use of magic strings for output_type should be constrained to enums.
vyper.compiler.compile_codeUses mutable data structure as default for argument
vyper.compiler.get_asmconstant string mutation (which actual ends up as full object copies) adds
overhead. Converting to a wrapped generator pattern that joins all of the
produced strings at the end should be more performant and remove mutation.
vyper.compile_lllUse of next_symbol global counter appears to be used to create a unique
marker in the code such as _sym_0, _sym_1. Replace with
itertools.count(). Replace with uuid.uuid4(). Replace with generator
function that wraps an itertools.count() style counter. Embed within an
object with a defined lifecycle that gets passed around during compilation.
vyper.compile_lll.is_symbolReplace use of magic string with something more concrete like a custom class.
Potentially the custom class could handle the mechanism for generating unique
symbols.
vyper.compile_lll.instructioninstruction is a class and thus should be named using common convention
Instruction.
vyper.compile_lll.apply_line_numbersNeeds to use functools.wraps for inner function.
vyper.compile_lll.compile_to_assemblyConvert runtime assert statements to raise proper exceptions.
vyper.compile_lll.compile_to_assemblyDetection of whether something is or is not an opcode is done via
isinstance(value, str) and value in opcodes. If we had an Opcodes Enum
this could be as simple as checking if Opcodes(value) raised an exception or
not.
vyper.compile_lll.compile_to_assemblyUse of inlined opcode string values like "PUSH" would be better as Enum or constant values.
vyper.compile_lll.compile_to_assemblyIndividual blocks in the if/elif/.../else block could be partitioned off into smaller standalone functions.
vyper.compile_lll.compile_to_assemblyThe section that handles clamping values has an if/elif which is missing a
final else clause allowing for undefined behavior if none of the conditions
evaluates truthy
vyper.compile_lll.assembly_to_evmUse formal data structure for line_number_map instead of dictionary.
vyper.compile_lll.assembly_to_evmInside of the if isinstance(item, list), the local value line_number_map is
overwritten which appears to be a potentially destructive mistake as it will
destroy any state that was previously built up locally.
vyper.optimizer.get_int_atUses inline logic for converting a value to a signed integer. Should be extracted into utility that is tested.
vyper.util.is_instancesConfusingly named. Very visually similar to isinstance. Maybe rename to a all_is_instances
vyper.parser.constants.Constants.unroll_constantUse of fail flag is antipattern. Restructure to allow inlined exception raising.
vyper.parser.context.Context.start_blockscope and othersThe various function pairs like start_blockscope/end_blockscope,
set_in_assertion, set_in_for_loop/remove_in_for_loop should be converted to
use context manager pattern so that exiting the scope cannot be accidentally
forgotten.
vyper.parser.global_ctx.GlobalContext.get_item_name_and_attributesReturns None for item name in final block which suggests there is either a
case where something does not have a name or that there is return value
checking occuring somewhere.
vyper.parser.stmt.Statements._check_valid_assignDoes not have a final else block allowing function to exit if none of the conditions evaluate truthy.
vyper.parser.stmt.Statements.parse_forUses implicit else for final clause type. Should be converted to explicitely
check that the statement is formed as expected and add a final else block
that raises an invariant exception.
Look into enabling doctest
Hey @pipermerriam ,
I did a quick scan - looks good. Here are some quick additional notes:
- convert to circle-ci
- tox
These would be great travis is super slow (and I don't have access).
- Create
OpcodesEnum for easier referencing opcodes.
Would require all LLL statmements tho use the Opcodes Enum or have to have lookup method attached to it (all LLL function values are currently strings). Writing with strings make it easier to read / parse mentally - so I vote we keep strings in the LLL.
Takes multiple arguments like
sigs,custom_units,custom_structsand
constants. Current read of the code suggests there are data structures which
have been defined in the code and thus, this gives the parser the ability to
properly parse the AST into these types. If so, I believe this should be
formalized into something akin to the WASMStoreto wrap these data
structures up into a single wrapper for easier and less error prone
portability.
Alot of those have been refactored to be contained on GlobalContext, and mostly can be refactored to pass global_ctx through instead, so check for that.
However there are some base layer functions that will always require one to not pass them through, for example in some scenarios (building function signatures for interfaces don't have custom_units) you don't have global context populated, and in other cases you do.
Additionally, a lot of code in parser.py and parser_utils.py can be split of into sub modules (this has been my goal since I have started and have been trying to do so when possible). parser.py should contain parsing steps of basically Module / Function & Context.
vyper.types.types.get_size_of_typeRelies on prior knowledge about types. This should probably be an API of the
type classes.
vyper.types.types.has_dynamic_data
Relies on prior knowledge about types to determine. This should probably just
be an API of the type classes.
What is meant by API of type classes, could you elaborate? I think having something like BaseType.get_size(unit='bytes'|''bits') could work well.
Additional items you can add:
@pipermerriam I am currently working on a PR then I will be helping with these issues, since they potentially touch a lot of code. I'd prefer we get it all done as quickly as possible, to avoid any big branch diverges :smile:
@jacqueswww lets deal with that kind of thing on a case-by-case basis. Any of these that I deal will be done in an isolated way that touches the smallest possible section of code so if they do cause any conflicts I believe they should be reasonably easy to resolve and I'm happy to either work with you to rebase your branches or to rebase them for you in the event something messy occurs.
@pipermerriam works for me :+1: let's see how it goes, I checked the open PRs now, and I think we'll just deal with them as they come along.
vyper.utils with eth-utils utility functions.sha3 to keccak across codebase.
- replace much of what is in
vyper.utilswitheth-utilsutility functions.
I'd prefer not to pull eth-utils for 1 function, as eth-utils pulls in 4 other dependencies of only one traces back to a crypto lib for a keccak implementaiton.
@jacqueswww Minor nitpicky correction that it's 3 dependencies since cytoolz/toolz are functionally equivalent the way they are specified.
The eth-utils dependencies are:
eth-hashpycryptodome locally and eth-utils contains a keccak function which is a rough equivalent to the existing vyper.utils.sha3 helper.eth-typingtoolz/cytoolzCan you elaborate on what issues or problems you see with any of these or maybe expand on your reasoning for not wanting these to end up as dependencies.
@pipermerriam The first reason that would be that I tend to keep to stdlib as close as possible, more of a philosophical thing. It isn't that above libraries are not stable - but more about importing unnecessary dependencies (think _left-pad_ saga).
The more pragmatic reasons are:
py-ethereum #735. py-evm is off course much better split up - but I easily see conflicts still happening (and has happened to myself a couple of times e.g. vypers' eth-typing conflicts with the one I have setup for my current test suite).To add another point, we have a requirement to allow users to use the compiler with any version we have uploaded, either through a redeployment (docker, snapcraft, browser blob) or through version pining the library itself. This is so someone could test with any version of the compiler without running into dependency issues so they can validate code written with different versions.
Something to think about.
Here's my best stab at convincing you that the policy on dependencies should be more permissible, though I do fully agree that new dependencies should be evaluated thoroughly. This isn't an argument to say that eth-utils should be allowed (we can debate that one separately), but an argument that we should not have a policy of no dependencies.
The best solution to conflict dependency hell is not having dependencies.
This feels a little like sticking our heads in the sand. We can have dependencies without having dependency hell. I don't think this quite qualifies as a false dichotomy but I disagree that not having dependencies is the "best solution". Even Django has dependencies these days. So does requests.
Choosing to not have dependencies also means choosing to duplicate functionality that exists in other libraries where it is well tested, documented, and most importantly maintained as public APIs. The end result will be vyper accruing more and more vendored utility code, all of which will receive less maintenance, attention, tests, and bugfixes than it's equivalent from the libraries exposing it as public APIs. I'll take the occasional dependency issue over this maintenance nightmare any day. It's worth noting that eth-utils exists for exactly this reason. Most everything in that library is there because it existed in multiple codebases and ended up suffering from things forgetting to apply bug fixes from one repository to all of the others.
I am absolutely on board with being very conservative with the dependencies we add, but I think a blanket policy is a bad idea.
In the past research team actually had direct conflicts with
py-ethereum
This seems to be roughly the same as the previous point, but to address it directly, we aren't in the pyethereum days and haven't been for a long time. The current suite of python+ethereum tools is maintained and supported by a team of 8 full time developers and an even larger group of active third party contributors. I get that there is still scar tissue from the old days, but times have changed.
As for the issue about needing to vyper to be packaged in different ways, I'd like to objectively determine if adding these packages actually reduces our options there. Who did the js/wasm blob creation? Lets find out how we can have our cake and eat it too, since I suspect that we can have both (js/wasm portability and dependencies).
The requirement @fubuloubu mentions is important, but I also don't believe it's an argument to not have any dependencies, but rather one more thing to think about with respect to what we add and how we specify them. I am confident we can approach this in a way that allows us some freedom of movement when it comes to adding a dependency that has been thoroughly reviewed and evaluated to fit whatever our necessary criteria are for such things.
@pipermerriam I don't think that argument is that we shouldn't have dependencies. But that we shouldn't have dependencies that a.) will cause users of Vyper pip hell b.) pull in needless dependencies that we don't need, and that's why I am not keen on eth-utils. eth-typing will cause trouble in the future because every single py-evm related package will depened on it. I would more than willing to go with eth-utils if it had no dependencies itself.
TBH I don't understand why we have to be DRY about strict typing structs (eth-typing). What is wrong with having a Union or type alias in the codebase that uses them. mypy is basically a fancy linter and should be part of the test requirements if really necessary. It just feels to me that to have a linter dictate the dependency graph of shipping code, is going to hurt. Basically I am of the opion that a linting step should not dictate dependency graphs. Replace mypy with flake8, and you'd be super confused why you can't install vyper because of flake8 (not talking tests, just the compiler), which will happen especially if you have a lot of developers working on the separate repos.
I do the wasm builds, and the currently build requires us to basically export the setup requirements and make sure they can build themselves, mostly it's not a problem but anything that does C compilation can be tricky, but all manageable - and if it becomes popular some better scripts can happen here. https://github.com/jacqueswww/vyper-in-browser/tree/master/pyodide
Edit: Ha, I see this uses pysha3 because pycryptodome didn't compile. This perfect example of why having a very strict dependency policy saves you.
Edit2: So from my side I am on board with pulling in dependencies, but they have to be very strictly evaluated for the value they bring, for this specific case I really don't feel the upside merits the extra dependency weight. Me raising all of the above is how one has to be, to be conservative about dependencies - this comes from my personal experience (not only python specific, do you remember yum hell?). The cost of adding a dependency into your requirements / setup.py is unfairly low - to the potential downside it can cost in the long run.
Off dependency topic: I really like the suggested use case for ContextManager, and I will do PR for them soon :smile_cat:
A potential argument for eth-typing: we should really type annotate our codebase. Will we require eth types to do so?
@fubuloubu I would argue not, everything in eth-typing can be defined on their own terms, eth-typing is a bunch of aliases. The dependency problem with it all is that you do an _import_ statement to import the types - thereby forever binding your code to eth-typing. This also means you have to know what is defined in the installed package to know what the alias means (for folks who don't have fancy IDEs). I am much much more in favour of each repository like vyper use it's own defined type names, and even better to strive for standard annotations as much as possible.
Everything in pycryptodome could be defined on its own too, it's not a complicated library and really we're only using 2-ish functions from it. Should we eschew all dependencies and start maintaining all functions internally?
@fubuloubu the difference is we actually use a function there.
before:
vyper ->keccack lib
after:
vyper -> eth_utils -> eth_hash -> keccack lib
-> ctoolz/toolz etc. (still not sure why there are 2 equivalent libraries here?)
-> eth typing
eth-typingis by no means required for type hinting. It's just a collection of common aliases. I'm fine leaving it out.
eth-utils is by no means required and I'm more than fine tabling it.
As of now, I don't have any dependencies that I'm going to make a case for adding and I'm glad that we appear to be on roughly the same page and I'm fine being extra conservative about adding dependencies.
@jacqueswww the toolz/cytoolz auto negotiates install based on the base python architecture since cytoolz doesn't play nice with pypy but is significantly faster than toolz when we're in an environment that supports c-extensions.
@pipermerriam ah I see! it's so it's so that py-evm can run on pypy make sense (and speed boost) :)
Would you perhaps review my PR, https://github.com/ethereum/vyper/pull/1269, I am a bit new to context managers and I'd appreciate your feedback (when you have a chance).
@pipermerriam also is it ok if I put tickboxes on the above ticket so we can keep track what has been done?
tests/parser/syntax/test_no_none.py@pipermerriam that may be a tall order, we have a bunch of programs, both for functional testing and for compilation checks. I am open to this though, would just require some special setup folder or something.
Our larger examples are standalone files.
print(..) statements in tests to use logging so that output can be throttled and controlled using standard pytest CLI apis.assert_compile_failed with standard with pytest.raises(...) style failure checks.
- [ ] convert
print(..)statements in tests to useloggingso that output can be throttled and controlled using standard pytest CLI apis.
I think we should just remove those? I have never used them, and the test file name / test structure should inform the user of what's being testing in a specific file, or alternatively use a comment - don't see why using stdout doesn't provide the same but w/o print statements. I think it was maybe when all tests was in one big file?
So I'd just move those to top level comments, as they are still useful in terms of knowing more about what is being tested.
Continuing off https://github.com/ethereum/vyper/pull/1351#discussion_r266033074 I think we should a) have a CompilerPanic exception to replace asserts which suggests to the user to file a bug report, and b) catch all exceptions that aren't Vyper exceptions (e.g. ast.NameError, ValueError, etc.) at the top level and turn them into CompilerPanic. This is to help the user experience.
Idea for doctest:
@charles-cooper I will include this in the square root PR, since the PR is turning into a meta PR anyways.
@pipermerriam you have this stale branch, which I think relates to this issue. Can we delete that branch (issue is still valid)?
Also would like to walk through this issue and see if anything is still relevant to be cleaned up.
Branch can be deleted and this can be closed. If I find myself back here working on stuff I'll re-open or make a new issue.
Most helpful comment
Branch can be deleted and this can be closed. If I find myself back here working on stuff I'll re-open or make a new issue.