Solidity: Option to output the ast even before import resolution

Created on 22 Apr 2020  ·  20Comments  ·  Source: ethereum/solidity

We should provide the options to abort after certain steps:

  • parsing
  • import resolution
  • symbol resolution
  • type checking

etc.

-> only implement a switch a la stop-after: parsing for now, and only allow parsing for now.

feature

Most helpful comment

Hey everyone! We, the Solidity team, would like to discuss this topic in a little language design discussion call with you next week to come to a conclusion about the implementation.

When?

Thursday, 25th of June, 5-6PM CEST (see time in your timezone here)

What?

Discuss issue #8739 - "output the AST even before import resolution"

How?

I will share a link to join the meeting in this issue 10mins before the start. You can also get a calendar invite with all information, if you send me your e-mail address as a DM on Gitter (@franzihei).

Please let me know if you're interested in participating. :)

All 20 comments

Could add a stage field to the standard json which signals up to what point to run the CompilerStack: parse, importResolution, analysis, compilation.

Alternative: more fine-grained output selection, i.e. being able to select "preImportResolutionAST" and nothing else would stop before import resolution.
Advantage: we would have to think about and specify what information actually is present for those output selections.

The main reason we are interested in this feature is to be able to extract imports reliably. Now, we are using the ANTLR grammar, which doesn't represent the exact same language, especially when a file fails/should fail to parse.

As I mentioned Chris in a call, we are ok with not getting an AST if the file fails to parse, but we'd love to get the same errors that solc would return.

I think anyone working on tools that process Solidity sources would benefit from this, so I'll tag a few of them: @iamdefinitelyahuman @cgewecke @fvictorio @frangio @jcaracciolo

I have played around with this idea, and it seems is not that hard to accomplish. Here is a fork where i manage to get some functionality to work for it with some considerations (https://github.com/jcaracciolo/solidity/commit/bb4a0aa93d980038ecf9161addd78eb29a92ed65). I know it may probably break in most edge cases, but i try it out with a couple of contracts and worked out great.

There is an issue with the requirements to "analyze" the file when the AST JSON is generated. This is because the AST JSON contains much more data than just parsed information (such as scope, complex types and others) that require for more processing to be done. This will be trickier without the imports since some of the information required to "analyze" the file is present in other files, and the compiler will fail to execute if it cant find them.

There has to be some thought if all this extra information gathered after the parsing is required in the AST JSON when parsing only one file. If is not, then is trivial to do the change, just skip the this loop and double check the AST JSON generation wont throw a null pointer on missing data (which is basically the commit in my fork).

If in fact it is, then there are a lot more edge cases to take into account when trying to analyze a file without its imports.

The main reason we are interested in this feature is to be able to extract imports reliably.

I agree that this is much needed. We should consider changing the focus of this issue to import extraction exclusively. It might be interesting to get an AST with unresolved imports and symbols, but it might be best to discuss it as a separate feature.

Something interesting about import extraction is that it's recursive, so the user would need to request import extraction repeatedly. For this use case, requesting an entire AST is overkill and may have a performance impact.

By the way, when using javascript, this can be done "manually" using the "import file read callback" - and that is actually what it is meant to be used for.

I think the import file callback is somewhat limited, as it only works in js, and doesn't work when using a native/docker version of solc. As we discussed in #3168 and in private, we think using a native solc would improve the development speed in most cases.

The benefits of exporting the parser aren't just extracting imports. That's why I tagged other people that deal with solc's last or build/use their own parsers. Can you please comment here if this would help you?

I have two main uses for accessing an AST prior to compilation:

  • I can more effectively determine if a contract has changed, and from that infer which source files to recompile. It's unfortunate to have to run the compiler on an entire project because the user added an extra line in their SafeMath library.
  • I can better determine dependencies and associations. Knowing imports is useful, but a single source file can contain multiple contracts and a user might forget to remove an import statement for a contract they no longer require. The AST lets me build an effective dependency chain so I know exactly which contracts to recompile when I detect a change.

I'm also in agreement with @alcuadrado that this is useful in the native solc.. I'm working in Python so solcjs isn't an option.

To concur with @iamdefinitelyahuman and @alcuadrado, an exported parser would be very helpful. There's always a gap between the JS parser's accuracy and solc as new syntax is introduced and it's the source of a large number of issues in both solidity-coverage and eth-gas-reporter.

These tools rely on fast parsing either to avoid double-compilation (because they modify contracts) or strong coupling with the user's compilation strategy (e.g requiring they use the correct solc output selections and knowing how they store their comp artifacts locally)

Is there any update on where this issue is heading? I do agree there are much more cases in which an exported parser could be used for, other than extracting imports.

I still like @axic's suggestion most. Would we add it to settings or to the top level?

Hey everyone! We, the Solidity team, would like to discuss this topic in a little language design discussion call with you next week to come to a conclusion about the implementation.

When?

Thursday, 25th of June, 5-6PM CEST (see time in your timezone here)

What?

Discuss issue #8739 - "output the AST even before import resolution"

How?

I will share a link to join the meeting in this issue 10mins before the start. You can also get a calendar invite with all information, if you send me your e-mail address as a DM on Gitter (@franzihei).

Please let me know if you're interested in participating. :)

During the call we would like to know:

  • what the participants would use the feature for
  • what exactly is needed as output
  • if other stages to stop compilation would be useful

Join the call in 15 mins here --> https://meet.google.com/mga-marp-oid.

Thanks everybody for joining the call!

Really rough notes (please feel free to correct or add to it):

Attendees: Alex B, Aniket, Ben H, Chris R, Francisco G, Franco V, Franco Z, Franziska H, Nick D, Patricio P, Yann L.

Questions for tool builders:

  • what the participants would use the feature for
  • what exactly is needed as output
  • if other stages to stop compilation would be useful

Buidler: need relative path as it is written in Solidity source code

Truffle: we look to see which Sol files have been changed; min set of files that need to be compiled at a certain time; we don’t need absolute paths either, we just need relative paths upfront, list of sources is only used for the “profiler”

Solhint: needs to parse files to do static analysis, using JS parser, for prettier I just need the syntax, no need for types, for solhint, only syntax - works on file level (no cross file analysis), if we had a type AST that could probably be useful too (but not sure how yet)

Chris: let’s introduce a generic flag to stop after point x and at a later point in time we could still introduce other stages to stop?

Remix: will use features to have fast feedback and display more details quickly

Brownie: would it be possible to have an AST fully linked without static analysis?

Chris: yes, before type checker

Brownie: could I build a dependency tree with that? having an AST for just one file is useful; if I can build all the associations; build dependent chain without static analysis would allow me to fine tune what contracts I am compiling

Chris: I think the main problem is that the compiler is f*ing slow, we should also focus on that to increase compilation time

Truffle: but exposing a parser would be easier than making the compiler faster and it’s useful.

[...]

_We also discussed other tooling issues in this call, which are not relevant for this issue (hence I'm not putting the other notes here)._

summary: Implement a switch e.g. stop-after: parsing that only handles a single file and does not even call the import callback. I think the absolute path can still be put in the annotations. For now, we only have 'parsing' - but we could extend it at some point in the future by defining other "checkpoints". The main problem is to check that such an incomplete AST can be exported without crashing.

I created https://github.com/protofire/solhint/issues/232 to keep track of this in solhint. I'm referencing it here because I assume other people building on top of this future solc parser might have similar issues.

Is it safe to assume these things?

  1. A parser for v0.4.26 can parse any v0.4.x file?
  2. A parser for v0.5.17 can parse any v0.5.x file?

If so, can we consider backporting this change to each of those majors? This way we'd be able to parse any file with solc.

We usually only backport critical bugfixes, but we can see how much work the task is overall.

Backporting this to 0.4.26 and 0.5.17 would be a radical step in making this hugely valuable. The main objective behind this was to for the first time have a proper parser that works reliably, covers the entirety of the language and is always up to date, but if it's only available for the newer versions then parsing code for the older versions that still are relevant today remains a problem. This means Solidity parsing as a whole will remain an unsolved problem, making it hard to step up the tooling ecosystem.

This is super important for developer experience.

Was this page helpful?
0 / 5 - 0 ratings