Rust-analyzer: StructuralSearchReplace: next steps

Created on 17 Feb 2020  Â·  28Comments  Â·  Source: rust-analyzer/rust-analyzer

https://github.com/rust-analyzer/rust-analyzer/pull/3099 implemented initial version of SSR.

This issue collects things we need to do to make this really production-read

@mikhail-m1 any other things I've missed?

E-hard fun good first issue

Most helpful comment

I’m the author of a little-used and hard to maintain (due to being on nightly) tool called Rerast. I’d like to deprecate Rerast (or at least the nightly version thereof) and SSR looks like a great step towards being a replacement!

I started work on a rewrite of Rerast, built on rust-analyzer, then thought that perhaps the functionality might belong in rust-analyzer itself and only then did I find SSR.

The rule syntax is slightly different, so I’d like to get your thoughts on that and whether you’d be prepared to accept such a change. Like SSR, I did start out using macro_rules syntax, but found that it’s pretty rare to want to constrain placeholders in that way. Placeholders, at least outside of macro calls, only match to a single AST node and whether it’s an expression, pattern or whatever usually depends on the surrounding context, so further constraining the placeholder usually isn’t necessary. So I instead opted for just $a as a placeholder.

Another reason to deviate from macro_rules syntax is that the semantics are different, and I’d worry that using the same might be confusing. For example, in macro_rules, $a:expr will match as much as it can before the expression parser gets an unexpected token. So a pattern of $a:expr + $b:expr when presented with code 1 + 2 would consume the whole of 1 + 2 with the first placeholder, then matching of + would fail. That’s not really what we want with structured matching and I felt that deviating from the macro_rules syntax made that clearer.

I did however want to constrain placeholders in some cases. For this, I opted to use a form that allowed multiple constraints - ${a:constraint1:constraint2...}. For a somewhat contrived example, ${a:type(i32):not(kind(literal))} will match anything that is an i32 and isn’t a literal.

The features I’ve implemented so far are:

  • Matching of expressions, type references, patterns, paths and items.
  • Recursive matching (and replacing). For example, replacing foo($a) with bar($a) will transform foo(foo(42)) into bar(bar(42)).
  • Search and replace of macro calls (but not yet within another macro call). So you can replace try!(foo()) with foo()?, but can’t yet do that if it’s inside another macro call. I think this can be implemented but with some limitations.
  • Search and replace within macro calls - of things that aren’t themselves macro calls, since we search the macro expansion.
  • Matching paths by their fully resolved path. So for example a::b::foo() will match code b::foo() if b resolves to a::b.
  • Constraining placeholders by type. e.g. ${a:type(a::b::Foo)}.bar() will match x.bar() if the fully qualified type of x is a::b:Foo (I don’t currently include the crate name, but probably should).
  • Matching record field lists in any order.
  • Reporting of the reason why a particular match fails. This is super-useful for development, but might be useful in the UI as well. I’d imagine you’d select some code that you’d like to replace then choose the SSR action. It would then give you a pattern that matched the code you selected and as you edit it, tell you if it still matches or why it doesn’t.

Anyway, if there’s interest, I can have a go at integrating that into the existing SSR module, but wanted to check first, in particular about the change of syntax.

If you wanted to see the code, the tests are probably most illustrative of what's working so far. If you wanted to run it, you might need to adjust the paths in Cargo.toml. Also, rust-analyzer needs a one line change to make ra_syntax::SyntaxElementChildren public. I should probably send a PR for that, but was waiting to see if I needed anything else.

All 28 comments

The most important feature here is documentation!

What kinds of things are you thinking about for the second item?

White space differences should not matter for natching

Made an initial change for the whitespace, but there's two points I'm not certain about:

  1. The Rust reference says whitespace is defined by the Unicode property Pattern_White_Space, but the char method is_whitespace uses the Unicode property White_Space instead and they're not exactly the same. I didn't notice any code in rust-analyzer for checking Pattern_White_Space so I assume we don't really care about the difference and it's fine to use is_whitespace?
  2. It seems like we might want to completely ignore whitespace in certain situations (e.g. match f(x) with f( x ), but I don't think we can do that in general. The current change only matches various forms of whitespace with each other, but doesn't ignore any whitespace. Any opinions on that?

Actually I think for my second question it's fine to throw out all whitespace tokens. I can't actually think of any situations where it would be a problem.

Note that there's an is_trivia method on SyntaxKind, you don't need to implement this yourself. E.g. some_token.kind().is_trivia().

Yeah, the line that checks for Pattern_White_Space is in the rustc_lexer
crate, which we share with the compiler

On Sat, 22 Feb 2020 at 23:18, Florian Diebold notifications@github.com
wrote:

Note that there's an is_trivia method on SyntaxKind, you don't need to
implement this yourself. E.g. some_token.kind().is_trivia().

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/rust-analyzer/rust-analyzer/issues/3186?email_source=notifications&email_token=AANB3M5BIBPSRH43ADJJVELREGQDNA5CNFSM4KWONEZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMVL4VY#issuecomment-590003799,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AANB3M2QY4CRZWSFQLYQ6D3REGQDNANCNFSM4KWONEZA
.

Cool, thanks.

@adamrk sorry, I missed that you started to work on it and created a PR for the second step.

my mistake @mikhail-m1 - didn't realize you were working on it either.

Example of semanticly equivalent transformations -- trailing comma:

            // replace_children(self, $what:expr, $where:expr) ==>> self.replace_children($what, $where)
            return replace_children(
                self,
                single_node(old.syntax().clone()),
                iter::once(segment.syntax().clone().into()),
            );

I've actually been using ssr and it works surprisingly well.

One thing I've found is that typing ssr in comment and then copy-pasting it is the most productive workflow. I think we should probably make this the workflow.

Specifcally, I imaging something like this:

  • If the cursor is on a comment with // ssr: tag, it initiates ssr session.
  • The contents of the comment is interpreted as ssr query (we perhaps can replace ==>> with just splitting by lines?)
  • The ssr can be applied via code action
  • (stretch goal) the contents of ssr comment is syntax highlighted
  • (stretch goal) the matching elements in the current file are highlighted

next steps:

@RReverser no, we currently run on pre-expansion code. Although we should probably run post-expansion eventually. A minimal test-case would be useful!

no, we currently run on pre-expansion code. Although we should probably run post-expansion eventually

Hmm, my problem is exactly that it doesn't seem to match pre-expansion code.

I've tried something as simple as:

eprintln!($a:expr, $b:expr) ==>> $a + $b

and it didn't match anything.

Ah, so this is an interesting case. The problem here is that although 1+1 in eprintln!("{}", 1 + 1) looks like an expression to a human reader, to a compiler it is just a token tree:

image

This thing becomes an expression only after expansion, and, in general case, connection between original code and expanded code is a pretty weak one. We probably should handle such cases with heuristics, as it is a reasonable thing for the user to write, but it is not trivial.

We probably should handle such cases with heuristics, as it is a reasonable thing for the user to write, but it is not trivial.

Yeah, it sounds like a more complicated path. Perhaps it would be easier to add support for other token types aside from expr?

Then this use-case, like many others, would be naturally covered by tt tokens. The $a + $b was just an example, in fact I wanted to replace all if opts.verbose { eprintln!(...) } with info!(...) but as expr is the only token type available, I had to try and match contents of "arguments" as well.

If full Rust-like macro matching power was available, I could do if opts.verbose { eprintln! $args:tt } ==>> info! $args instead.

I’m the author of a little-used and hard to maintain (due to being on nightly) tool called Rerast. I’d like to deprecate Rerast (or at least the nightly version thereof) and SSR looks like a great step towards being a replacement!

I started work on a rewrite of Rerast, built on rust-analyzer, then thought that perhaps the functionality might belong in rust-analyzer itself and only then did I find SSR.

The rule syntax is slightly different, so I’d like to get your thoughts on that and whether you’d be prepared to accept such a change. Like SSR, I did start out using macro_rules syntax, but found that it’s pretty rare to want to constrain placeholders in that way. Placeholders, at least outside of macro calls, only match to a single AST node and whether it’s an expression, pattern or whatever usually depends on the surrounding context, so further constraining the placeholder usually isn’t necessary. So I instead opted for just $a as a placeholder.

Another reason to deviate from macro_rules syntax is that the semantics are different, and I’d worry that using the same might be confusing. For example, in macro_rules, $a:expr will match as much as it can before the expression parser gets an unexpected token. So a pattern of $a:expr + $b:expr when presented with code 1 + 2 would consume the whole of 1 + 2 with the first placeholder, then matching of + would fail. That’s not really what we want with structured matching and I felt that deviating from the macro_rules syntax made that clearer.

I did however want to constrain placeholders in some cases. For this, I opted to use a form that allowed multiple constraints - ${a:constraint1:constraint2...}. For a somewhat contrived example, ${a:type(i32):not(kind(literal))} will match anything that is an i32 and isn’t a literal.

The features I’ve implemented so far are:

  • Matching of expressions, type references, patterns, paths and items.
  • Recursive matching (and replacing). For example, replacing foo($a) with bar($a) will transform foo(foo(42)) into bar(bar(42)).
  • Search and replace of macro calls (but not yet within another macro call). So you can replace try!(foo()) with foo()?, but can’t yet do that if it’s inside another macro call. I think this can be implemented but with some limitations.
  • Search and replace within macro calls - of things that aren’t themselves macro calls, since we search the macro expansion.
  • Matching paths by their fully resolved path. So for example a::b::foo() will match code b::foo() if b resolves to a::b.
  • Constraining placeholders by type. e.g. ${a:type(a::b::Foo)}.bar() will match x.bar() if the fully qualified type of x is a::b:Foo (I don’t currently include the crate name, but probably should).
  • Matching record field lists in any order.
  • Reporting of the reason why a particular match fails. This is super-useful for development, but might be useful in the UI as well. I’d imagine you’d select some code that you’d like to replace then choose the SSR action. It would then give you a pattern that matched the code you selected and as you edit it, tell you if it still matches or why it doesn’t.

Anyway, if there’s interest, I can have a go at integrating that into the existing SSR module, but wanted to check first, in particular about the change of syntax.

If you wanted to see the code, the tests are probably most illustrative of what's working so far. If you wanted to run it, you might need to adjust the paths in Cargo.toml. Also, rust-analyzer needs a one line change to make ra_syntax::SyntaxElementChildren public. I should probably send a PR for that, but was waiting to see if I needed anything else.

@davidlattimore I would be super excited to see this in rust-analyzer. I am pretty sure that you actually know more about structural search replace at this point than i do, so I'd trust your opinion about the best syntax and what not. So feel free to send just send PRs to rust-analyzer, I'd be able to help with ra APIs and some low-level code style issues, but I'd feel confident to just delegate the larger design questions to you :0)

This is super-useful for development, but might be useful in the UI as well. I’d imagine you’d select some code that you’d like to replace then choose the SSR action. It would then give you a pattern that matched the code you selected and as you edit it, tell you if it still matches or why it doesn’t.

Yeah, one interesting thing I've noticed is that the stupid "copy pattern into the ssr action" turns out to be pretty nice UI-wise. It's much more bare bones then IntelliJ custom dialog, but somehow more useful. Long, term, I'd like to see the following UI:

User starts comment with a special marker, like

// ssr
// foo($a) => bar($b)

we then provide syntax highlighting, completion and what not inside the comment syntax, as well as incrementally coloring matches in the current file. To actually execute the action, there's just an ctrl + . code action. Though, I'd think about UI fanciness after making sure that the base implementation works correctly.

Constraining placeholders by type

Note that current impl is purely syntax-based, it doesn't use any semantic info. It should be possible to add types & paths into the mix, but it might require some tweaking:

  • the rust-analyzer API to go from syntax to semantic info are not super well-developed at the moment (the main entry is Semantics type)
  • care should be taken to make it fast. rust-analyzer does lazy bottom-up analysis, which is a perfect fit for completion, but is not as good for searches. Doing both searches and completion efficiently is an open problem at the moment. You might want to look at search.rs for an example which adds semantic validation on top of syntax-based (really, text-based) matching.

Cool. I'll start sending PRs. I might split a few features into separate PRs. I was thinking roughly the following:

  • Remove :expr from placeholders.
  • Replace parsing and matching (adds support for matching things other than expressions).
  • Support searching within macro expansions.
  • Support matching macros.
  • Support constraints (in particular type constraint).
  • Matching with path resolution.

One thing I haven't implemented yet that the current SSR (and rerast-nightly) supports is treating Foo::bar(&$a) the same as $a.bar(). I'm unsure about supporting this without checking that the same method is being called. For example, if I search for Foo::to_string($a:expr) in the following code, it will match 5.to_string() even though 5 has no relationship to Foo and it's a different method being called.

// Foo::to_string($a:expr) ==>> String::new()

struct Foo {}
impl Foo {
    fn to_string(&self) -> String {
        String::new()
    }
}
fn main() {
    let f1 = Foo {};
    // All three of the following lines match, but we don't really want the last one to.
    Foo::to_string(&f1);
    f1.to_string();
    5.to_string();
}

That seems like it could be surprising behaviour, so I'm inclined to remove that support for now and add it back in later on once we can verify that it's the same method. Possibly writing the search instead as something like crate::Foo::to_string($a).

Thanks for pointing out the possible slowness with using semantic info. I should be able to move checking of constraints to after the rest of the pattern has been validated. That should very substantially reduce the number of lookups required - although it would depend on how unique the rest of the search pattern is.

Thanks for pointing out the possible slowness with using semantic info.

See also this nice blog post about how semantic code searches are handled in IntelliJ: https://habr.com/en/company/JetBrains/blog/451754/

I think for rust-analyzer we might be able to come up with something even more efficient (salsa allows us to lift the "single file index" constraint), but that is very much unexplored territory.

BTW, it might be nice to be able to apply SSR by running rust-analyzer on the command line as well :thinking:

Forgot to reference this issue, but #5120 added support for SSR from the rust-analyzer command line

I figred that it migbe be useful to support statements in SSR:

let $var = foo();
$var.bar();
==>>
foo().bar();

I've added path resolution to SSR rules, the main user-visible changes are:

  • SSR now matches paths based on whether they resolve to the same thing instead of whether they're written the same.

    • So foo() won't match foo() if it's a different function foo(), but will match bar::foo() if it's the same foo.

  • Paths in the replacement will now be rendered with appropriate qualification for their context.

    • For example foo::Bar will render as just Bar inside the module foo, but might render as baz::foo::Bar from elsewhere.

  • This means that all paths in the search pattern and replacement template must be able to be resolved.
  • It now also matters where you invoke SSR from, since paths are resolved relative to wherever that is.
  • Search now uses find-uses on paths to locate places to try matching. This means that when a path is present in the pattern, search will generally be pretty fast.
  • Function calls can now match method calls again, but this time only if they resolve to the same function.

Regarding supporting statements, I agree, that would be good to support. It'll probably require some refactoring since currently we only match whole AST nodes, not sequences of nodes. Also, the example you gave would also require allowing a placeholder to be repeated, then checking the equivalence of each occurrence.

I just started looking at making an SSR assist, but I'm slightly unsure of how it should work if there's an SSR rule, but it fails validation. How should the error be presented? It looks like the assist API just does edits, but doesn't offer any scope for interaction or error reporting. Is that correct?

I just started looking at making an SSR assist, but I'm slightly unsure of how it should work if there's an SSR rule, but it fails validation. How should the error be presented? It looks like the assist API just does edits, but doesn't offer any scope for interaction or error reporting. Is that correct?

You could report an error for all invalid ssr comments instead of waiting for the assist to be invoked.

That sounds like it would work.

Another thing that I've been thinking it would be good to support is a way to scope the rule. Sometimes you just want to automate a repetitive edit to the current file, or even within the current function. One way I was considering was by extending the placeholder constraint syntax. So ${:in(fn)} foo ==>> bar would replace foo with bar in the current function. Similarly in(file) for the current file. This has the advantage of not requiring any new syntax. However I'm open to suggestions if people have any ideas for a dedicated syntax for constraints that apply to the whole pattern.

A completely different option would be to just have different commands - e.g. "Structural Search Replace - current file". That's perhaps more discoverable, but less scalable (adding new scopes would be a pain).

A third option would be building a more sophisticated UI for doing SSR - one that could have drop downs, checkboxes etc. I know comparatively little about extending vscode, so I'm not sure how plausible that would be.

Another thing that I've been thinking it would be good to support is a way to scope the rule. Sometimes you just want to automate a repetitive edit to the current file, or even within the current function. One way I was considering was by extending the placeholder constraint syntax.

I think a bit more natural option could be "search within selection" which is how regular Find & Replace already limits itself in VSCode. Then user can easily select a single function, or couple of them, or whole module, etc.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bjorn3 picture bjorn3  Â·  25Comments

aochagavia picture aochagavia  Â·  41Comments

yaahc picture yaahc  Â·  27Comments

GrayJack picture GrayJack  Â·  25Comments

matklad picture matklad  Â·  25Comments