This has come up again and again. While it's easy to traverse the AST (ast.Walk, ast.Inspect), it's not easily possible to use those for general tree rewrites w/o significant amount of work.
For instance, if we wanted to rewrite all expressions (ast.Expr) of a certain kind into something else, ast.Walk and ast.Inspect would need to consider every node that contains ast.Expr fields separately, rather than just look at nodes that satisfy ast.Expr.
AST rewriters exist in other packages; most notably perhaps in gofmt (for gofmt -r); that one is reflection-based. Reflection-based approaches tend to be general, but also hard to understand, and slower than necessary.
API starting point:
// An ApplyFunc is invoked by Apply for each node, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The node is given by the node pointer's address addr which makes it
// possible for an ApplyFunc to rewrite the node. The node's parent is
// the node containing the *addr. If the node is part of a list, index
// identifies its position in that list. Index is < 0 otherwise.
type ApplyFunc func(parent Node, index int, addr interface{}) bool
// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, index, and addr. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
func Apply(parent Node, index int, addr interface{}, pre, post ApplyFunc) {
cc: @alandonovan
This seems like a good addition. The reflect-based implementations in gofmt (and eg) is fearsomely complex, and there are other times I have wanted to use this function.
The index is not enough to uniquely identify a subtree of a node. For example, ast.CaseClause contains two slices of Nodes, and ast.BinOp contains two Exprs. I think we need to identify the field too. The most obvious way to do that is by its name. Field numbers, or field offsets, might be marginally more efficient, but would certainly be harder to read, and I'm users would thank us for choosing strings when they're debugging.
I'm not sure you need the index (or index + field name) in the Apply function, only in ApplyFunc.
I've also wanted this.
Index is sufficient I think if it counts across all children. E.g., in AssignStmt, it counts from 0 to len(Lhs)+len(Rhs)-1.
I've had cases where I want to be able to walk up the AST more than one Node, so I maintain a slice of (Parent, Index) tuples.
Aside: Index technically isn't necessary, and Java's Compiler API omits it from com.sun.source.util.TreePath. But I think it's handy to have.
I'm not sure I understand what addr's dynamic type will be from the description. I'm guessing though that if I'm visiting a ast.Ident, then it could be either a *ast.Ident (e.g., the name for a ValueSpec) or *ast.Expr (e.g., an identifier in an expression context)?
The dynamic type would be a pointer to the variable (field or slice element), whatever that may be: Node, Expr, *Ident.
Here's a slightly more thought-through API that provides explicit names for fields. Nice, but definitively somewhat costly:
// An ApplyFunc is invoked by Apply for each node, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The node is given by the node pointer's address addr which makes it
// possible for an ApplyFunc to rewrite (replace) the node. The node
// is referred to from its parent which contains a field with the given
// name. If the field is a list, index identifies the node's position in
// that list; index is < 0 otherwise. Or roughly speaking:
//
// addr == &parent.name if index < 0
// addr == &parent.name[index] if index >= 0
//
// Exception: If the parent is an *ast.Package, and Apply is iterating
// through the Files map, name is the filename, and index is incremented
// by 1 with each file, starting at 0. Assigning to *addr will change the
// corresponding Files map entry but only upon return from pre and post.
type ApplyFunc func(parent Node, name string, index int, addr interface{}) bool
// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and addr. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
//
// Only fields that refer to AST nodes are considered children;
// specifically, token.Pos, Scopes, Objects, and fields of basic types
// (strings, etc.) are ignored. Children are traversed in the order in
// which they appear in the respective node's struct definition.
func Apply(parent Node, name string, index int, addr interface{}, pre, post ApplyFunc) {
@mdempsky points out that alternatively, index could simply be monotonically increasing. The index of a list element would be the provided index minus the index of the list. A utility function could provide the name of a field for a given index if necessary. That function could be reflection-based since it's probably not speed-critical.
We probably still want the field name though, so that you can locate the subtree easily in cases like SliceExpr without having to do four separate comparisons---and arguably more importantly, without having to remember to do four separate comparisons. I have found several bugs caused by assuming that a given node was the "correct" subtree of the parent node, when in fact it was another one. Passing the field name will help to make it obvious that this is something you need to think about.
I think the easier solution is to still reserve indices for nil fields. So for SliceExpr you'd always have 0==X, 1==Low, 2 == High, 3==Max.
You can't use the same number to indicate both a field index and a slice index.
I played with this a bit and I have a prototype that could work. Observations: Passing an address to a field is problematic: Many fields that we are interested in are (ast.)Expr, Stmt, and Decl fields, which are interfaces. To "unpack" them we have to type-switch on the address, then deref the address and then type-switch again on the contents; thus requiring two type switches. Here's a better approach: If we have the parent, field name (or field index), plus slice index if needed, we have all that is necessary because we can use reflect to set a field with this information. Changing/rewriting fields is (probably) much less common then reading (traversing) the tree, thus the more costly set field operation is ok. In turn, the API can be closer to Walk and thus easier to use:
// An ApplyFunc is invoked by Apply for each node n, before and/or after
// the node's children. See Apply for the interpretation of the return
// value.
//
// The parent, name, and index field are used to identify the parent
// node's field containing n. If the field is a list, index identifies
// the node's position in that list; index is < 0 otherwise. Roughly:
//
// n == parent.name if index < 0
// n == parent.name[index] if index >= 0
//
// This information can be used to update/rewrite a node with SetField.
//
// Exception: If the parent is a *Package, and Apply is iterating
// through the Files map, name is the filename, and index is -1.
type ApplyFunc func(parent Node, name string, index int, n Node) bool
// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and n. See Apply for the meaning
// of these arguments.
//
// If pre is not nil, it is called for each node before its children
// are traversed (pre-order). If the result of calling pre is false,
// no children are traversed, and post is not called for the node.
//
// If post is not nil, it is called for each node after its children
// were traversed (post-order). If the result of calling post is false,
// traversal is terminated and Apply returns immediately.
//
// Only fields that refer to AST nodes are considered children.
// Children are traversed in the order in which they appear in the
// respective node's struct definition.
func Apply(parent Node, name string, index int, n Node, pre, post ApplyFunc)
// SetField sets the named field in the parent node to n. If the field
// is a slice, index is the slice index. The named field must exist in
// the parent, n must be assignable to that field, and the field must be
// indexable if index >= 0. Roughly:
//
// parent.name = n if index < 0
// parent.name[index] = n if index >= 0
//
// The parent node may be a pointer to the struct containing the named
// field, or it may be the struct itself.
//
// Exception: If the parent is a Package, n must be a *File and name is
// interpreted as the filename in the Package.Files map.
func SetField(parent Node, name string, index int, n Node)
Open questions:
1) Should Apply require parent, name, and index (which often will be nil, "", -1) so that even the top-level node can be rewritten, or should that require a special case and in return a simpler Apply that just requires, the starting node n, pre, and post?
2) Should field addressing happen via a name (string), which is nice to read but more expensive than say just the field index as provided by reflect)?
On 14 September 2016 at 23:52, Robert Griesemer [email protected]
wrote:
If we have the parent, field name (or field index), plus slice index if
needed, we have all that is necessary because we can use reflect to set a
field with this information. Changing/rewriting fields is (probably) much
less common then reading (traversing) the tree, thus the more costly set
field operation is ok. In turn, the API can be closer to Walk and thus
easier to use:
Sounds good. It's interesting that Apply is now a primitive interface for
traversing AST tree nodes and edges, whether or not you wish to update the
tree. In other words, you could define the existing ast.Walk and
ast.Inspect functions in terms of Apply. That might be a good exercise to
validate the API.
// Apply traverses a syntax tree recursively, starting with the node
// identified by parent, name, index, and n. See Apply for the meaningtypo: "See ApplyFunc"
1) Should Apply require parent, name, and index (which often will be nil,
"", -1) so that even the top-level node can be rewritten, or should that
require a special case and in return a simpler Apply that just requires,
the starting node n, pre, and post?I don't have a good sense of how important this case will be. You could
provide a wrapper for the root case with the three zero arguments.2) Should field addressing happen via a name (string), which is nice to
read but more expensive than say just the field index as provided by
reflect)?I see three reasons to prefer field names over field indices:
(1) Client code that uses strings will be easier to read. Unnamed integer
constants are inscrutable and error-prone.
(2) Strings are invaluable during debugging. You can print them out to
document the descent, something that is hard to do with the existing (read
only) traversal API.
(3) The AST is bound to change, if only slightly, in future versions of Go,
and we should not assume that all new fields will be added at the end of
their struct, thus the field numbering may not be durable in the long term.
@alandonovan ast.Walk was written w/o a very good understanding of use cases; in fact, I introduced ast.Inspect later because it was much easier to use in many situations. We cannot remove these functions but we could mark them as deprecated. ast.Apply should cover both use cases nicely.
I didn't mean you should remove or deprecate them, only that you should go through the exercise of implementing the old functions in terms of the new, since it might be revealing.
@alandonovan I understand. I think it can be done but it will be difficult to guarantee 100% semantic equivalence w/o manually checking each case, or an extensive test suite testing each node type and exit scenario. For one, Walk doesn't invoke the visitor if a node is nil, while Apply probably should (and my prototype currently does) so that the pre/post closures have a chance to rewrite/set the node (that feature will make it possible to implement Walk, while Walk could not be used to implement Apply since it misses nodes).
You can't use the same number to indicate both a field index and a slice index.
I think we're talking past each other. Examples of how I use a single index to identify child nodes:
That said, I'm not opposed to field names.
On 15 September 2016 at 12:48, Matthew Dempsky [email protected]
wrote:
You can't use the same number to indicate both a field index and a slice
index.I think we're talking past each other. Examples of how I use a single
index to identify child nodes:
- To find the i'th child Node of a CallExpr: i==0 means Fun, and
otherwise you want Args[i-1].- To find the i'th child of an AssignStmt, i
and otherwise you want Rhs[i-len(Lhs)]. That said, I'm not opposed to field names.
Ah, so you would flatten out the scalar fields and the slices into one
sequence. Makes sense, though the bookkeeping seems onerous.
For a concrete implementation, see:
https://github.com/griesemer/dotGo2016/commit/f0f16c266e81095ef09f9af10a03df149c8f3488
I recently started on a tool for vim-go and wanted to easily rewrite AST as well. I've seen this issue and really looking forward what we get out from this. Meanwhile, I've copied ast.Walk() and implemented a straight forward rewriter by changing the signature of the walk function we pass. Here is a working repo: https://github.com/fatih/astrewrite I'm going to use it for my next project. It might be not perfect or might have errors (which are unknown to me know), but I thought I'll share to show a different API usage.
Also I think, we can introduce a ast.Rewrite function without breaking the current API or add it to the golang.org/x/tools/go/ast/astutil.
@fatih Any reason the version I mentioned above ( griesemer/dotGo2016@f0f16c2 ) wouldn't have worked for you?
@griesemer at first the API was a little bit confusing for me. So I didn't tried to investigate in depth. But I've tried to give it a look now that you asked for it.
There are more steps involved changing a node with Apply(). Using the example src below
src := `package main
type Foo struct{}`
fset := token.NewFileSet()
file, _ := parser.ParseFile(fset, "foo.go", src, parser.ParseComments)
I've tried to rename the type name from Foo to Bar. here is how we can do it with Apply() and with Walk() from the astrewrite package"
applyFunc := func(parent ast.Node, name string, index int, n ast.Node) bool {
spec, ok := parent.(*ast.GenDecl)
if !ok {
return true
}
x, ok := n.(*ast.TypeSpec)
if !ok {
return true
}
x.Name.Name = "Bar"
spec.Specs[index] = x
return true
}
rewritten := ast.Apply(file, applyFunc, nil)
rewriteFunc := func(n ast.Node) (ast.Node, bool) {
x, ok := n.(*ast.TypeSpec)
if !ok {
return n, true
}
x.Name.Name = "Bar"
return x, true
}
rewritten := astrewrite.Walk(file, rewriteFunc)
With Apply() I have to know the parent nodes type as well, so I can replace the item via the index. However in Walk() from astrewrite I don't need to know the parent node at all. I only change the node I'm interested in and then return it back.
--
Note that my current use case is very simple. There might be other use cases that Apply() covers that I'm not aware of. I'm just trying to give my feedback as a regular Go user that occasionally uses the go/ast package for writing or improving existing tools.
@fatih Thanks for looking into this. A few observations:
1) When you use Apply, you don't need to type-assert on the parent - you can simply use the new SetField function (also documented in the API): ast.SetField(parent, name, index, <newnode>). With that, both versions will be approximately the same amount of code to write.
2) In this specific case where you simply change a string, there's no need to use a complicated rewrite in the first place: You could just walk the tree and change that string directly.
3) Your implementation of astrewrite is easier to use in simple cases, but it is almost always more costly: The implementation of astrewrite always needs to write back the result returned by rewriteFunc, and because it's always an ast.Node there's always a type assertion needed for the assignment to work (in the astrewrite implementation) - that's relatively costly. You have to always do this for every single node even though most nodes are not rewritten. On the other hand, with ast.Apply, the ast.SetField operation is relatively expensive but you only pay when you actually rewrite something.
4) Finally, with ast.Apply you have the choice to do the rewriting in pre- or post-order when traversing the tree, which is often important to be able to control - this is not possibly with astrewrite.
To summarize: If you only want to change the name of things, you may not need a rewrite at all. Secondly, while astrewrite may be fine for your (more general) use case, it's probably not sufficiently powerful as a general rewrite mechanism, and likely quite a bit slower than the suggested ast.Apply. Hope that helps.
Anyway, thanks again for investigating!
@griesemer Thanks a lot for the detailed explanation. I've didn't know that I could just use SetField, missed that totally. My example was simple by purpose to show the API. You're right that it's costly and I totally agree, however the API is also simple. It feels intuitive. But this library is not the place to make things feel good in expense of performance :)
I'll try to use Apply() instead of astrewrite for my next project. If there is anything I think it can be improved I'll comment here. Thanks again 馃憤
@griesemer it'd be handy if that code was go-gettable, so it is easier to use and evolve.
Since you wrote it, perhaps you want to put it up on github under your name? Or I could put it up somewhere for you; let me know.
I've pulled it out of package ast for you: https://gist.github.com/josharian/ff74a8451d7d4e7f062d7b9b04c87eac. Not tested yet, but it compiles, and the only modification was adding ast. in a bunch of places. :)
@josharian Since it's basically there (but for tests), I made this a 1.9 item. It should just be in the ast package. We can put it in once the tree opens (which is hopefully soon).
I'm struggling with deleting elements of an AST.
Sample:
func f() {
a1()
a2()
b()
}
Task: eliminate all statements in f that are calls to functions starting with a.
I did the obvious thing and wrote a Delete helper, analogous to SetField (not thoroughly tested yet):
// Delete deletes or sets to nil the named field from the parent node.
// Delete performs the following assignment:
//
// if index < 0 {
// parent.name = nil
// } else {
// copy(parent.name[index:], parent.name[index+1:])
// parent.name[len(parent.name)-1] = nil
// parent.name = parent.name[:len(parent.name)-1]
// }
//
// The parent node may be a pointer to the struct containing the named
// field, or it may be the struct itself.
//
// TODO: do parents of Package type need special handling?
func Delete(parent ast.Node, name string, index int) {
v := reflect.Indirect(reflect.ValueOf(parent)).FieldByName(name)
if index < 0 {
v.Set(reflect.Zero(v.Type()))
return
}
last := v.Len()
reflect.Copy(v.Slice(index, last), v.Slice(index+1, last))
v.Index(last - 1).Set(reflect.Zero(v.Type().Elem()))
v.SetLen(last - 1)
}
But when you delete an element from a slice, it disrupts the walk that Apply does. In the sample above, the obvious implementation of pre deletes a1() but misses a2(), because after the deletion of a1(), Apply's index into the BlockStmt is off by one.
One option is give ApplyFunc access to Apply's internal state, perhaps something like this:
type ApplyLocation struct {
name string // export?
index int // export?
parent Node // export?
a *application
}
type ApplyFunc func(loc ApplyLocation, n Node) bool
func (x ApplyLocation) Set(n ast.Node)
func (x ApplyLocation) Delete() // adjusts application's internal state as necessary
@josharian Another option might be to buffer the deletes and apply them after the walk - though that may not be desirable in certain situations where we want the delete to take effect during the walk.
The API will need to describe what guarantees it is going to provide (we have similar problems with deletes of map entries during map iteration).
I've also been thinking that perhaps the description or "cursor" of the entry in the parent (name, index) may have to be opaque so we have more freedom in the implementation.
Thanks for pointing this out. I think this needs a good solution before we should add Apply to the ast package. (Though I don't have time to focus on this at the moment).
I had a somewhat similar problem in a tree transducer I was toying around with recently. I had my equivalent of the apply function have to return a special flag saying "delete the node I was just given" if it wanted to remove instead of alter the node. I don't know how well that would work in this situation, but it worked for me.
I prototyped the "use an opaque cursor for everything" approach. I'll keep using it for my work and see whether further API issues arise. Input welcome.
Here's the godoc output for that gist:
PACKAGE DOCUMENTATION
package apply
import "."
FUNCTIONS
func Apply(root ast.Node, pre, post ApplyFunc) ast.Node
Apply traverses a syntax tree recursively, starting with root, and
calling pre and post for each node as described below. The result is the
(possibly modified) syntax tree.
If pre is not nil, it is called for each node before its children are
traversed (pre-order). If the result of calling pre is false, no
children are traversed, and post is not called for that node.
If post is not nil, it is called for each node after its children were
traversed (post-order). If the result of calling post is false,
traversal is terminated and Apply returns immediately.
Only fields that refer to AST nodes are considered children. Children
are traversed in the order in which they appear in the respective node's
struct definition.
TYPES
type ApplyCursor struct {
// contains filtered or unexported fields
}
An ApplyCursor describes a node encountered during Apply. Information
about the node and its parent is available via the Node, Parent, Name,
and Index methods.
Roughly speaking, the following invariants hold:
Parent().Name() == Node() if !HasIndex()
Parent().Name()[Index()] == Node() if HasIndex()
The methods Replace, Delete, InsertBefore, and InsertAfter can be used
to change the AST without disrupting Apply.
func (c ApplyCursor) Delete()
Delete deletes the current Node from its containing slice. If the
current Node is not part of a slice, Delete panics.
func (c ApplyCursor) HasIndex() bool
HasIndex reports whether the current Node is part of a slice of Nodes.
func (c ApplyCursor) Index() int
Index reports the index of the current Node in the slice of Nodes that
contains it. Index panics if the current Node is not part of a slice.
func (c ApplyCursor) InsertAfter(n ast.Node)
InsertAfter inserts n after the current Node in its containing slice. If
the current Node is not part of a slice, InsertAfter panics. Apply will
walk n.
func (c ApplyCursor) InsertBefore(n ast.Node)
InsertBefore inserts n before the current Node in its containing slice.
If the current Node is not part of a slice, InsertBefore panics. Apply
will not walk n.
func (c ApplyCursor) IsFile() bool
IsFile reports whether the current Node is a *File that is part of a
*Package map of *Files.
func (c ApplyCursor) Name() string
Name returns the name of the parent Node field that contains the current
Node. If the parent is a Package and the current Node is a File, it
returns the filename for the current Node.
func (c ApplyCursor) Node() ast.Node
Node returns the current Node.
func (c ApplyCursor) Parent() ast.Node
Parent returns the parent of the current Node.
func (c ApplyCursor) Replace(n ast.Node)
Replace replaces the current Node with n. The replacement node is not
walked by Apply.
type ApplyFunc func(cursor ApplyCursor) bool
An ApplyFunc is invoked by Apply for each node n, even if n is nil,
before and/or after the node's children.
The return value of ApplyFunc controls the syntax tree traversal. See
Apply for details.
@josharian I like your API but the one thing that concerns me is that an ApplyCursor needs to be set up for each node even if it's not needed (which I suspect is 99% of the time because most often an Apply is looking for one specific node only). I wonder if this approach can be made slightly more light-weight w/o losing too much of it's power.
Indeed. Although maybe if we documented that the lifetime of an ApplyCursor is a single call to pre/post, then we could allocate a single ApplyCursor and reuse it. The non-allocation part of the setup of the ApplyCursor should be pretty cheap, I think.
One other thing I learned while using the API above is that frequently I don't want to walk the node inserted by InsertAfter. We might want to change the behavior to non-walk (since the user can manually walk it if desired) or add a bool parameter to control the behavior.
@josharian Yes, if you pass in the ApplyCursor info down it's like passing the apply parameters via the ApplyCursor struct, and it's only allocated once. I like that. Care making this an official CL as solution for this issue (assign to me for review)?
Will do; expect it to take a few days.
Thanks - no rush.
(still working on this, not forgotten)
Change https://golang.org/cl/55790 mentions this issue: go/ast: add Apply, for rewriting ASTs
We don't use the issue tracker for questions; in the future please ask one the mailing lists or chat boards.
The go/ast documentation groups the nodes in Decl, Expr, and Stmt, and the grouping is pretty clearly telling you which node fits where. It's also trivial to find out empirically: if you can assign a node x to an ast.Stmt, then x is a statement, etc.
@griesemer sorry. I deleted the question. will try to find help on the boards
Is there a reason why the proposed ast.Apply returns a (possibly) modified ast.Node? This seems like a departure from ast.Walk, and I don't understand why it is necessary to return a Node instead of modifying it in place.
@smasher164 Because ast.Apply may change the type of the node which cannot be done by modifying a node in place. For instance, an ast.Apply working on an constant expression tree may replace that tree with a single node which is the constant result.
Change https://golang.org/cl/74930 mentions this issue: go/ast: add Apply, for rewriting ASTs
Change https://golang.org/cl/77811 mentions this issue: go/ast/astutil: add Apply, for rewriting Go ASTs
Most helpful comment
@fatih Thanks for looking into this. A few observations:
1) When you use
Apply, you don't need to type-assert on the parent - you can simply use the newSetFieldfunction (also documented in the API):ast.SetField(parent, name, index, <newnode>). With that, both versions will be approximately the same amount of code to write.2) In this specific case where you simply change a string, there's no need to use a complicated rewrite in the first place: You could just walk the tree and change that string directly.
3) Your implementation of
astrewriteis easier to use in simple cases, but it is almost always more costly: The implementation ofastrewritealways needs to write back the result returned byrewriteFunc, and because it's always anast.Nodethere's always a type assertion needed for the assignment to work (in theastrewriteimplementation) - that's relatively costly. You have to always do this for every single node even though most nodes are not rewritten. On the other hand, withast.Apply, theast.SetFieldoperation is relatively expensive but you only pay when you actually rewrite something.4) Finally, with
ast.Applyyou have the choice to do the rewriting in pre- or post-order when traversing the tree, which is often important to be able to control - this is not possibly withastrewrite.To summarize: If you only want to change the name of things, you may not need a rewrite at all. Secondly, while
astrewritemay be fine for your (more general) use case, it's probably not sufficiently powerful as a general rewrite mechanism, and likely quite a bit slower than the suggestedast.Apply. Hope that helps.Anyway, thanks again for investigating!