ast: Use enum dispatch for tree traversal #10382

The0x539 · 2024-03-19T00:47:35Z

Description

A lot of the complexity here is a result of needing separate reference-flavored enums, because things that used to take &dyn Node still need an "abstractly-typed reference" even though the node's owner knows exactly what type it is.

I tried to avoid introducing custom proc macros, but a mysterious hygiene bug forced my hand. Plus, it's arguably more readable than the declarative version, and the blanket_ref attribute is a nice bonus (those blanket impls are necessary for enum_dispatch to do its thing on the ref enums)

Future work:

Change parent pointers to use NonNull and possibly Pin
A derive macro for AST nodes
Eliminate the fn(&mut Populator, &mut NodeEnum) pattern in favor of fn<N: Node>(&mut Populator) -> N, which can go in a trait that each node implements (taking the place of AcceptorMut).
- This should also eliminate the need for the Mut flavor of node enum.
Rearrange the enums a bit to make some invalid states unrepresentable. Major examples:
- If the top field of the Ast struct is only expected to be two kinds of node, there should be an enum for that.
- If a node can only have a specific type as a parent, then that can be encoded in the type of its parent pointer. We'd probably still have the current unknown-type parent method, but wouldn't rely on it as much.
Do something about the "union nodes" (enums with names ending in Variant) as previously discussed. I don't fully understand their nature yet, but I can tell there's some redundancy. ~~Would they work similarly to tokens and keywords?~~

TODOs:

No intended user-facing changes. Performance characteristics unknown.

macros/src/ast_node.rs

krobelus · 2024-03-23T12:16:45Z

src/ast.rs

@@ -2457,7 +1921,7 @@ pub struct Extras {
 pub struct Ast {
 // The top node.
 // Its type depends on what was requested to parse.
- top: Box<dyn NodeMut>,
+ top: Box<NodeEnum>,


Does this still need to be boxed?
I guess we still need a stable address for parent but we should probably be using Pin for that.
In doubt we can leave it until we pin

Good eye. I'm pretty sure top level will still need a Box, so that moving an Ast doesn't invalidate the first level descendants. None of the other boxes are necessary, but I'm trying to limit the scope of each of these PRs.

I believe a fully idiomatic implementation would use Pin<Box<T>> for the root and Option<Pin<NonNull<T>>> for the children, as well as a PhantomPinned marker field in every node that has children.

However, Pin is mostly a guardrail. I think we already establish the necessary guarantees by:

not exposing &mut access once parsing is complete

not populating parent pointers until the tree is constructed

I plan on looking into Pin, but it may or may not prove cumbersome to work with here.

krobelus · 2024-03-23T12:16:45Z

src/ast.rs

+ ListRefMut::AndorJobList(l) => self.populate_list(l, false),
+ ListRefMut::ArgumentList(l) => self.populate_list(l, false),
+ ListRefMut::ArgumentOrRedirectionList(l) => self.populate_list(l, false),
+ ListRefMut::CaseItemList(l) => self.populate_list(l, false),


I wonder if we should define a generic type for lists, they're all the same.
We used to have that in C++ but I guess it goes against the exhaustive-enum-matching approach, so probably not.

I already have plans that will make this entire function go away. Specifically:

trait Parse { fn parse(pop: &mut Populator<'_>) -> Self; }

Leaves, branches, lists, and variants(?) will all have their own implementations of this trait, using logic relocated from impl Populator.
Branch impls will basically do this:

Self { parent: None, kw_else: Parse::parse(pop), semi_nl: Parse::parse(pop), body: Parse::parse(pop), }

There will still be special handling for the "missing end" case that VisitResult handles, but it will look quite different.
Maybe towards the end of these changes, parsing could even use Result and the ? operator. Then Parse impls for branches with a KeywordEnd would just need to attach context from their level of the tree before propagating.

krobelus · 2024-03-23T12:16:45Z

src/ast.rs

@@ -3891,11 +3298,11 @@
 if top_type == Type::job_list {
 let mut list = pops.allocate::<JobList>();
 pops.populate_list(&mut *list, true /* exhaust_stream */);
- ast.top = list;
+ ast.top = Box::new(NodeEnum::from(*list));


I believe the only place where we really need the generic From impl is in the as_node() and as_mut_node().

If possible we should not use it in cases like this where we know the concrete type,
because that gives more information. So I'd write it like

ast.top = Box::new(NodeEnum::Branch(BranchEnum::FreestandingArgumentList( *list, )));

and so on.

Which begs the question whether we actually need a From impl?
Is there a better way to implement as_node() and as_mut_node()?
In a way that prevents us from accidentally using NodeEnum::from like this..
I don't have a concrete suggestion so it's fine to leave it.

This ties into my third future-work bullet point: Ast eventually shouldn't even contain a general NodeEnum, it should be an AstRoot enum, or possibly even a generic.

krobelus · 2024-03-23T12:16:46Z

src/ast.rs

@@ -3891,11 +3298,11 @@
 if top_type == Type::job_list {
 let mut list = pops.allocate::<JobList>();
 pops.populate_list(&mut *list, true /* exhaust_stream */);
- ast.top = list;
+ ast.top = Box::new(NodeEnum::from(*list));


I tried to avoid introducing custom proc macros, but a mysterious hygiene
bug forced my hand.

I get error[E0412]: cannot find type BranchEnum in this scope
Can't that be fixed?

Plus, it's arguably more readable than the declarative version, and the
blanket_ref attribute is a nice bonus (those blanket impls are necessary
for enum_dispatch to do its thing on the ref enums)

I wonder how much we'd miss if we implemented enum_dispatch ourselves.
I have a preference for avoiding proc macros -- it should be possible
to implement a light version of enum_dipatch with normal macros,
maybe not match statements but if-else chains -- but I agree the
enum syntax works great. So this is probably the best way forward.

I get "error[E0412]: cannot find type BranchEnum in this scope"

Not sure what you mean by that. On 1.77.0, the MRE I linked produces "cannot find value inner in this scope".
xiretza found this old PR that allegedly fixeds the problem.

I wonder how much we'd miss if we implemented enum_dispatch ourselves.

The problem is that the macro needs to be aware of the trait's items, which is possible but annoying with a declarative macro.

If we used a declarative macro to define the enums, it would require the paste crate, and when I tried using that here, the readability was terrible.

There is a possible future where the design simplifies to the point that we can just have the enums defined without a macro, but we need to get rid of the Enum and RefMut flavors first to avoid descending (deeper?) into boilerplate hell.

src/ast.rs

macros/src/ast_node.rs

krobelus · 2024-03-23T12:16:46Z

macros/src/ast_node.rs

+enum Flavor {
+ Val,
+ Ref,
+ Mut,


isn't this usually called RefMut? Maybe use Value instead of Val then.

The Flavor enum's days are numbered anyway: we only use NodeEnum for the AST root, and we only use NodeRefMut for parsing. In both cases, it's private to the ast module and I've already described a refactor that would eliminate the enum.

krobelus · 2024-03-23T12:16:46Z

macros/src/blanket_ref_impl.rs

+///
+/// Output:
+/// ```
+/// { T::bar(self, baz) }


the input/output comments are helpful, maybe add them elsewhere too?
Maybe a high level one for blanket_ref, etc.
Perhaps use "realistic" names that sound like real AST nodes, so it's easier to categorize

AFAICT, T is the concrete node here so we implement all traits for &IfStatement and &mut IfStatement.
The mutable variant is questionable because we never need that outside the populator.

True, the two traits we actually need the #[blanket_mut] attribute for are quite simple.
However, to implement that, we still need e.g., impl<T: Node> Node for &mut T, which isn't so trivial.
As long as we need to keep the mutable flavor around, we might as well have the attribute for it.

Doc comment added to top level.

krobelus · 2024-03-23T12:16:47Z

macros/src/ast_node.rs

+use std::collections::HashMap;
+use syn::{parse_quote, punctuated::Punctuated, Fields, Ident, ItemEnum, Token};
+
+pub struct State<'a> {


can you put the relevant PR description into the commit message, to help git blame

I'm sorry, I don't understand what you mean by this.

It's often a good idea to explain why some decisions where made,
the traditional place to put such explanation is the commit message.
For example it might be good explain why blanket_ref and NodeRef exist.

See also https://github.com/git/git/blob/79bdd48716a4c455bdc8ffd91d57a18d5cd55baa/Documentation/SubmittingPatches#L110-L225

Since I've pushed some additional commits since your initial review, should I do this as part of a second squash commit or what?

krobelus · 2024-03-23T12:18:09Z

I think this is probably fine to merge with little modification but we should look into whether our trait hierarchy is still necessary, given that it overlaps with the methods added by enum_dispatch.

krobelus · 2024-03-29T11:51:10Z

I'd like to understand better why we need NodeRef, can we not pass a normal reference to a node enum?
Also I'd like to evaluate the impact of proc macros (on compile time), and how bad it would be to replace them with plain macros.
Finally, I'd get rid of all Node sub-traits. Also, if we get rid of parent nodes (which seems doable in theory, we could also get rid of Node.

That being said I'm not sure when I'll get around to it and I'm happy to delegate to you.
If you already have follow-up work, feel free to add it here or elsewhere;
I don't think you're blocked by this not merging yet, there shouldn't be an nontrivial conflicts.

krobelus · 2024-04-12T10:49:24Z

prefer rebase on master instead of merging for this type of change. Merge commits are a complex thing

ast: Use enum dispatch for tree traversal

996efdd

krobelus self-requested a review March 20, 2024 20:00

krobelus reviewed Mar 23, 2024

View reviewed changes

The0x539 added 7 commits March 23, 2024 11:13

Fix visit_semi_nl function signatures

6ff6007

Extend CMakeLists Cargo workaround

a4bcdf1

Tweak NodeRef/&/impl Node function signatures

5d9659c

Tweak CMakeLists Cargo workaround

6cf1b0e

parse_util: Clean up some detect_errors_in functions

d59e85a

Move Flavor type definition to top

6e4f4d8

macros: Add documentation to blanket_ref

3a46f41

Merge remote-tracking branch 'upstream/master' into enum-ast-squashed

bc64183

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ast: Use enum dispatch for tree traversal #10382

ast: Use enum dispatch for tree traversal #10382

The0x539 commented Mar 19, 2024 •

edited

krobelus Mar 23, 2024

The0x539 Mar 23, 2024

krobelus Mar 23, 2024

The0x539 Mar 23, 2024

krobelus Mar 23, 2024

The0x539 Mar 23, 2024

krobelus Mar 23, 2024

The0x539 Mar 23, 2024 •

edited

krobelus Mar 23, 2024

The0x539 Mar 23, 2024 •

edited

krobelus Mar 23, 2024

The0x539 Mar 23, 2024

The0x539 Mar 23, 2024

krobelus Mar 23, 2024

The0x539 Mar 23, 2024

krobelus Mar 23, 2024

The0x539 Mar 23, 2024

krobelus commented Mar 23, 2024

krobelus commented Mar 29, 2024

krobelus commented Apr 12, 2024

ast: Use enum dispatch for tree traversal #10382

Are you sure you want to change the base?

ast: Use enum dispatch for tree traversal #10382

Conversation

The0x539 commented Mar 19, 2024 • edited

Description

TODOs:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

The0x539 Mar 23, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

The0x539 Mar 23, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

krobelus commented Mar 23, 2024

krobelus commented Mar 29, 2024

krobelus commented Apr 12, 2024

The0x539 commented Mar 19, 2024 •

edited

The0x539 Mar 23, 2024 •

edited

The0x539 Mar 23, 2024 •

edited