Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ast: Use enum dispatch for tree traversal #10382

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

The0x539
Copy link
Contributor

@The0x539 The0x539 commented Mar 19, 2024

Description

A lot of the complexity here is a result of needing separate reference-flavored enums, because things that used to take &dyn Node still need an "abstractly-typed reference" even though the node's owner knows exactly what type it is.

I tried to avoid introducing custom proc macros, but a mysterious hygiene bug forced my hand. Plus, it's arguably more readable than the declarative version, and the blanket_ref attribute is a nice bonus (those blanket impls are necessary for enum_dispatch to do its thing on the ref enums)

Future work:

  • Change parent pointers to use NonNull and possibly Pin
  • A derive macro for AST nodes
  • Eliminate the fn(&mut Populator, &mut NodeEnum) pattern in favor of fn<N: Node>(&mut Populator) -> N, which can go in a trait that each node implements (taking the place of AcceptorMut).
    • This should also eliminate the need for the Mut flavor of node enum.
  • Rearrange the enums a bit to make some invalid states unrepresentable. Major examples:
    • If the top field of the Ast struct is only expected to be two kinds of node, there should be an enum for that.
    • If a node can only have a specific type as a parent, then that can be encoded in the type of its parent pointer. We'd probably still have the current unknown-type parent method, but wouldn't rely on it as much.
  • Do something about the "union nodes" (enums with names ending in Variant) as previously discussed. I don't fully understand their nature yet, but I can tell there's some redundancy. Would they work similarly to tokens and keywords?

TODOs:

No intended user-facing changes. Performance characteristics unknown.

@krobelus krobelus self-requested a review March 20, 2024 20:00
macros/src/ast_node.rs Show resolved Hide resolved
@@ -2457,7 +1921,7 @@ pub struct Extras {
pub struct Ast {
// The top node.
// Its type depends on what was requested to parse.
top: Box<dyn NodeMut>,
top: Box<NodeEnum>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still need to be boxed?
I guess we still need a stable address for parent but we should probably be using Pin for that.
In doubt we can leave it until we pin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good eye. I'm pretty sure top level will still need a Box, so that moving an Ast doesn't invalidate the first level descendants. None of the other boxes are necessary, but I'm trying to limit the scope of each of these PRs.

I believe a fully idiomatic implementation would use Pin<Box<T>> for the root and Option<Pin<NonNull<T>>> for the children, as well as a PhantomPinned marker field in every node that has children.

However, Pin is mostly a guardrail. I think we already establish the necessary guarantees by:

  • not exposing &mut access once parsing is complete
  • not populating parent pointers until the tree is constructed

I plan on looking into Pin, but it may or may not prove cumbersome to work with here.

ListRefMut::AndorJobList(l) => self.populate_list(l, false),
ListRefMut::ArgumentList(l) => self.populate_list(l, false),
ListRefMut::ArgumentOrRedirectionList(l) => self.populate_list(l, false),
ListRefMut::CaseItemList(l) => self.populate_list(l, false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should define a generic type for lists, they're all the same.
We used to have that in C++ but I guess it goes against the exhaustive-enum-matching approach, so probably not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have plans that will make this entire function go away. Specifically:

trait Parse {
    fn parse(pop: &mut Populator<'_>) -> Self;
}

Leaves, branches, lists, and variants(?) will all have their own implementations of this trait, using logic relocated from impl Populator.
Branch impls will basically do this:

Self {
    parent: None,
    kw_else: Parse::parse(pop),
    semi_nl: Parse::parse(pop),
    body: Parse::parse(pop),
}

There will still be special handling for the "missing end" case that VisitResult handles, but it will look quite different.
Maybe towards the end of these changes, parsing could even use Result and the ? operator. Then Parse impls for branches with a KeywordEnd would just need to attach context from their level of the tree before propagating.

@@ -3891,11 +3298,11 @@
if top_type == Type::job_list {
let mut list = pops.allocate::<JobList>();
pops.populate_list(&mut *list, true /* exhaust_stream */);
ast.top = list;
ast.top = Box::new(NodeEnum::from(*list));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the only place where we really need the generic From impl is in the as_node() and as_mut_node().

If possible we should not use it in cases like this where we know the concrete type,
because that gives more information. So I'd write it like

ast.top = Box::new(NodeEnum::Branch(BranchEnum::FreestandingArgumentList(
	*list,
)));

and so on.

Which begs the question whether we actually need a From impl?
Is there a better way to implement as_node() and as_mut_node()?
In a way that prevents us from accidentally using NodeEnum::from like this..
I don't have a concrete suggestion so it's fine to leave it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ties into my third future-work bullet point: Ast eventually shouldn't even contain a general NodeEnum, it should be an AstRoot enum, or possibly even a generic.

@@ -3891,11 +3298,11 @@
if top_type == Type::job_list {
let mut list = pops.allocate::<JobList>();
pops.populate_list(&mut *list, true /* exhaust_stream */);
ast.top = list;
ast.top = Box::new(NodeEnum::from(*list));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to avoid introducing custom proc macros, but a mysterious hygiene
bug forced my hand.

I get error[E0412]: cannot find type BranchEnum in this scope
Can't that be fixed?

Plus, it's arguably more readable than the declarative version, and the
blanket_ref attribute is a nice bonus (those blanket impls are necessary
for enum_dispatch to do its thing on the ref enums)

I wonder how much we'd miss if we implemented enum_dispatch ourselves.
I have a preference for avoiding proc macros -- it should be possible
to implement a light version of enum_dipatch with normal macros,
maybe not match statements but if-else chains -- but I agree the
enum syntax works great. So this is probably the best way forward.

Copy link
Contributor Author

@The0x539 The0x539 Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get "error[E0412]: cannot find type BranchEnum in this scope"

Not sure what you mean by that. On 1.77.0, the MRE I linked produces "cannot find value inner in this scope".
xiretza found this old PR that allegedly fixeds the problem.

I wonder how much we'd miss if we implemented enum_dispatch ourselves.

The problem is that the macro needs to be aware of the trait's items, which is possible but annoying with a declarative macro.

If we used a declarative macro to define the enums, it would require the paste crate, and when I tried using that here, the readability was terrible.

There is a possible future where the design simplifies to the point that we can just have the enums defined without a macro, but we need to get rid of the Enum and RefMut flavors first to avoid descending (deeper?) into boilerplate hell.

src/ast.rs Show resolved Hide resolved
macros/src/ast_node.rs Show resolved Hide resolved
enum Flavor {
Val,
Ref,
Mut,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this usually called RefMut? Maybe use Value instead of Val then.

Copy link
Contributor Author

@The0x539 The0x539 Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Flavor enum's days are numbered anyway: we only use NodeEnum for the AST root, and we only use NodeRefMut for parsing. In both cases, it's private to the ast module and I've already described a refactor that would eliminate the enum.

///
/// Output:
/// ```
/// { T::bar(self, baz) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the input/output comments are helpful, maybe add them elsewhere too?
Maybe a high level one for blanket_ref, etc.
Perhaps use "realistic" names that sound like real AST nodes, so it's easier to categorize

AFAICT, T is the concrete node here so we implement all traits for &IfStatement and &mut IfStatement.
The mutable variant is questionable because we never need that outside the populator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the two traits we actually need the #[blanket_mut] attribute for are quite simple.
However, to implement that, we still need e.g., impl<T: Node> Node for &mut T, which isn't so trivial.
As long as we need to keep the mutable flavor around, we might as well have the attribute for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comment added to top level.

use std::collections::HashMap;
use syn::{parse_quote, punctuated::Punctuated, Fields, Ident, ItemEnum, Token};

pub struct State<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put the relevant PR description into the commit message, to help git blame

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I don't understand what you mean by this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's often a good idea to explain why some decisions where made,
the traditional place to put such explanation is the commit message.
For example it might be good explain why blanket_ref and NodeRef exist.

See also https://github.com/git/git/blob/79bdd48716a4c455bdc8ffd91d57a18d5cd55baa/Documentation/SubmittingPatches#L110-L225

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I've pushed some additional commits since your initial review, should I do this as part of a second squash commit or what?

@krobelus
Copy link
Member

I think this is probably fine to merge with little modification but we should look into whether our trait hierarchy is still necessary, given that it overlaps with the methods added by enum_dispatch.

@krobelus
Copy link
Member

I'd like to understand better why we need NodeRef, can we not pass a normal reference to a node enum?
Also I'd like to evaluate the impact of proc macros (on compile time), and how bad it would be to replace them with plain macros.
Finally, I'd get rid of all Node sub-traits. Also, if we get rid of parent nodes (which seems doable in theory, we could also get rid of Node.

That being said I'm not sure when I'll get around to it and I'm happy to delegate to you.
If you already have follow-up work, feel free to add it here or elsewhere;
I don't think you're blocked by this not merging yet, there shouldn't be an nontrivial conflicts.

@krobelus
Copy link
Member

prefer rebase on master instead of merging for this type of change. Merge commits are a complex thing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants