New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ast: Use enum dispatch for tree traversal #10382
base: master
Are you sure you want to change the base?
Conversation
@@ -2457,7 +1921,7 @@ pub struct Extras { | |||
pub struct Ast { | |||
// The top node. | |||
// Its type depends on what was requested to parse. | |||
top: Box<dyn NodeMut>, | |||
top: Box<NodeEnum>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still need to be boxed?
I guess we still need a stable address for parent
but we should probably be using Pin
for that.
In doubt we can leave it until we pin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eye. I'm pretty sure top level will still need a Box
, so that moving an Ast
doesn't invalidate the first level descendants. None of the other boxes are necessary, but I'm trying to limit the scope of each of these PRs.
I believe a fully idiomatic implementation would use Pin<Box<T>>
for the root and Option<Pin<NonNull<T>>>
for the children, as well as a PhantomPinned
marker field in every node that has children.
However, Pin
is mostly a guardrail. I think we already establish the necessary guarantees by:
- not exposing
&mut
access once parsing is complete - not populating parent pointers until the tree is constructed
I plan on looking into Pin
, but it may or may not prove cumbersome to work with here.
ListRefMut::AndorJobList(l) => self.populate_list(l, false), | ||
ListRefMut::ArgumentList(l) => self.populate_list(l, false), | ||
ListRefMut::ArgumentOrRedirectionList(l) => self.populate_list(l, false), | ||
ListRefMut::CaseItemList(l) => self.populate_list(l, false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should define a generic type for lists, they're all the same.
We used to have that in C++ but I guess it goes against the exhaustive-enum-matching approach, so probably not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already have plans that will make this entire function go away. Specifically:
trait Parse {
fn parse(pop: &mut Populator<'_>) -> Self;
}
Leaves, branches, lists, and variants(?) will all have their own implementations of this trait, using logic relocated from impl Populator
.
Branch impls will basically do this:
Self {
parent: None,
kw_else: Parse::parse(pop),
semi_nl: Parse::parse(pop),
body: Parse::parse(pop),
}
There will still be special handling for the "missing end
" case that VisitResult
handles, but it will look quite different.
Maybe towards the end of these changes, parsing could even use Result
and the ?
operator. Then Parse
impls for branches with a KeywordEnd
would just need to attach context from their level of the tree before propagating.
@@ -3891,11 +3298,11 @@ | |||
if top_type == Type::job_list { | |||
let mut list = pops.allocate::<JobList>(); | |||
pops.populate_list(&mut *list, true /* exhaust_stream */); | |||
ast.top = list; | |||
ast.top = Box::new(NodeEnum::from(*list)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the only place where we really need the generic From
impl is in the as_node()
and as_mut_node()
.
If possible we should not use it in cases like this where we know the concrete type,
because that gives more information. So I'd write it like
ast.top = Box::new(NodeEnum::Branch(BranchEnum::FreestandingArgumentList(
*list,
)));
and so on.
Which begs the question whether we actually need a From impl?
Is there a better way to implement as_node()
and as_mut_node()
?
In a way that prevents us from accidentally using NodeEnum::from like this..
I don't have a concrete suggestion so it's fine to leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ties into my third future-work bullet point: Ast
eventually shouldn't even contain a general NodeEnum
, it should be an AstRoot
enum, or possibly even a generic.
@@ -3891,11 +3298,11 @@ | |||
if top_type == Type::job_list { | |||
let mut list = pops.allocate::<JobList>(); | |||
pops.populate_list(&mut *list, true /* exhaust_stream */); | |||
ast.top = list; | |||
ast.top = Box::new(NodeEnum::from(*list)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to avoid introducing custom proc macros, but a mysterious hygiene
bug forced my hand.
I get error[E0412]: cannot find type
BranchEnum in this scope
Can't that be fixed?
Plus, it's arguably more readable than the declarative version, and the
blanket_ref attribute is a nice bonus (those blanket impls are necessary
for enum_dispatch to do its thing on the ref enums)
I wonder how much we'd miss if we implemented enum_dispatch ourselves.
I have a preference for avoiding proc macros -- it should be possible
to implement a light version of enum_dipatch with normal macros,
maybe not match statements but if-else chains -- but I agree the
enum
syntax works great. So this is probably the best way forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get "error[E0412]: cannot find type
BranchEnum
in this scope"
Not sure what you mean by that. On 1.77.0, the MRE I linked produces "cannot find value inner
in this scope".
xiretza found this old PR that allegedly fixeds the problem.
I wonder how much we'd miss if we implemented enum_dispatch ourselves.
The problem is that the macro needs to be aware of the trait's items, which is possible but annoying with a declarative macro.
If we used a declarative macro to define the enums, it would require the paste
crate, and when I tried using that here, the readability was terrible.
There is a possible future where the design simplifies to the point that we can just have the enums defined without a macro, but we need to get rid of the Enum
and RefMut
flavors first to avoid descending (deeper?) into boilerplate hell.
macros/src/ast_node.rs
Outdated
enum Flavor { | ||
Val, | ||
Ref, | ||
Mut, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this usually called RefMut? Maybe use Value
instead of Val
then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Flavor
enum's days are numbered anyway: we only use NodeEnum
for the AST root, and we only use NodeRefMut
for parsing. In both cases, it's private to the ast
module and I've already described a refactor that would eliminate the enum.
/// | ||
/// Output: | ||
/// ``` | ||
/// { T::bar(self, baz) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the input/output comments are helpful, maybe add them elsewhere too?
Maybe a high level one for blanket_ref, etc.
Perhaps use "realistic" names that sound like real AST nodes, so it's easier to categorize
AFAICT, T is the concrete node here so we implement all traits for &IfStatement
and &mut IfStatement
.
The mutable variant is questionable because we never need that outside the populator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, the two traits we actually need the #[blanket_mut]
attribute for are quite simple.
However, to implement that, we still need e.g., impl<T: Node> Node for &mut T
, which isn't so trivial.
As long as we need to keep the mutable flavor around, we might as well have the attribute for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comment added to top level.
use std::collections::HashMap; | ||
use syn::{parse_quote, punctuated::Punctuated, Fields, Ident, ItemEnum, Token}; | ||
|
||
pub struct State<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put the relevant PR description into the commit message, to help git blame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I don't understand what you mean by this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's often a good idea to explain why some decisions where made,
the traditional place to put such explanation is the commit message.
For example it might be good explain why blanket_ref
and NodeRef
exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I've pushed some additional commits since your initial review, should I do this as part of a second squash commit or what?
I think this is probably fine to merge with little modification but we should look into whether our trait hierarchy is still necessary, given that it overlaps with the methods added by enum_dispatch. |
I'd like to understand better why we need That being said I'm not sure when I'll get around to it and I'm happy to delegate to you. |
prefer rebase on master instead of merging for this type of change. Merge commits are a complex thing |
Description
A lot of the complexity here is a result of needing separate reference-flavored enums, because things that used to take
&dyn Node
still need an "abstractly-typed reference" even though the node's owner knows exactly what type it is.I tried to avoid introducing custom proc macros, but a mysterious hygiene bug forced my hand. Plus, it's arguably more readable than the declarative version, and the
blanket_ref
attribute is a nice bonus (those blanket impls are necessary for enum_dispatch to do its thing on the ref enums)Future work:
NonNull
and possiblyPin
fn(&mut Populator, &mut NodeEnum)
pattern in favor offn<N: Node>(&mut Populator) -> N
, which can go in a trait that each node implements (taking the place ofAcceptorMut
).Mut
flavor of node enum.top
field of theAst
struct is only expected to be two kinds of node, there should be an enum for that.parent
method, but wouldn't rely on it as much.Variant
) as previously discussed. I don't fully understand their nature yet, but I can tell there's some redundancy.Would they work similarly to tokens and keywords?TODOs:
No intended user-facing changes. Performance characteristics unknown.