Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: AST Traverse #2987

Draft
wants to merge 27 commits into
base: main
Choose a base branch
from
Draft

WIP: AST Traverse #2987

wants to merge 27 commits into from

Conversation

rzvxa
Copy link
Collaborator

@rzvxa rzvxa commented Apr 15, 2024

No description provided.

@github-actions github-actions bot added the A-ast Area - AST label Apr 15, 2024
Copy link
Collaborator Author

rzvxa commented Apr 15, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @rzvxa and the rest of your teammates on Graphite Graphite

Copy link

codspeed-hq bot commented Apr 15, 2024

CodSpeed Performance Report

Merging #2987 will not alter performance

Comparing 04-15-wip_ast_traverse (fb7bce1) with main (d765076)

Summary

✅ 30 untouched benchmarks

@Boshen Boshen requested a review from milesj April 15, 2024 03:07
@github-actions github-actions bot added A-codegen Area - Code Generation A-prettier Area - Prettier labels Apr 15, 2024
@milesj
Copy link
Collaborator

milesj commented Apr 15, 2024

@rzvxa So how is this working exactly? Does #[ast_node] mutate the current AST node? Assumed we would create a duplicate node, like TraversableProgram.

@rzvxa
Copy link
Collaborator Author

rzvxa commented Apr 15, 2024

@rzvxa So how is this working exactly? Does #[ast_node] mutate the current AST node? Assumed we would create a duplicate node, like TraversableProgram.

It would modify and validate our ast nodes and generate their appropriate traversable versions, mutation wrappers, etc.
It will also ensure that the 2 types have the same representation and size; It adds some constant assertions to test it out.

@github-actions github-actions bot added A-parser Area - Parser A-semantic Area - Semantic A-minifier Area - Minifier A-transformer Area - Transformer / Transpiler A-linter Area - Linter labels Apr 17, 2024
@rzvxa rzvxa closed this Apr 18, 2024
@rzvxa rzvxa reopened this Apr 18, 2024
@rzvxa rzvxa force-pushed the 04-15-wip_ast_traverse branch 2 times, most recently from 5cf4899 to 9736d4d Compare April 18, 2024 02:33
@@ -58,15 +60,15 @@ fn size_asserts() {
use crate::ast;
use oxc_index::assert_eq_size;

assert_eq_size!(ast::Statement, [u8; 16]);
assert_eq_size!(ast::Statement, [u8; 24]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's causing these increases exactly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing the representation from rust to C

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are possible workarounds to reverse the type size inflation, if we want to prioritize it. It does affect the benchmarks a little (-1% on parser benchmarks), but to be honest I expected it to be much worse.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was also expecting a hit in the performance; Not having any tangible performance drop is a sentiment to how well modern CPUs are in pre-fetching and pipelining the data.

@overlookmotel
Copy link
Collaborator

I've started going through this.

So far just been tidying up a little. Have pushed a few commits to remove changes which aren't relevant to the thrust of this PR, and submitted a couple as separate PRs (#3026, #3027). Aim is to reduce the diff a little, and make it a bit easier to follow the substantive changes.

Will start going through the meat of it soon as I can.

@overlookmotel
Copy link
Collaborator

@rzvxa It's starting to come into focus for me now.

I attempted to make it work without changing the AST further, but unfortunately can't find a way to avoid soundness issues without all enum variants being boxed, and other changes (my failed attempt to get MemberExpression working here).

My conclusion is: If there's a visit_* function for a type, then that type needs either:

  1. to be an enum (which is Copy) or
  2. to be enclosed in Box<_> or Vec<_> everywhere it's used (so that Box<T>/Vec<T> can be safely transmuted to &GCell<T>, which is what's passed to the visit_* function).

Unfortunately, that will involve quite a lot of changes to the AST.

NB: A variant on my failed attempt above should work in some cases, as long as the type is always in a fixed position in all other types (i.e. not within Option<_>), so a &GCell<T> will always point to a valid T forever, no matter how the AST is altered. But it's tricky, so I think better just box everything for now to get it working, and then try to scale the boxing back again later if it hurts performance.

I've added a commit to use layout_inspect on the AST, which allows running queries on the AST types:

node tasks/inspect_ast

Do have a look if you're interested.

Clocking off for tonight now, but will continue tomorrow.

@overlookmotel overlookmotel force-pushed the 04-15-wip_ast_traverse branch 2 times, most recently from d2a9bab to c2480a5 Compare April 24, 2024 01:33
@overlookmotel
Copy link
Collaborator

PS: I also made some bug fixes and improvements to demo impl (the last 6 commits here: https://github.com/overlookmotel/ast-cell-test/commits/main/), which we should apply to this full implementation at some point.

@overlookmotel overlookmotel force-pushed the 04-15-wip_ast_traverse branch 2 times, most recently from a2cdd65 to 481b301 Compare April 25, 2024 15:46
@overlookmotel
Copy link
Collaborator

A lot of work went into this, and I would like to save it somewhere. It's possible we want to come back to this later.

I'll save it in a branch and then will close again.

todor-a pushed a commit to todor-a/oxc that referenced this pull request May 19, 2024
OK, this is a big one...

I have done this as part of work on Traversable AST, but I believe it
has wider benefits, so thought better to spin it off into its own PR.

## What this PR does

This PR squashes all nested AST enum types (oxc-project#2685).

e.g.: Previously:

```rs
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ...other Declaration variants... */
}
```

After this PR:

```rs
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
    /* ...other Statement variants... */

    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}

#[repr(C, u8)]
pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}
```

All `Declaration`'s variants are combined into `Statement`, but
`Declaration` type still exists.

As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a
`Declaration` can be transmuted to a `Statement` at zero cost.

This is the same thing as oxc-project#2847, but here applied to *all* nested enums
in the AST, and with improved helper methods.

No enums increase in size, and a few get smaller. Indirection is reduced
for some types (this removes multiple levels of boxing).

## Why?

1. It is a prerequisite for Traversable AST (oxc-project#2987).
2. It would help a lot with AST Transfer (oxc-project#2409) - it solves the only
remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.

## Why is it a good thing for the AST to be `#[repr(C)]`?

Oxc's direction appears to be increasingly to build up control over the
fundamental primitives we use, in order to unlock performance and
features. We have our own allocator, our own custom implementations for
`Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central
building block of Oxc, and taking control of its memory layout feels
like a step in this same direction.

Oxc has a major advantage over other similar libraries in that it keeps
all the AST data in an arena. This opens the door to treating the AST
either as Rust types or as *pure data* (just bytes). That data can be
moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with
completely stable layouts. `#[repr(C)]` is the only tool Rust provides
to do this.

Once the types are `#[repr(C)]`, various features become possible:

1. Cheap transfer of the AST across boundaries without ser/deser - the
property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only,
traversable), and these AST representations can be converted to one
other at zero cost via transmute - the property used by Traversable AST
scheme.
3. Caching AST data on disk (oxc-project#3079) or transferring across network.
4. Stuff we haven't thought of yet!

Allowing the AST to be treated as pure data will likely unlock other
"next level" features further down the track (caching for "edge
bundling" comes to mind).

## The problem with `#[repr(C)]`

It's not *required* to squash nested enums to make the AST `#[repr(C)]`.

But the problem with `#[repr(C)]` is that it disables some compiler
optimizations. Without `#[repr(C)]`, the compiler squashes enums itself
in some cases (which is how `Statement` is currently 16 bytes). But
making the types `#[repr(C)]` as they are currently disables this
optimization.

So this PR essentially makes explicit what the compiler is already doing
- and in fact goes a bit further with the optimization than the compiler
is able to, in squashing 3 or 4 layers of nested enums (the compiler
only does up to 2 layers).

## Implementation

One enum "inheriting" variants from another is implemented with
`inherit_variants!` macro.

```rs
inherit_variants! {
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    
    // `Declaration` variants added here by `inherit_variants!` macro
    @inherit Declaration
    // `ModuleDeclaration` variants added here by `inherit_variants!` macro
    @inherit ModuleDeclaration
}
}
```

The macro is *fairly* lightweight, and I think the above is quite easy
to understand. No proc macros.

The macro also implements utility methods for converting between enums
e.g. `Statement::as_declaration`. These methods are all zero-cost
(essentially transmutes).

New patterns for dealing with nested enums are introduced:

Creation:

```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));

// New
let stmt = Statement::VariableDeclaration(var_decl);
```

Conversion:

```rs
// Old
let stmt = Statement::Declaration(decl);

// New
let stmt = Statement::from(decl);
```

Testing:

```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }

// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```

Branching:

```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };

// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```

Matching:

```rs
// Old
match stmt {
    Statement::Declaration(decl) => visitor.visit(decl),
}

// New (exhaustive match)
match stmt {
    match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}

// New (alternative)
match stmt {
    _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```

New syntax has pluses and minuses vs the old. `match` syntax is worse,
but when working with a deeply nested enum, the code is much nicer -
it's shorter and easier to read.

This PR removes 200 lines from the linter with changes like this:


https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95

```diff
- let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
-     &assignment_expr.left
- else {
-     return;
- };
- let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
-     simple_assignment_target
+ let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
else {
    return;
};
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST A-codegen Area - Code Generation A-linter Area - Linter A-minifier Area - Minifier A-parser Area - Parser A-prettier Area - Prettier A-semantic Area - Semantic A-transformer Area - Transformer / Transpiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants