caching common tokens #821

Open
OmarTawfik opened this issue Feb 15, 2024 · 1 comment
@OmarTawfik
Collaborator

Our scanner copies the string contents of every token when creating it. But many tokens share the same source string:

  • Whitespace.
  • Newlines.
  • Common keywords like public, contract, function.

We should calculate the most frequent tokens (using sanctuary datasets), and define them as static/const values in source, returning static references when possible, to eliminate allocations:

  • We can try using phf::Set::get_key in Token::new() to fetch the static refs and store them instead of cloning the original input, similar to smol_str::SmolStr::new_static.
  • We can also try making all of Token::new() const, storing the entire Token object in the phf map instead of just its string.
  • We can also try caching the Rc<Token>, so that all of them share the same instance/pointer.
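A minimal sketch of the first option, assuming a hypothetical `Token` type that stores its text as `Cow<'static, str>`. To stay dependency-free it swaps in a linear scan over a const slice where the proposal would use `phf::Set::get_key`, and the `COMMON_TOKENS` list is illustrative only, not the measured frequency table:

```rust
use std::borrow::Cow;

/// Hypothetical table of frequent token texts. A real list would come from
/// measuring a corpus, and a real lookup would likely use a perfect hash set.
const COMMON_TOKENS: &[&str] = &[" ", "\n", "contract", "function", "public"];

/// Hypothetical stand-in for the scanner's token type.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: Cow<'static, str>,
}

impl Token {
    /// Borrows the static copy when the input is a known common token,
    /// and only allocates for everything else.
    pub fn new(input: &str) -> Self {
        let text = match COMMON_TOKENS.iter().find(|&&s| s == input) {
            Some(&s) => Cow::Borrowed(s),
            None => Cow::Owned(input.to_owned()),
        };
        Token { text }
    }

    /// True when the token text points at static data (no allocation made).
    pub fn is_static(&self) -> bool {
        matches!(self.text, Cow::Borrowed(_))
    }
}
```

With this shape, `Token::new("public")` borrows from the table, while an identifier such as `myVariable` still pays for one allocation.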

Once we have #808, we can evaluate the impact of these optimizations.

@OmarTawfik OmarTawfik self-assigned this Feb 15, 2024
@AntonyBlakey
Contributor

I think on-demand interning will be enough, without pre-computation.
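For comparison, on-demand interning could look roughly like the sketch below. The `Interner` type and its `intern` method are hypothetical names, not part of the existing scanner. The first occurrence of a string allocates once, and every later occurrence shares the same `Rc<str>`:

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical interner: remembers every distinct token text seen so far.
#[derive(Default)]
pub struct Interner {
    seen: HashSet<Rc<str>>,
}

impl Interner {
    /// Returns a shared pointer to the interned copy of `input`,
    /// allocating only the first time a given string is seen.
    pub fn intern(&mut self, input: &str) -> Rc<str> {
        if let Some(existing) = self.seen.get(input) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(input);
        self.seen.insert(Rc::clone(&fresh));
        fresh
    }

    /// Number of distinct strings interned so far.
    pub fn len(&self) -> usize {
        self.seen.len()
    }
}
```

This needs no precomputed frequency table, at the cost of a hash lookup per token and an interner that has to be threaded through the scanner.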

Status: ⏳ Todo