caching common tokens #821

OmarTawfik · 2024-02-15T12:31:20Z

Our scanner copies the string contents of every token when creating it. But many tokens share the same source string:

- Whitespace.
- Newlines.
- Common keywords like public, contract, function.

We should calculate the most frequent tokens (using sanctuary datasets), and define them as static/const values in source, returning static references when possible, to eliminate allocations:

- We can try using phf::Set::get_key in Token::new() to fetch the static refs and store them instead of cloning the original input, similar to smol_str::SmolStr::new_static.
- We can also try making Token::new() a const fn, storing the entire Token object in the phf map instead of just the string.
- We can also try caching the Rc<Token>, so that all tokens with the same text share the same instance/pointer.
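
As a rough illustration of the first option, here is a minimal std-only sketch. It is not the actual scanner code: the `Token` struct, the `static_text` helper, and the list of entries are all hypothetical, and a plain `match` stands in for the `phf::Set::get_key` lookup so the example has no dependencies.

```rust
use std::borrow::Cow;

/// Hypothetical table of frequent token texts. Real entries would come
/// from measuring a corpus; a `match` stands in for a phf set lookup.
fn static_text(text: &str) -> Option<&'static str> {
    match text {
        " " => Some(" "),
        "\n" => Some("\n"),
        "public" => Some("public"),
        "contract" => Some("contract"),
        "function" => Some("function"),
        _ => None,
    }
}

/// Hypothetical, simplified token that only carries its text.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: Cow<'static, str>,
}

impl Token {
    pub fn new(text: &str) -> Self {
        let text = match static_text(text) {
            // Known text: store the static reference, no allocation.
            Some(shared) => Cow::Borrowed(shared),
            // Unknown text: fall back to copying the input.
            None => Cow::Owned(text.to_owned()),
        };
        Token { text }
    }
}
```

With this shape, `Token::new("function")` holds a borrowed static string, while `Token::new("myVariable")` still allocates.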

Once we have #808, we can evaluate the impact of these optimizations.

AntonyBlakey · 2024-02-15T15:48:35Z

I think on-demand interning will be enough, without pre-computation.
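
For comparison, on-demand interning could look roughly like the following std-only sketch. The `Interner` type and its `intern` method are hypothetical names for illustration, not an existing API in the codebase.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical interner: each distinct text is allocated once, on first
/// sight, and later requests get a clone of the same `Rc<str>`.
#[derive(Default)]
pub struct Interner {
    seen: HashSet<Rc<str>>,
}

impl Interner {
    pub fn intern(&mut self, text: &str) -> Rc<str> {
        // `Rc<str>` implements `Borrow<str>`, so the set can be probed
        // with a plain `&str` without allocating.
        if let Some(existing) = self.seen.get(text) {
            return Rc::clone(existing);
        }
        // First occurrence: allocate once and remember it.
        let fresh: Rc<str> = Rc::from(text);
        self.seen.insert(Rc::clone(&fresh));
        fresh
    }
}
```

Interning the same text twice returns pointers for which `Rc::ptr_eq` is true, and no list of frequent tokens has to be computed ahead of time. The cost is a hash lookup per token.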

OmarTawfik self-assigned this Feb 15, 2024
