Some last work
jonmeow committed Apr 1, 2024
1 parent 8dc691b commit 581b482
Showing 1 changed file with 72 additions and 24 deletions.
96 changes: 72 additions & 24 deletions proposals/p3797.md
@@ -35,6 +35,9 @@ example, `base`). This is being called "raw identifier syntax" using
`r#<identifier>`, and is based on
[Rust](https://doc.rust-lang.org/reference/identifiers.html).

Note this proposal is derived from
[Proposal #17: Lexical conventions](https://github.com/carbon-language/carbon-lang/pull/17).

## Problem

One of Carbon's most important goals is to support program and language
@@ -121,27 +124,34 @@ use the non-prefixed identifier name for consistency.

### Other raw identifier syntaxes

Advantages to `r#` are:
When considering other syntaxes, a couple of initial considerations for
`r#identifier` prefixing are:

- We use `#` prefixes for
[string literals](/docs/design/lexical_conventions/string_literals.md), and
it's likely we'll support syntax similar to `f#` for formatted string
literals. The `r#` syntax offers consistency with this, and will hopefully
be recognizable to users.
it's likely we'll support syntax similar to `f#"..."` for interpolated
string literals. The `r#` syntax offers consistency with this, and will
hopefully be recognizable to users.
- Consistency with Rust.
- Rust uses `r#` for raw string literals, whereas Carbon uses `#`.

A disadvantage is that any `r`-prefixed identifier parses substantially slower,
as noted by the benchmarks in
[PR #3044](https://github.com/carbon-language/carbon-lang/pull/3344) which
implemented `r#` syntax. A 2% benchmark slowdown indicates around 2x because `r`
is about 1-in-55 identifiers. This may be reduced if we enable tail calls and
other optimizations.
- Rust uses `r#"..."` for raw string literals, whereas Carbon uses
`#"..."`.
- Introduces another code execution path in lexing identifiers. This likely
causes a slowdown;
[PR #3044](https://github.com/carbon-language/carbon-lang/pull/3344)
indicates roughly 2%, although that was run on a system with noisy
benchmarks -- details would require a better system for benchmarking. The 2%
has at least two readings. It could mean that `r` starts about 1-in-55
identifiers and each of those lexes with a 100% slowdown, in which case
other similar code would add cost linearly. Or it could mean that the
additional code path itself causes an incremental slowdown, in which case
other code sharing that path (such as `f#"..."`) may add negligible
incremental cost (constant cost scaling). Enabling tail calls and other
optimizations may either reduce this overhead or make it more significant.
As a consequence, the precise overhead is difficult to quantify at this time.
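
The first reading of the 2% figure can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that identifier lexing dominates the benchmark, that every identifier not starting with `r` keeps a relative cost of 1, and that each `r`-prefixed identifier costs exactly twice as much. The variable names are hypothetical.

```python
# Numbers taken from the text above: about 1-in-55 identifiers start with `r`,
# and each such identifier is assumed to lex 100% slower (2x cost).
r_share = 1 / 55
slowdown_factor = 2.0

# Relative total cost when every other identifier keeps a cost of 1.
relative_cost = (1 - r_share) * 1.0 + r_share * slowdown_factor
overhead_percent = (relative_cost - 1.0) * 100

print(f"overall lexing overhead: {overhead_percent:.1f}%")
```

Under these assumptions the overhead is 1/55, or about 1.8%, which is in line with the roughly 2% reported. This does not distinguish the first reading from the second; it only shows that the first is numerically plausible.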

Various other prefixes have been discussed, mostly using a special character
prefix in order to restrict the lexing impact. In particular:

- `\` prefix.
- `\` prefix, as in `\identifier`.
- Similar to `\` escaping in strings.
- More intuitive "escaping" semantic for some developers versus `r#`.
- Creates a different meaning for `\n` as an identifier versus `\n` as a
@@ -151,19 +161,30 @@ prefix in order to restrict the lexing impact. In particular:
character escape. The alternative
[Restrict raw identifier syntax to current and future keywords](#restrict-raw-identifier-syntax-to-current-and-future-keywords)
applies to this solution.
- `#` prefix without `r`.
- `#` prefix without `r`, as in `#identifier`.
- Would be more consistent with string literals, and avoid the lexing
overhead.
- We are considering using a `#` prefix for metaprogramming, so the `r`
offers a way to keep the `#` prefix available for other purposes.
- `#if` may look to C++ developers like a compiler directive, rather than
a raw identifier for `if`.
- Backticks, consistent with Swift.
- `@` prefix, as in `@identifier`.
- Consistent with C#.
- We've also discussed using a `@` prefix for attributes, similar to
Python. Similar to `#`, this would be conflicting.
- <code>\`</code> wrapping, as in <code>\`identifier\`</code>.

- Consistent with Swift.
- We prefer not to use backticks for Carbon syntax so that it is easy to
write in Markdown, which uses backticks for inline code.
- `@` prefix, consistent with C#.
- We've also discussed using `@` for attributes, similar to Python.
- Other currently unused characters, such as `~`, `$`, or `%`.
write in Markdown, which uses backticks for inline code. For example,
the label for this alternative requires inline HTML:

```
<code>\`</code>
```

- Other currently unused characters as a prefix, such as `~identifier`,
`$identifier`, or `%identifier`.
- We expect raw identifiers to be relatively rare. There may be future
uses for these characters that allow us to serve a broader use-case.
- While we could change raw string literal syntax to use the same
@@ -178,6 +199,9 @@ of Carbon, or from other languages. This means it's helpful if the syntax can be
understood on its own, but if it's confusable with C++ syntax, the relative
rarity could exacerbate understandability issues.

If performance of the `r#` prefix is prohibitive, that would be a justification
for changing approaches.
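
To make the performance concern concrete, here is a minimal sketch of where the extra `r#` check would sit when lexing a word. It is not the Carbon toolchain's lexer: the `KEYWORDS` subset, the helper names, and the `(kind, name, end)` return shape are all assumptions made for illustration.

```python
# Hypothetical keyword subset, for illustration only.
KEYWORDS = {"base", "class", "extend", "fn", "if", "var"}


def is_id_start(ch: str) -> bool:
    return ch.isalpha() or ch == "_"


def is_id_continue(ch: str) -> bool:
    return ch.isalnum() or ch == "_"


def scan_word(text: str, pos: int) -> int:
    """Return the index just past the identifier-like word starting at pos."""
    end = pos
    while end < len(text) and is_id_continue(text[end]):
        end += 1
    return end


def lex_word(text: str, pos: int) -> tuple[str, str, int]:
    """Lex one word at pos, returning (kind, name, end_position).

    Assumes the caller has already checked that text[pos] starts a word.
    """
    # Extra path: only words beginning with `r` can match this check.
    if (
        text.startswith("r#", pos)
        and pos + 2 < len(text)
        and is_id_start(text[pos + 2])
    ):
        end = scan_word(text, pos + 2)
        # A raw identifier is always an identifier, even if it spells a
        # keyword; the stored name drops the `r#` prefix.
        return ("identifier", text[pos + 2:end], end)

    end = scan_word(text, pos)
    name = text[pos:end]
    kind = "keyword" if name in KEYWORDS else "identifier"
    return (kind, name, end)
```

In this sketch, `lex_word("r#base: i32", 0)` produces an identifier named `base`, while `lex_word("base: T", 0)` produces the keyword. A single-character prefix such as `\` or `@` would instead be dispatched on the first character, without adding a branch to the path that every `r`-prefixed identifier takes.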

### Restrict raw identifier syntax to current and future keywords

We had discussed maintaining a list of current and future keywords, and only
@@ -199,19 +223,43 @@ to update their dependencies as well.

We could say that, in a scope where a raw identifier has been declared, the
token without `r#` now refers to the identifier instead of the keyword. If the
user actually needs the keyword, they could instead use `k#` or something
similar.
user actually needs the keyword within that scope, they could instead use `k#`
or something similar.

A particularly complex example of this can be seen with the `base` keyword:
A particular example of this can be seen with the `base` keyword:

```
class C {
// `base` now means this name in the scope of `C`.
var r#base: i32;
// To extend, `k#base` is now required.
extend k#base: T;
}
fn MakeC() -> C {
// The struct literal's `base` is outside the scope of `C`, so must use
// `r#base`.
var c: C = {.r#base = 0, .base = { ... }};
// A member reference could use the identifier-default for `base` in `C`.
c.base = 1;
c.k#base = {...};
return c;
}
```

The equivalent under proposed syntax (uniformly using `r#base`) is:

```
class C {
extend base: T;
var r#base: i32;
extend base: T;
}
fn MakeC() -> C {
return {.base = 0, .k#base = { ... }};
var c: C = {.r#base = 0, .base = { ... }};
c.r#base = 1;
c.base = {...};
return c;
}
```
