Some last work

carbon-language · Apr 1, 2024 · 581b482
1 parent 8dc691b
commit 581b482
Showing 1 changed file with 72 additions and 24 deletions.
diff --git a/proposals/p3797.md b/proposals/p3797.md
@@ -35,6 +35,9 @@ example, `base`). This is being called "raw identifier syntax" using
 `r#<identifier>`, and is based on
 [Rust](https://doc.rust-lang.org/reference/identifiers.html).
 
+Note that this proposal is derived from
+[Proposal #17: Lexical conventions](https://github.com/carbon-language/carbon-lang/pull/17).
+
 ## Problem
 
 One of Carbon's most important goals is to support program and language
@@ -121,27 +124,34 @@ use the non-prefixed identifier name for consistency.
 
 ### Other raw identifier syntaxes
 
-Advantages to `r#` are:
+When considering other syntaxes, some initial considerations for
+`r#identifier` prefixing are:
 
 - We use `#` prefixes for
  [string literals](/docs/design/lexical_conventions/string_literals.md), and
- it's likely we'll support syntax similar to `f#` for formatted string
- literals. The `r#` syntax offers consistency with this, and will hopefully
- be recognizable to users.
+ it's likely we'll support syntax similar to `f#"..."` for interpolated
+ string literals. The `r#` syntax offers consistency with this, and will
+ hopefully be recognizable to users.
 - Consistency with Rust.
- - Rust uses `r#` for raw string literals, whereas Carbon uses `#`.
-
-A disadvantage is that any `r`-prefixed identifier parses substantially slower,
-as noted by the benchmarks in
-[PR #3044](https://github.com/carbon-language/carbon-lang/pull/3344) which
-implemented `r#` syntax. A 2% benchmark slowdown indicates around 2x because `r`
-is about 1-in-55 identifiers. This may be reduced if we enable tail calls and
-other optimizations.
+ - Rust uses `r#"..."` for raw string literals, whereas Carbon uses
+ `#"..."`.
+- Introduces another code execution path in lexing identifiers, which likely
+ causes a slowdown.
+ [PR #3044](https://github.com/carbon-language/carbon-lang/pull/3344)
+ indicates roughly 2%, although that was measured on a system with noisy
+ benchmarks; a precise figure would require a less noisy benchmarking
+ system. The 2% could mean that identifiers starting with `r` (about 1-in-55
+ identifiers) lex roughly twice as slowly, in which case other similar
+ prefixes would add cost linearly. Alternatively, it could mean that the
+ extra code path adds a fixed cost, in which case other prefixes sharing that
+ path (such as `f#"..."`) would add negligible incremental cost. Enabling
+ tail calls and other optimizations may either reduce this overhead or make
+ it more significant, so the precise overhead is difficult to quantify now.
 
 Various other prefixes have been discussed, mostly using a special character
 prefix in order to restrict the lexing impact. In particular:
 
-- `\` prefix.
+- `\` prefix, as in `\identifier`.
  - Similar to `\` escaping in strings.
  - More intuitive "escaping" semantic for some developers versus `r#`.
  - Creates a different meaning for `\n` as an identifier versus `\n` as a
@@ -151,19 +161,30 @@ prefix in order to restrict the lexing impact. In particular:
  character escape. The alternative
  [Restrict raw identifier syntax to current and future keywords](#restrict-raw-identifier-syntax-to-current-and-future-keywords)
  applies to this solution.
-- `#` prefix without `r`.
+- `#` prefix without `r`, as in `#identifier`.
  - Would be more consistent with string literals, and avoid the lexing
  overhead.
  - We are considering using a `#` prefix for metaprogramming, so the `r`
  offers a way to keep the `#` prefix available for other purposes.
  - `#if` may look to C++ developers like a compiler directive, rather than
  a raw identifier for `if`.
-- Backticks, consistent with Swift.
+- `@` prefix, as in `@identifier`.
+ - Consistent with C#.
+ - We've also discussed using a `@` prefix for attributes, similar to
+ Python. As with `#`, this would conflict with that use.
+- <code>\`</code> wrapping, as in <code>\`identifier\`</code>.
+
+ - Consistent with Swift.
  - We prefer not to use backticks for Carbon syntax so that it is easy to
- write in Markdown, which uses backticks for inline code.
-- `@` prefix, consistent with C#.
- - We've also discussed using `@` for attributes, similar to Python.
-- Other currently unused characters, such as `~`, `$`, or `%`.
+ write in Markdown, which uses backticks for inline code. For example,
+ the label for this alternative requires inline HTML:
+
+ ```
+ <code>\`</code>
+ ```
+
+- Other currently unused characters as prefix, such as `~identifier`,
+ `$identifier`, or `%identifier`.
  - We expect raw identifiers to be relatively rare. There may be future
  uses for these characters that allow us to serve a broader use-case.
  - While we could change raw string literal syntax to use the same
@@ -178,6 +199,9 @@ of Carbon, or from other languages. This means it's helpful if the syntax can be
 understood on its own, but if it's confusable with C++ syntax, the relative
 rarity could exacerbate understandability issues.
 
+If the performance cost of the `r#` prefix proves prohibitive, that would
+justify changing approaches.
+
 ### Restrict raw identifier syntax to current and future keywords
 
 We had discussed maintaining a list of current and future keywords, and only
@@ -199,19 +223,43 @@ to update their dependencies as well.
 
 We could say that, in a scope where a raw identifier has been declared, the
 token without `r#` now refers to the identifier instead of the keyword. If the
-user actually needs the keyword, they could instead use `k#` or something
-similar.
+user actually needs the keyword within that scope, they could instead use `k#`
+or something similar.
 
-A particularly complex example of this can be seen with the `base` keyword:
+A particular example of this can be seen with the `base` keyword:
+
+```
+class C {
+ // `base` now means this name in the scope of `C`.
+ var r#base: i32;
+ // To extend, `k#base` is now required.
+ extend k#base: T;
+}
+
+fn MakeC() -> C {
+ // The struct literal is outside the scope of `C`, so its field must be
+ // written as `r#base`.
+ var c: C = {.r#base = 0, .base = { ... }};
+ // Member access looks up `base` in `C`'s scope, where it names the field.
+ c.base = 1;
+ c.k#base = { ... };
+ return c;
+}
+```
+
+The equivalent under the proposed syntax (uniformly using `r#base`) is:
 
 ```
 class C {
- extend base: T;
  var r#base: i32;
+ extend base: T;
 }
 
 fn MakeC() -> C {
- return {.base = 0, .k#base = { ... }};
+ var c: C = {.r#base = 0, .base = { ... }};
+ c.r#base = 1;
+ c.base = { ... };
+ return c;
 }
 ```