Grammars with complex identifiers #803
GordianDziwis asked this question in Q&A
I am currently writing a grammar for SPARQL. SPARQL differs from most programming languages in that the rules for a single valid identifier token are quite complex. For example, the rules for a variable name are:

var: $ => seq(
  choice(
    '?',
    '$'
  ),
  $._var_name
),
_pn_chars_base: $ => token(choice(
  /[A-Z]/,
  /[a-z]/,
  // /[\u00C0-\u00D6]/,
  // /[\u00D8-\u00F6]/,
  // /[\u00F8-\u02FF]/,
  // /[\u0370-\u037D]/,
  // /[\u037F-\u1FFF]/,
  // /[\u200C-\u200D]/,
  // /[\u2070-\u218F]/,
  // /[\u2C00-\u2FEF]/,
  // /[\u3001-\uD7FF]/,
  // /[\uF900-\uFDCF]/,
  // /[\uFDF0-\uFFFD]/,
  // /[\u{10000}-\u{EFFFF}]/u
)),
// [165]
_pn_chars_u: $ => choice(
  $._pn_chars_base,
  '_'
),
// [166]
_var_name: $ => prec.right(seq(
  choice(
    $._pn_chars_u,
    /[0-9]/
  ),
  repeat(choice(
    $._pn_chars_u,
    /[0-9]/,
    /[\u00B7]/,
    /[\u0300-\u036F]/,
    /[\u203F-\u2040]/
  ))
)),

This is a simple case, but it would be an unmaintainable mess to express the rule for identifiers as a combination of regexes and the [...] How can I handle this?
Answered by GordianDziwis, Nov 11, 2020
OK, defining all those ranges as arrays seems to work:

const PN_CHARS_BASE = [
  /[A-Z]/,
  /[a-z]/,
  // /[\u00C0-\u00D6]/,
  // /[\u00D8-\u00F6]/,
  // /[\u00F8-\u02FF]/,
  // /[\u0370-\u037D]/,
  // /[\u037F-\u1FFF]/,
  // /[\u200C-\u200D]/,
  // /[\u2070-\u218F]/,
  // /[\u2C00-\u2FEF]/,
  // /[\u3001-\uD7FF]/,
  // /[\uF900-\uFDCF]/,
  // /[\uFDF0-\uFFFD]/,
  // /[\u{10000}-\u{EFFFF}]/u
]

// inside the grammar's rules:
_pn_chars_u: $ => choice(
  ...PN_CHARS_BASE,
  '_'
),
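The array approach can also be tried out in plain JavaScript, without tree-sitter, by merging the ranges into a single character class. The following is only a sketch under assumptions: `mergeClasses`, `VAR_START`, `VAR_REST` and `VAR_NAME` are hypothetical names that do not come from the thread or from tree-sitter, and the ranges are the ones listed in the question, with the commented-out ones enabled.

```javascript
// Hypothetical helper (not part of tree-sitter): merge simple character-class
// regexes such as /[A-Z]/ and /[a-z]/ into one class like /[A-Za-z]/u.
function mergeClasses(...regexes) {
  const body = regexes.map((re) => {
    const src = re.source;
    if (!src.startsWith('[') || !src.endsWith(']')) {
      throw new Error(`not a simple character class: ${re}`);
    }
    return src.slice(1, -1); // drop the surrounding brackets
  }).join('');
  return new RegExp(`[${body}]`, 'u');
}

// Ranges from the question, kept as an array so they can be spread and reused.
const PN_CHARS_BASE = [
  /[A-Z]/, /[a-z]/,
  /[\u00C0-\u00D6]/, /[\u00D8-\u00F6]/, /[\u00F8-\u02FF]/,
  /[\u0370-\u037D]/, /[\u037F-\u1FFF]/, /[\u200C-\u200D]/,
  /[\u2070-\u218F]/, /[\u2C00-\u2FEF]/, /[\u3001-\uD7FF]/,
  /[\uF900-\uFDCF]/, /[\uFDF0-\uFFFD]/, /[\u{10000}-\u{EFFFF}]/u,
];
const PN_CHARS_U = [...PN_CHARS_BASE, /[_]/];

// First character: PN_CHARS_U or a digit.
const VAR_START = mergeClasses(...PN_CHARS_U, /[0-9]/);
// Following characters: additionally the three extra ranges from _var_name.
const VAR_REST = mergeClasses(
  ...PN_CHARS_U, /[0-9]/, /[\u00B7]/, /[\u0300-\u036F]/, /[\u203F-\u2040]/
);
// One start character followed by any number of rest characters.
const VAR_NAME = new RegExp(`${VAR_START.source}${VAR_REST.source}*`, 'u');
```

In a grammar.js, a regex built this way could presumably be used as `_var_name: $ => token(VAR_NAME)`, which would make the whole name a single token. Whether tree-sitter's regex support accepts every escape used here is an assumption that should be checked against the generated parser.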