Grammar railroad diagram #46

Open
mingodad opened this issue Jan 1, 2022 · 30 comments

@mingodad
Contributor

mingodad commented Jan 1, 2022

Going through the code in https://github.com/titzer/virgil/blob/master/aeneas/src/vst/Parser.v3, I used https://github.com/mingodad/CocoR-CSharp to create an LL(1) parser, which in turn generates an EBNF understood by https://www.bottlecaps.de/rr/ui for producing a railroad diagram (https://en.wikipedia.org/wiki/Syntax_diagram). I've got an initial version that already covers a big chunk of the Virgil grammar.

Copy the EBNF shown below into https://www.bottlecaps.de/rr/ui in the Edit Grammar tab, then switch to the View Diagram tab. I think it's useful for documentation and for understanding/developing the syntax of Virgil:

//
// EBNF generated by CocoR parser generator to be viewed with https://www.bottlecaps.de/rr/ui
//

//
// productions
//

Virgil ::=  parseToplevelDecl* EOF
parseToplevelDecl ::=  TK_class parseIdentCommon ( "(" ( parseClassParam ( "," parseClassParam )* )? ")" )? ( TK_extends parseTypeRef )? parseTupleExpr? parseMembers | TK_component parseIdentVoid parseMembers | TK_import TK_component parseIdentVoid parseMembers | parseVar | parseDef | parseVariant | parseEnum | parseExport
parseIdentCommon ::=  identParam parseTypeRef ( "," parseTypeRef )* ">" | ident
parseClassParam ::=  TK_var? parseParamWithOptType
parseTypeRef ::=  ( "(" ( parseTypeRef ( "," parseTypeRef )* )? ")" | parseIdentCommon ( "." parseIdentCommon )* ) ( "->" parseTypeRef )*
parseTupleExpr ::=  "(" ( parseExpr ( "," parseExpr )* )? ")"
parseMembers ::=  "{" parseMember* TK_rbrace
parseIdentVoid ::=  ident
parseVar ::=  TK_var parseIdentVoid parseFieldSuffix
parseDef ::=  TK_def TK_var? ( parseIndexed | parseIdentCommon ( parseMethodSuffix | parseFieldSuffix ) )
parseVariant ::=  TK_type parseIdentCommon ( "(" ( parseVariantCaseParam ( "," parseVariantCaseParam )* )? ")" )? parseVariantCases
parseEnum ::=  TK_enum parseIdentVoid ( "(" ( parseEnumParam ( "," parseEnumParam )* )? ")" )? "{" ( parseEnumCase ( "," parseEnumCase )* )? TK_rbrace
parseExport ::=  TK_export ( parseDef | ( parseStringLiteral | parseIdent ) ( "=" parseIdent parseDottedVarExpr? )? ";" )
parseVariantCaseParam ::=  parseParamWithOptType
parseVariantCases ::=  "{" parseVariantCase* TK_rbrace
parseVariantCase ::=  parseDef | TK_case parseIdentVoid ( "(" ( parseVariantCaseParam ( "," parseVariantCaseParam )* )? ")" )? ( ";" | parseMembers )
parseStringLiteral ::=  string
parseIdent ::=  ident
parseDottedVarExpr ::=  "." parseTypeRef ( "." parseTypeRef )*
parseEnumParam ::=  parseParamWithOptType
parseEnumCase ::=  parseIdentVoid ( "(" ( parseExpr ( "," parseExpr )* )? ")" )?
parseExpr ::=  parseSubExpr ( "=" parseExpr | addBinOpSuffixes )?
parseMember ::=  TK_private? ( parseDef | parseNew | parseVar )
parseNew ::=  TK_new "(" ( parseNewParam ( "," parseNewParam )* )? ")" ( ":"? TK_super parseTupleExpr )? parseBlockStmt
parseNewParam ::=  TK_var? parseParamWithOptType
parseBlockStmt ::=  "{" parseStmt* TK_rbrace
parseTypeParam ::=  parseIdentVoid
parseStmt ::=  parseBlockStmt | parseEmptyStmt | parseIfStmt | parseWhileStmt | parseMatchStmt | parseVarStmt | parseDefStmt | parseBreakStmt | parseContinueStmt | parseReturnStmt | parseForStmt | parseExprStmt
parseEmptyStmt ::=  ";"
parseIfStmt ::=  TK_if parseControlExpr parseStmt ( TK_else parseStmt )?
parseWhileStmt ::=  TK_while parseControlExpr parseStmt
parseMatchStmt ::=  TK_match parseControlExpr "{" ( parseMatchCase parseMatchCase* )? TK_rbrace ( TK_else parseStmt )?
parseVarStmt ::=  TK_var parseIdentVoid parseVars*
parseDefStmt ::=  TK_def parseIdentVoid parseVars*
parseBreakStmt ::=  TK_break ";"
parseContinueStmt ::=  TK_continue ";"
parseReturnStmt ::=  TK_return parseExpr? ";"
parseForStmt ::=  TK_for "(" parseLocal ( "<" parseExpr | TK_in parseExpr | ";" parseExpr ";" parseExpr ) ")" parseStmt
parseExprStmt ::=  parseExpr ";"
parseControlExpr ::=  "(" parseExpr ")"
parseLocal ::=  parseIdentVoid ( ":" parseTypeRef )? ( "=" parseExpr )?
parseMatchCase ::=  "_" "=>" parseStmt | matchPattern matchPattern*
matchPattern ::=  parseMatchPattern ( "," parseMatchPattern )* "=>" parseStmt
parseMatchPattern ::=  parseIdMatchPattern | parseByteLiteral | "-"? parseNumber
parseIdMatchPattern ::=  ( TK_true | TK_false | TK_null ) | parseIdentCommon ( ":" parseTypeRef | parseDottedVarExpr? ( "(" ( parseMatchParam ( "," parseMatchParam )* )? ")" )? )
parseByteLiteral ::=  charcon
parseNumber ::=  bincon | floatcon | intcon
parseMatchParam ::=  parseIdentVoid
parseVars ::=  ( ":" parseTypeRef )? ( "=" parseExpr )? ( "," parseIdentVoid parseVars? | ";" )
parseFieldSuffix ::=  parseVars
parseIndexed ::=  "[" ( parseMethodParam ( "," parseMethodParam )* )? "]" ( "=" parseMethodParam | "->" parseTypeRef ) ( ";" | parseBlockStmt )
parseMethodSuffix ::=  "(" ( parseMethodParam ( "," parseMethodParam )* )? ")" ( "->" ( TK_this | parseTypeRef ) )? ( ";" | parseBlockStmt )
parseMethodParam ::=  TK_var? parseParamWithOptType
parseParamWithOptType ::=  parseIdentCommon ( ":" parseTypeRef )?
parseSubExpr ::=  parseTerm ( termMultSuffix termMultSuffix* incOrDec? | incOrDec )?
addBinOpSuffixes ::=  parseInfix parseSubExpr ( parseInfix parseSubExpr )*
parseTerm ::=  TK_if "(" parseExpr "," parseExpr ( "," parseExpr )? ")" | TK_true | TK_false | TK_null | "-"? ( parseNumber | parseTupleExpr | parseIdentCommon ) | ( "!" | "~" ) parseSubExpr | parseByteLiteral | parseStringLiteral | parseArrayLiteral | parseParamExpr | incOrDec parseSubExpr
termMultSuffix ::=  addMemberSuffix | parseTupleExpr | parseArrayLiteral
incOrDec ::=  "++" | "--"
addMemberSuffix ::=  "." ( parseIdentUnchecked | ( "!" | "?" ) ( "<" parseTypeRef ( "," parseTypeRef )* ">" )? | parseInfix | intcon | "~" | "[" "]" "="? )
parseArrayLiteral ::=  "[" ( parseExpr ( "," parseExpr )* )? "]"
parseIdentUnchecked ::=  parseIdentCommon
parseInfix ::=  "==" | "!=" | "||" | "&&" | "<" | "<=" | ">" | ">=" | ( "|" | "&" | "<<" | "<<<" | TK_shr | ">>>" | "+" | "-" | "*" | "/" | "%" | "^" ) "="?
parseParamExpr ::=  "_"

//
// tokens
//

TK_break ::= "break"
TK_case ::= "case"
TK_class ::= "class"
TK_component ::= "component"
TK_continue ::= "continue"
TK_def ::= "def"
TK_else ::= "else"
TK_enum ::= "enum"
TK_export ::= "export"
TK_extends ::= "extends"
TK_false ::= "false"
TK_for ::= "for"
TK_if ::= "if"
TK_import ::= "import"
TK_in ::= "in"
TK_layout ::= "layout"
TK_match ::= "match"
TK_new ::= "new"
TK_null ::= "null"
TK_private ::= "private"
TK_return ::= "return"
TK_struct ::= "struct"
TK_super ::= "super"
TK_this ::= "this"
TK_true ::= "true"
TK_type ::= "type"
TK_var ::= "var"
TK_while ::= "while"
TK_shr ::= ">>"
TK_rbrace ::= "}"

Here is the LL(1) parser, which still needs fixes to parse all the .v3 files of this project:

#include "Scanner-virgil.nut"

COMPILER Virgil
	int scanStateDepth = 0;

TERMINALS
	T_SYMBOL

CHARACTERS
	letter    = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
	oct        = '0'..'7'.
	digit     = "0123456789".
	bindigit     = "01".
	bindigitwsep     = "01_".
	nzdigit    = '1'..'9'.
	digitwsep     = "0123456789_".
	cr        = '\r'.
	lf        = '\n'.
	tab       = '\t'.
	stringCh  = ANY - '"' - '\\' - cr - lf.
	charCh    = ANY - '\'' - '\\' - cr - lf.
	printable = '\u0020' .. '\u007e'.
	hex       = "0123456789abcdefABCDEF".
	hexwsep       = "0123456789abcdefABCDEF_".

	newLine   = cr + lf.
	notNewLine = ANY - newLine .
	ws         = " " + tab + '\u000b' + '\u000c'.

TOKENS
	ident     = letter { letter | digit | '_'}.
	identParam  = letter { letter | digit | '_'} '<'.
	floatcon =
		( digit {digitwsep} '.' digit {digitwsep} [('e'|'E')  ['+'|'-'] digit {digit}]
		| digit {digitwsep} ('e'|'E')  ['+'|'-'] digit {digit}
		) ['f'|'F' | 'd' | 'D']
		| digit {digitwsep} ('f'|'F') .

	intcon   = ( digit {digitwsep}
		//| '0' {oct}
		| ("0x"|"0X") hex {hexwsep}
		) [('u'|'U') ['l'|'L'] | ('l'|'L') ['u'|'U'] | ('d' | 'D')] .

	bincon  = '0' ('b' | 'B') bindigit {bindigitwsep} ['u' | 'U'].

	string    = '"' { stringCh | '\\' printable } '"'.
	badString = '"' { stringCh | '\\' printable } (cr | lf).
	charcon      = '\'' ( charCh | '\\' printable { hex } ) '\''.

	TK_break = "break" .
	TK_case = "case" .
	TK_class = "class" .
	TK_component = "component" .
	TK_continue = "continue" .
	TK_def = "def" .
	TK_else = "else" .
	TK_enum = "enum" .
	TK_export = "export" .
	TK_extends = "extends" .
	TK_false = "false" .
	TK_for = "for" .
	TK_if = "if" .
	TK_import = "import" .
	TK_in = "in" .
	TK_layout = "layout" .
	TK_match = "match" .
	TK_new : ident = "new" .
	TK_null = "null" .
	TK_private = "private" .
	TK_return = "return" .
	TK_struct = "struct" .
	TK_super = "super" .
	TK_this : ident = "this" .
	TK_true = "true" .
	TK_type = "type" .
	TK_var = "var" .
	TK_while = "while" .

	//types
	//TK_Array = "Array" .
	//TK_bool = "bool" .
	//TK_byte = "byte" .
	//TK_double = "double" .
	//TK_float = "float" .
	//TK_int = "int" .
	//TK_long = "long" .
	//TK_short = "short" .
	//TK_string = "string" .
	//TK_void = "void" .

	//Operators
	TK_shr = ">>" . //(. print("<<DAD>>"); .)

	//Punctuation
	TK_rbrace = '}' .

PRAGMAS

	COMMENTS FROM "/*" TO "*/" NESTED
	COMMENTS FROM "//" TO lf

IGNORE cr + lf + tab

/*-------------------------------------------------------------------------*/

PRODUCTIONS

Virgil =
	{parseToplevelDecl}
	EOF
	.

parseToplevelDecl =
	"class" parseIdentCommon ['(' [parseClassParam {',' parseClassParam}] ')'] ["extends" parseTypeRef] [parseTupleExpr] parseMembers
	| "component" parseIdentVoid parseMembers
	| "import" "component" parseIdentVoid parseMembers
	| parseVar
	| parseDef
	| parseVariant
	| parseEnum
	| parseExport
	.

parseVariant =
	"type" parseIdentCommon ['(' [parseVariantCaseParam {',' parseVariantCaseParam}] ')'] parseVariantCases
	.

parseVariantCases =
	'{' {parseVariantCase}'}'
	.

parseVariantCase =
	parseDef
	| "case" parseIdentVoid ['(' [parseVariantCaseParam {',' parseVariantCaseParam}] ')'] (';' | parseMembers)
	.

parseExport =
	"export" (parseDef | (parseStringLiteral | parseIdent) ['=' parseIdent [parseDottedVarExpr]] ';')
	.

parseEnum =
	"enum" parseIdentVoid ['(' [parseEnumParam {',' parseEnumParam}] ')']
		'{' [ parseEnumCase {','
			(. if(la.kind == ParserTokens._TK_rbrace) {break; /*allow trailing separator*/} .)
			parseEnumCase} ] '}'
	.

parseEnumCase =
	parseIdentVoid ['(' [parseExpr {',' parseExpr}] /*[',']*/ ')']
	.

parseMembers =
	'{' {parseMember} '}'
	.

parseMember =
	["private"] (parseDef | parseNew | parseVar)
	.

parseNew =
	"new" '(' [parseNewParam {',' parseNewParam}] ')' [([':'] "super") parseTupleExpr] parseBlockStmt
	.
/*
parseParamCommon =
	parseIdentCommon [parseIdentVoid] [':' parseTypeRef]
	.
*/
parseTypeRef =
	(
		'(' [parseTypeRef {',' parseTypeRef}] ')'
		| parseIdentCommon {'.' parseIdentCommon}
	)
	{"->" parseTypeRef}
	.

parseTypeParam =
	parseIdentVoid
	.

parseStmt =
	parseBlockStmt
	| parseEmptyStmt
	| parseIfStmt
	| parseWhileStmt
	| parseMatchStmt
	| parseVarStmt
	| parseDefStmt
	| parseBreakStmt
	| parseContinueStmt
	| parseReturnStmt
	| parseForStmt
	| parseExprStmt
	.

parseBlockStmt =
	'{' {parseStmt} '}'
	.

parseEmptyStmt =
	';'
	.

parseControlExpr =
	'(' parseExpr ')'
	.

parseIfStmt =
	"if" parseControlExpr parseStmt ["else" parseStmt]
	.

parseWhileStmt =
	"while" parseControlExpr parseStmt
	.

parseForStmt =
	"for" '(' parseLocal ('<' parseExpr | "in" parseExpr | ';' parseExpr ';' parseExpr) ')' parseStmt
	.

parseMatchStmt =
	"match" parseControlExpr '{' [parseMatchCase {parseMatchCase}] '}' ["else" parseStmt]
	.

parseMatchCase =
	'_' "=>" parseStmt
	| matchPattern {matchPattern}
	.

matchPattern =
	parseMatchPattern {',' parseMatchPattern} "=>" parseStmt
	.

parseMatchPattern =
	parseIdMatchPattern
	| parseByteLiteral
	| ['-'] parseNumber
	.

parseDottedVarExpr =
	'.' parseTypeRef {'.' parseTypeRef}
	.

parseIdMatchPattern =
	("true" | "false" | "null")
	| parseIdentCommon (
		':' parseTypeRef
		| [parseDottedVarExpr] ['(' [parseMatchParam {',' parseMatchParam}] ')']
	)
	.

parseMatchParam =
	parseIdentVoid
	.

parseVarStmt =
	"var" parseIdentVoid {parseVars}
	.

parseDefStmt =
	"def" parseIdentVoid {parseVars}
	.

parseBreakStmt =
	"break" ';'
	.

parseContinueStmt =
	"continue" ';'
	.

parseReturnStmt =
	"return" [parseExpr] ';'
	.

parseVar =
	"var" parseIdentVoid parseFieldSuffix
	.

parseDef =
	"def" ["var"] (parseIndexed | parseIdentCommon (parseMethodSuffix | parseFieldSuffix))
	.

parseIndexed =
	'[' [parseMethodParam {',' parseMethodParam}] ']' ('=' parseMethodParam | "->" parseTypeRef) (';' | parseBlockStmt)
	.

parseMethodSuffix =
	'(' [parseMethodParam {',' parseMethodParam}] ')' ["->" ("this" | parseTypeRef)] (';' | parseBlockStmt)
	.

parseParamWithOptType =
	parseIdentCommon [':' parseTypeRef]
	.

parseMethodParam =
	["var"] parseParamWithOptType
	.

parseNewParam =
	["var"] parseParamWithOptType
	.

parseClassParam =
	["var"] parseParamWithOptType
	.

parseEnumParam =
	parseParamWithOptType
	.

parseVariantCaseParam =
	parseParamWithOptType
	.

parseExprStmt =
	parseExpr ';'
	.

parseExpr =
	parseSubExpr ['=' parseExpr | addBinOpSuffixes]
	.

parseSubExpr =
	parseTerm [termMultSuffix {termMultSuffix} [incOrDec] | incOrDec]
	.

incOrDec =
	"++" | "--"
	.

termMultSuffix =
	addMemberSuffix | parseTupleExpr | parseArrayLiteral
	.

addMemberSuffix = (. scanner.stateNo=6; .) //for tuple indexing by integers
	'.' (
		parseIdentUnchecked
		| ('!' | '?') ['<' parseTypeRef {',' parseTypeRef} '>']
		| parseInfix
		| intcon
		| '~'
		| '[' ']' ['=']
	) (. scanner.stateNo=0; .)
	.

parseTerm =
	"if" '(' parseExpr ',' parseExpr [',' parseExpr]')'
	//| parseVarExpr
	| "true"
	| "false"
	| "null"
	| ['-'] (parseNumber | parseTupleExpr | parseIdentCommon)
	| ('!' | '~') parseSubExpr
	| parseByteLiteral
	| parseStringLiteral
	| parseArrayLiteral
	| parseParamExpr
	| incOrDec parseSubExpr
	.

parseParamExpr =
	'_'
	.

parseByteLiteral =
	charcon
	.

parseStringLiteral =
	string
	.

parseTupleExpr =
	'(' [parseExpr {',' parseExpr}] ')'
	.

parseArrayLiteral =
	'[' [parseExpr {',' parseExpr}] ']'
	.

parseNumber =
	bincon //BinLiteral
	//| HexLiteral
	| floatcon //FloatLiteral
	| intcon //DecLiteral
	.
/*
parseVarExpr =
	parseIdentCommon
	| "true"
	| "false"
	| "null"
	.
*/
parseIdent =
	ident
	.

parseIdentVoid =
	ident
	.

parseIdentCommon =
	identParam
		(. if(scanStateDepth++ == 0) scanner.stateNo = 5; .)
		parseTypeRef {',' parseTypeRef} '>'
		(. if(--scanStateDepth == 0) scanner.stateNo = 0; .)
	| ident
	.

parseIdentUnchecked =
	parseIdentCommon
	.

parseLocal =
	parseIdentVoid [':' parseTypeRef] ['=' parseExpr]
	.

parseVars =
	[':' parseTypeRef] ['=' parseExpr] (',' parseIdentVoid [parseVars] | ';')
	.

parseFieldSuffix =
	parseVars
	.

addBinOpSuffixes =
	parseInfix parseSubExpr {parseInfix parseSubExpr}
	.

parseInfix =
	"=="
	| "!="
	| "||"
	| "&&"
	| '<'
	| "<="
	| '>'
	| ">="
	| (
		'|'
		| '&'
		| "<<"
		| "<<<"
		| ">>"
		| ">>>"
		| '+'
		| '-'
		| '*'
		| '/'
		| '%'
		| '^'
	) ['=']
	.

END Virgil.

@titzer
Owner

titzer commented Jan 1, 2022

Neat! I just glanced through this, but when I get a chance to vet in detail, it could make a great addition to documentation. Thanks!

@titzer
Owner

titzer commented Jan 1, 2022

A couple tweaks to tokens:

  • Identifiers can't start with underscore.
  • Float literals can't start with a dot.
  • String literals can include escapes, including the usual \r, \n, \b, \t, and \x hex codes.
  • The parser only natively supports \n as the line ending; I think Cygwin does automatic conversion on Windows. It's probably fine to document it as either.
  • There are precedences to infix operators that should probably be spelled out in the grammar. My recursive descent parser uses a tiny LR stack in the infix production which obscures this.
  • Octal literals are recognized by the parser, but rejected with a special (hopefully more helpful) error message, so probably best not to put them in the grammar.

Thanks a bunch for doing this, it's a great start! Note also there are a lot of tests in test/*/parser, both positive and negative tests.
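
As a rough illustration of the precedence point above, here is a minimal precedence-climbing sketch in Python. The PRECEDENCE table and the parse_expr helper are assumptions made for this example (a conventional C-like ordering), not the levels actually used by Parser.v3, and every operator is treated as left-associative.

```python
# Hypothetical precedence table (higher binds tighter).
# This is an assumed C-like ordering, NOT taken from Virgil's Parser.v3.
PRECEDENCE = {
    "||": 1, "&&": 2,
    "|": 3, "^": 4, "&": 5,
    "==": 6, "!=": 6,
    "<": 7, "<=": 7, ">": 7, ">=": 7,
    "<<": 8, ">>": 8, ">>>": 8,
    "+": 9, "-": 9,
    "*": 10, "/": 10, "%": 10,
}


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] into a nested tuple tree; return (tree, next_pos).

    Operands are plain strings in this sketch; a real parser would call
    the sub-expression production here instead.
    """
    left = tokens[pos]
    pos += 1
    while pos < len(tokens):
        op = tokens[pos]
        prec = PRECEDENCE.get(op)
        if prec is None or prec < min_prec:
            break
        # Left-associative: the right operand may only contain operators
        # that bind strictly tighter than the current one.
        right, pos = parse_expr(tokens, pos + 1, prec + 1)
        left = (op, left, right)
    return left, pos
```

With this table, a + b * c groups as a + (b * c) and a - b - c groups as (a - b) - c. The same layering could instead be written out as one grammar production per precedence level.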

@nuchi

nuchi commented Jan 1, 2022

Arriving here via your HN post — I made a tool that can generate .sublime-syntax files from an EBNF for a language (with some mild restrictions, but if the grammar is LL(1) then that's definitely sufficient). It'd need some mild syntax adjustments to work with my program and also to add scopes, but having this EBNF in finished form would get you most of the way there.

https://github.com/nuchi/sublime-from-cfg

@mingodad
Contributor Author

mingodad commented Jan 2, 2022

I just updated the EBNF posted in my first message with my latest work, which can recognize many more *.v3 files from this project.

Observations about what I found doing this grammar:

  • The relational operator < needs a space before it; otherwise the parser will try to recognize it as the start of generics.
  • The start token < for generics must directly follow the identifier, with no space in between, to parse correctly.
  • Marking a number as unsigned with a u prefix is tricky to disambiguate from identifiers; I would suggest using a suffix instead.
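
The two whitespace rules above follow from how the token is defined: identParam in the Coco/R scanner is an identifier immediately followed by '<'. A minimal Python sketch of that behaviour, assuming a much simplified token set (the TOKEN_RE pattern and the tokenize helper are hypothetical and cover only a handful of tokens):

```python
import re

# Hypothetical, simplified token set for illustration only.
# The first alternative mirrors the identParam token: an identifier
# glued to '<' is treated as the start of a generic parameter list.
TOKEN_RE = re.compile(r"""
    (?P<identParam>[A-Za-z][A-Za-z0-9_]*<)
  | (?P<ident>[A-Za-z][A-Za-z0-9_]*)
  | (?P<number>[0-9][0-9_]*)
  | (?P<op><=|>=|[<>(),.;=])
  | (?P<ws>\s+)
""", re.VERBOSE)


def tokenize(src):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    out = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {src[pos]!r}")
        if m.lastgroup != "ws":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out
```

Here a<b lexes as the start of a generic (an identParam token "a<"), while a < b yields an identifier followed by a relational operator.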

It would be nice to have a playground like https://fascinatedbox.gitlab.io/lily-docs//intro-sandbox.html, which belongs to a scripting language with a feature set similar to Virgil's. Considering that https://github.com/titzer is an expert in WebAssembly and Virgil is supposed to run on/produce wasm, this would increase the attraction for more developers to dive in and join this project.

@titzer
Owner

titzer commented Jan 3, 2022

Hi @mingodad , yep, you're right about the '<' operator and generics. It was my "easy fix" to avoid backtracking in the parser. Which 'u' prefix are you referring to here? I don't recall accepting a 'u' prefix, either in escapes or in integer literals.

I agree that an in-browser demo would be cool. @diakopter had a web-based demo with Wasm, but I am not sure of its status.

@mingodad
Contributor Author

mingodad commented Jan 3, 2022

Hello titzer!
The reference to a u prefix for unsigned was a mistake on my part: my parser was not handling unary minus correctly, and the first file that showed the problem has this line: var f: double = - u5.view(0); I was mistakenly interpreting u5 as an unsigned integer instead of a named variable.

I just updated the EBNF and parser in the first message again. It now recognizes almost all files in virgil/*.v3 and wizard-engine/*.v3, with the exception of the ones below (some of them are marked with //@seman = ParseError, and in some of them there are enum elements not separated by commas; I'm not sure if that is intentional):

error: virgil/bench/DeltaBlue/old/DeltaBlue_vector.v3:169:5: "=>" expected
--==	5	169	4	0	<=>	46	169	5	:

error: virgil/test/enums/set03.v3:5:2: TK_rbrace expected
--==	2	4	62	Bf	<=>	2	5	2	C0

error: virgil/test/enums/seman/set65.v3:5:2: TK_rbrace expected
--==	2	4	62	Bf	<=>	2	5	2	C0

error: virgil/test/testexec/ExecuteTest.v3:251:8: "=>" expected
--==	9	251	4	'\''	<=>	46	251	8	:

error: virgil/test/core/parser/unary03.v3:2:14: invalid parseTerm
--==	52	2	13	-	<=>	20	2	14	false

error: virgil/test/core/parser/ifexpr06.v3:3:17: "," expected
--==	2	3	16	a	<=>	42	3	17	)

error: virgil/test/core/parser/export03.v3:2:8: invalid parseExport
--==	18	2	1	export	<=>	3	2	8	name<

error: virgil/test/core/parser/unary01.v3:4:18: invalid parseTerm
--==	52	4	17	-	<=>	20	4	18	false

error: virgil/test/core/parser/lit__02.v3:5:20: invalid parseVars
--==	6	5	14	0b1111	<=>	50	5	20	_

== virgil	test/core/parser/invalid05.v3

test/core/parser/invalid04.v3

error: virgil/test/core/parser/unary02.v3:2:14: invalid parseTerm
--==	52	2	13	-	<=>	20	2	14	false

error: virgil/test/core/parser/match_p00.v3:4:4: "=>" expected
--==	5	4	3	0	<=>	40	4	4	(

== virgil	test/core/seman/deep02.v3

error: virgil/test/core/seman/cast09.v3:2:21: invalid parseVars
--==	57	2	20	!	<=>	57	2	21	!

error: virgil/test/core/seman/unary01.v3:2:10: invalid parseTerm
--==	52	2	9	-	<=>	7	2	10	""

error: virgil/test/core/seman/ifex14.v3:3:28: ")" expected
--==	5	3	27	1	<=>	41	3	28	,

error: virgil/test/core/seman/ifex15.v3:3:22: "," expected
--==	34	3	18	true	<=>	42	3	22	)

error: virgil/test/core/seman/unary00.v3:2:10: invalid parseTerm
--==	52	2	9	-	<=>	20	2	10	false

error: virgil/test/lang/fsi/seman_fsi_promote09.v3:2:9: invalid parseVars
--==	2	2	8	i	<=>	77	2	9	$

error: virgil/rt/posix/X86_64LinuxLayouts.v3:7:1: EOF expected
--==	null	null	null		<=>	25	7	1	layout

error: virgil/wizard-engine/src/engine/Interpreter.v3:74:2: TK_rbrace expected
--==	2	73	2	TRAPPED	<=>	2	74	2	TRAPPING

error: virgil/wizard-engine/src/jawa/JawaOpcodes.v3:98:2: TK_rbrace expected
--==	2	97	2	INTERFACE_DECL	<=>	2	98	2	CLASS_DEF

@mingodad
Contributor Author

mingodad commented Jan 3, 2022

I could not find how to rebuild/bootstrap Virgil; there is no Makefile or shell script to do it, or at least I could not find one.
Can you add documentation about it and/or add a Makefile or shell script to do so?

@mingodad
Contributor Author

mingodad commented Jan 3, 2022

I just found that you've added a start folder describing how to bootstrap Virgil. I updated my clone and will try to play with it a bit, thanks!

@titzer
Owner

titzer commented Jan 3, 2022

Yep, I added that because traffic spiked a little. I could add a Makefile just as a convenience.

For reference, uNN and iNN, for NN from 1 to 64, are fixed-width integer types.
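
A small Python sketch of how a tool working with this grammar might recognize such names after lexing them as ordinary identifiers. The fixed_width_int helper is hypothetical, and rejecting leading zeros (for example u05) is an assumption of this sketch, not something stated in the thread.

```python
import re

# 'i' or 'u' followed by one or two digits with no leading zero (assumed).
_FIXED_INT_RE = re.compile(r"([ui])([1-9][0-9]?)")


def fixed_width_int(name):
    """Return (signed, width) if name looks like iNN/uNN with 1 <= NN <= 64, else None."""
    m = _FIXED_INT_RE.fullmatch(name)
    if m is None:
        return None
    width = int(m.group(2))
    if width > 64:
        return None
    return (m.group(1) == "i", width)
```

Under these assumptions u5 is classified as an unsigned 5-bit type name, while names such as u65 or user are left as plain identifiers.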

@mingodad
Contributor Author

mingodad commented Jan 3, 2022

So I was not wrong in my initial interpretation of u5 after all, and my previous concern/suggestion to use a suffix instead of a prefix still applies.

And what about the layout / struct keywords that appear in virgil/rt/posix/X86_64LinuxLayouts.v3? Is there no code for parsing them?

@titzer
Owner

titzer commented Jan 3, 2022

For the fixed-width integer types, the names will fall into the identifier category, since they start with a letter. I'm ok with that.

The struct and layout keywords are reserved for a future extension that I have been sketching for describing memory layouts. The file that you found is a conception of what that extension would look like to describe all the data structures that can be passed to/from the Linux kernel (on x86-64).

@mingodad
Contributor Author

mingodad commented Jan 3, 2022

Thank you for the reply!

Another topic that came up while testing my parser was an error while trying to parse Array<Array<byte>> in bench/PrimeSieve/PrimeSieve.v3, so I tried to run it as a standalone program and got this error:

virgil run PrimeSieve.v3 
[PrimeSieve.v3 @ 7:44] UnresolvedIdentifier: identifier "Int" cannot be found
		if (args.length > 0) max = Int.parse(args[0]);
                                           ^^^
[PrimeSieve.v3 @ 7:48] UnresolvedMember: expression of type <null> has no such member "parse"
		if (args.length > 0) max = Int.parse(args[0]);
                                               ^^^^^

Then I searched all *.v3 files for mentions of Int and found this in bench/Common.v3:

// Utility code for all benchmarks.
component Int {
	// parse a string as an integer, without any error checking
	def parse(a: Array<byte>) -> int {
		var accum = 0;
		for (i < a.length) {
			var dig = int.!(a[i]);
			accum = accum * 10 + dig - int.!('0'); 
		}
		return accum;
	}
}

In aeneas/src/types/Int.v3:

// Utility methods for working with ints, including parsing and rendering,
// as well as the representation of the "int" type in the compiler
component Int {
	def MAX_WIDTH = 64;
	private def cache = Array<IntType>.new(2 * MAX_WIDTH + 1);
...

In lib/util/Ints.v3:

// Utilities for rendering and parsing integers in various formats, as well
// as additional arithmetic routines.
component Ints {
	private def U32_0 = u32.view('0');
	// Parse a decimal integer value beginning at {a[pos]}. Returns a
	// pair of the status (# of characters read if successful, <= 0 if failure),
	// and the value.
	def parseDecimal(a: Array<byte>, pos: int) -> (/*status:*/int, /*value:*/int) {
		if (pos >= a.length) return (0, 0);
...

So I ended up copying and pasting to be able to run it:

// Copyright 2011 Google Inc. All rights reserved.
// See LICENSE for details of Apache 2.0 license.

// Utility code for all benchmarks.
component Int {
	// parse a string as an integer, without any error checking
	def parse(a: Array<byte>) -> int {
		var accum = 0;
		for (i < a.length) {
			var dig = int.!(a[i]);
			accum = accum * 10 + dig - int.!('0');
		}
		return accum;
	}
}

component PrimeSieve {
	def main(args: Array< Array< byte > >) -> int {
		var max = 1000;
		if (args.length > 0) max = Int.parse(args[0]);

		var array = Array<bool>.new(max);
		array[0] = true;
		array[1] = true;
		for (i = 2; i < max; i++) {
			if (!array[i]) {
				for (j = i * 2; j < max; j = j + i) array[j] = true;
			}
		}
		var count = 0;
		for (k < max) {
			if (!array[k]) count++;
		}
		System.puts("Number of primes less than ");
		System.puti(max);
		System.puts(": ");
		System.puti(count);
		System.ln();
		return 0;
	}
}

How are code reuse and dependencies managed in Virgil?

@mingodad
Contributor Author

mingodad commented Jan 3, 2022

Another question, about variables by value/reference: looking at lib/util/Token.v3, I see this declaration:

// A point in a file, including the file name, beginning line, and beginning column.
class FilePoint(fileName: string, beginLine: int, beginColumn: int) {
...

The question is: does every token have a copy of fileName, or only a reference to it?

@titzer
Owner

titzer commented Jan 3, 2022

The fileName in the Token is a reference.

@titzer
Owner

titzer commented Jan 3, 2022

Dependencies aren't managed by the compiler in Virgil. You need to pass all the files of your program to the compiler. The runtime and garbage collector code also needs to be included, but the target-specific scripts, e.g. v3c-x86-linux, do this for you.

Apologies, the bench/ directory might have rotted a bit. For the benchmarks, you need to include Common.v3 in the command line to compile. (Generally, run.bash would do this for you.)

But manually:

% cd bench
% v3c-x86-linux -output=/tmp/ Common.v3 Fannkuch/*.v3
% /tmp/Fannkuch
 . . .

E.g. to compile one of the applications in virgil/apps, it's typically:

% cd apps/vctags
% v3c-x86-linux *.v3 `cat DEPS`
% ./vctags
Usage: vctags [options] [Virgil file(s)]

  -h    Print this option summary.
  -e    Output tag file for use with Emacs.
% _
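
Since the compiler is handed every file explicitly, a build helper only has to assemble the argument list. A minimal Python sketch, assuming the DEPS file is a whitespace-separated list of paths (which is how the unquoted `cat DEPS` above is split by the shell, ignoring glob expansion). The build_command helper and the file names in the usage note are hypothetical.

```python
def build_command(compiler, sources, deps_text):
    """Return the argv list: compiler, the program's own files, then every file named in DEPS.

    deps_text is the raw content of a DEPS file; it is split on any
    whitespace, mimicking unquoted `cat DEPS` in a shell (assumption).
    """
    deps = deps_text.split()
    return [compiler, *sources, *deps]
```

For example, build_command("v3c-x86-linux", ["Main.v3"], "a.v3\nb.v3\n") yields an argument list that could be passed to subprocess.run; nothing is executed by the helper itself.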

@mingodad
Contributor Author

mingodad commented Jan 4, 2022

How do I call inherited methods?

I was trying to implement printHelp for the component Aeneas:

def printHelp(args: Array<string>) -> bool {
		Terminal.put("Usage: v3c [options] [Virgil file(s)]\n\n");
		Terminal.put("  help       Print this option summary.\n");
		Terminal.put("  version    Print version.\n");
		Terminal.put("  test       Execute tests.\n");
		Terminal.put("  test.st    Execute tests.\n");
		Terminal.put("  test.gc    Print version.\n");
		Terminal.put("  run        Directly run.\n");
		Terminal.put("  profile    Print version.\n");
		Terminal.put("  iprofile   Print version.\n");
		Terminal.put("  profile-depth    Print version.\n");
		Terminal.put("  icoverage    Print version.\n");
		Terminal.put("  multiple    Print version.\n");
		Terminal.put("  target    Print version.\n");
		Terminal.put("  output    Print version.\n");
		Terminal.put("  program-name    Print version.\n");
		Terminal.put("  target    Print version.\n");
		Terminal.put("  target    Print version.\n");
		Terminal.put("  target    Print version.\n");
		Terminal.put("  target    Print version.\n");
		Terminal.put("  target    Print version.\n");
		Terminal.put("  target    Print version.\n");
		return false;
	}

Then I decided that a better approach would be to add the descriptions directly to BasicOptions by extending it like this:

class AeneasOptions extends BasicOptions {
	var descriptions  = Strings.newMap<string>();

	def newIntOption(name: string, val: int, description : string) -> Option<int> {
		var rc = super(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newSizeOption(name: string, val: u32, description : string) -> Option<u32> {
		var rc = super(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newBoolOption(name: string, val: bool, description : string) -> Option<bool> {
		var rc = super(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newStringOption(name: string, val: string, description : string) -> Option<string> {
		var rc = super(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newOption<T>(name: string, val: T, parseFun: string -> T, description : string) -> Option<T> {
		var rc = super<T>(name, val, parseFun);
		descriptions[name] = description;
		return rc;
	}
}

But then I'm getting these errors:

virgil run Aeneas.v3
Aeneas.v3:8:26: keyword "super" cannot be used as identifier
		var rc = super(name, val);
                         ^^^^^
Aeneas.v3:13:26: keyword "super" cannot be used as identifier
		var rc = super(name, val);
                         ^^^^^
Aeneas.v3:18:26: keyword "super" cannot be used as identifier
		var rc = super(name, val);
                         ^^^^^
Aeneas.v3:23:26: keyword "super" cannot be used as identifier
		var rc = super(name, val);
                         ^^^^^
Aeneas.v3:28:26: keyword "super" cannot be used as identifier
		var rc = super<T>(name, val, parseFun);
                         ^^^^^

Then I searched for other code doing something similar, but I could not find any usage of super that was not after new. How can we call inherited methods in Virgil?

I also tried this:

class AeneasOptions extends BasicOptions {
	var descriptions  = Strings.newMap<string>();

	def newIntOption(name: string, val: int, description : string) -> Option<int> {
		var rc = super.newIntOption(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newSizeOption(name: string, val: u32, description : string) -> Option<u32> {
		var rc = super.newSizeOption(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newBoolOption(name: string, val: bool, description : string) -> Option<bool> {
		var rc = super.newBoolOption(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newStringOption(name: string, val: string, description : string) -> Option<string> {
		var rc = super.newStringOption(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newOption<T>(name: string, val: T, parseFun: string -> T, description : string) -> Option<T> {
		var rc = super.newOption<T>(name, val, parseFun);
		descriptions[name] = description;
		return rc;
	}
}

@mingodad
Contributor Author

mingodad commented Jan 4, 2022

It seems that there is no function overloading?

class AeneasOptions extends BasicOptions {
	var descriptions  = Strings.newMap<string>();

	def newIntOption(name: string, val: int, description : string) -> Option<int> {
		var rc = BasicOptions.newIntOption(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newSizeOption(name: string, val: u32, description : string) -> Option<u32> {
		var rc = newSizeOption(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newBoolOption(name: string, val: bool, description : string) -> Option<bool> {
		var rc = newBoolOption(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newStringOption(name: string, val: string, description : string) -> Option<string> {
		var rc = newStringOption(name, val);
		descriptions[name] = description;
		return rc;
	}
	def newOption<T>(name: string, val: T, parseFun: string -> T, description : string) -> Option<T> {
		var rc = newOption<T>(name, val, parseFun);
		descriptions[name] = description;
		return rc;
	}
}

Output:

[aeneas/src/main/Aeneas.v3 @ 7:13] InheritanceError: method signature (Array<byte>, int, Array<byte>) -> Option<int> cannot override (Array<byte>, int) -> Option<int>
	def newIntOption(name: string, val: int, description : string) -> Option<int> {
            ^
[aeneas/src/main/Aeneas.v3 @ 12:13] InheritanceError: method signature (Array<byte>, u32, Array<byte>) -> Option<u32> cannot override (Array<byte>, u32) -> Option<u32>
	def newSizeOption(name: string, val: u32, description : string) -> Option<u32> {
            ^
[aeneas/src/main/Aeneas.v3 @ 17:13] InheritanceError: method signature (Array<byte>, bool, Array<byte>) -> Option<bool> cannot override (Array<byte>, bool) -> Option<bool>
	def newBoolOption(name: string, val: bool, description : string) -> Option<bool> {
            ^
[aeneas/src/main/Aeneas.v3 @ 22:13] InheritanceError: method signature (Array<byte>, Array<byte>, Array<byte>) -> Option<Array<byte>> cannot override (Array<byte>, Array<byte>) -> Option<Array<byte>>
	def newStringOption(name: string, val: string, description : string) -> Option<string> {
            ^
[aeneas/src/main/Aeneas.v3 @ 27:13] InheritanceError: method signature (Array<byte>, T, Array<byte> -> T, Array<byte>) -> Option<T> cannot override (Array<byte>, T, Array<byte> -> T) -> Option<T>
	def newOption<T>(name: string, val: T, parseFun: string -> T, description : string) -> Option<T> {
            ^
[aeneas/src/main/Aeneas.v3 @ 4:7] TypeError: super constructor requires 1 argument and found 0
class AeneasOptions extends BasicOptions {
      ^
[aeneas/src/main/Aeneas.v3 @ 8:52] TypeError: call requires type BasicOptions and found Array<byte>
		var rc = BasicOptions.newIntOption(name, val);
                                                   ^^^^
[aeneas/src/main/Aeneas.v3 @ 8:58] TypeError: call requires type Array<byte> and found int
		var rc = BasicOptions.newIntOption(name, val);
                                                         ^^^
[aeneas/src/main/Aeneas.v3 @ 8:61] TypeError: call requires 3 arguments and found 2
		var rc = BasicOptions.newIntOption(name, val);
                                                            ^
[aeneas/src/main/Aeneas.v3 @ 13:49] TypeError: call requires 3 arguments and found 2
		var rc = newSizeOption(name, val);
                                                ^
[aeneas/src/main/Aeneas.v3 @ 18:49] TypeError: call requires 3 arguments and found 2
		var rc = newBoolOption(name, val);
                                                ^
[aeneas/src/main/Aeneas.v3 @ 23:51] TypeError: call requires 3 arguments and found 2
		var rc = newStringOption(name, val);
                                                  ^
[aeneas/src/main/Aeneas.v3 @ 28:58] TypeError: call requires 4 arguments and found 3
		var rc = newOption<T>(name, val, parseFun);

@titzer
Owner

titzer commented Jan 4, 2022

Oh neat, it looks like you're adding some help to the options, which is something I was planning on doing eventually.

You are correct on both of the above. There is no calling of overridden (super) methods, nor overloading of methods in Virgil. The latter becomes ambiguous in the case of delegate methods and the former was mostly a design choice on my part to avoid the fragile base class problem. It's occasionally useful, and super would probably be the right syntactic choice, to match Java. But I don't think I'll do that soon.

There are lots of ways around this, e.g. you could just add a suffix D to the method names. However I actually prefer that utility methods take more arguments (e.g. BasicOptions always takes a description, and allows null) and if you want a shorter version, you can use a locally-bound partially applied function, like so:

def options = new BasicOptions();
def newBoolOpt = options.newBoolOption(_, _, null);

def MYOPTION = newBoolOpt("myoption", false);
def OTHEROPT = newBoolOpt("other", true);

I kind of like this second pattern, because it keeps the shorthand local to the scope/file, instead of littering a utility class with many versions of a method that mostly do the same thing but just supply a default value of an argument. Plus, you can partially apply with any arguments you want, including the receiver, making it really really short.
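
For readers coming from other languages, the locally bound partial application described above is roughly analogous to the following Python sketch. It is only an analogy: the `BasicOptions` class here is a hypothetical stand-in written for this example, not the Aeneas class, and `functools.partial` is Python's mechanism rather than Virgil's `_` placeholders.

```python
from functools import partial


class BasicOptions:
    """Hypothetical stand-in: every factory method takes a description."""

    def __init__(self):
        self.values = {}
        self.descriptions = {}

    def new_bool_option(self, name, val, description):
        # description may be None, mirroring "allows null" in the comment above.
        self.values[name] = val
        if description is not None:
            self.descriptions[name] = description
        return name


options = BasicOptions()

# Local shorthand: the receiver and the description are bound once, here.
new_bool_opt = partial(options.new_bool_option, description=None)

MYOPTION = new_bool_opt("myoption", False)
OTHEROPT = new_bool_opt("other", True)
```

The shorthand lives only in the scope that defines it, so the utility class keeps a single full-arity method.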

titzer added a commit that referenced this issue Apr 7, 2022
@titzer
Owner

titzer commented Apr 7, 2022

@mingodad

I've added a more complete EBNF to the doc/ directory. I didn't fill out the tokens for string, integer, and float constants, and it doesn't respect operator precedence, but it is pretty close now.

@mingodad
Contributor Author

mingodad commented Apr 7, 2022

Trying your EBNF at https://www.bottlecaps.de/rr/ui reported some small mistakes, which I fixed; see the diff below:

diff -u virgil-grammar.ebnf virgil-grammar2.ebnf 
--- virgil-grammar.ebnf	2022-04-07 10:22:42.890971279 +0200
+++ virgil-grammar2.ebnf	2022-04-07 10:21:17.379177181 +0200
@@ -29,7 +29,7 @@
 DefMethod ::= TK_private? TK_def (IndexMethod | Method)
 Method ::=  IdentParam "(" VarParamDecls? ")" ( "->" ( TK_this | TypeRef ) )? MethodBody
 
-MethodBody := ";" | BlockStmt
+MethodBody ::= ";" | BlockStmt
 ExportDecl ::=  TK_export ( DefMethod | ( TK_string | Ident ) ( "=" SymbolParam )? ";" )
 Symbol ::= TK_ident ("." TK_ident)*
 SymbolParam ::= IdentParam ("." IdentParam)*
@@ -52,10 +52,10 @@
 ForStmt ::=  TK_for "(" VarDecl ( "<" Expr | TK_in Expr | ";" Expr ";" Expr ) ")" Stmt
 ExprStmt ::=  Expr ";"
 
-Expr := SubExpr (Assign Expr)?
-SubExpr := InExpr (Infix InExpr)?
-InExpr := Term TermSuffix*
-TermSuffix := (MemberSuffix | ApplySuffix | IndexSuffix | IncOrDec)
+Expr ::= SubExpr (Assign Expr)?
+SubExpr ::= InExpr (Infix InExpr)?
+InExpr ::= Term TermSuffix*
+TermSuffix ::= (MemberSuffix | ApplySuffix | IndexSuffix | IncOrDec)
 MemberSuffix ::=  "." ( IdentParam | TK_int | Operator )
 ApplySuffix ::= "(" Exprs? ")"
 IndexSuffix ::= "[" Exprs "]"
@@ -64,8 +64,8 @@
 ArrayExpr ::=  "[" Exprs? "]"
 ParamExpr ::=  "_"
 IfExpr ::= TK_if "(" Expr "," Expr ( "," Expr )? ")"
-Literal ::= Const | TK_this 
-Const ::= TK_char | TK_string | TK_int | TK_float | TK_true | TK_false | TK_null 
+Literal ::= Const | TK_this
+Const ::= TK_char | TK_string | TK_int | TK_float | TK_true | TK_false | TK_null
 
 IncOrDec ::=  "++" | "--"
 Operator ::= Infix | CastOrQuery | "-" | "~" | "[]" | "[]="
@@ -109,4 +109,4 @@
 TK_char ::=
 TK_int ::=
 TK_float ::=
-TK_string ::=
+TK_string ::=
\ No newline at end of file
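
The two kinds of change in the diff above (`:=` typed instead of `::=`, and trailing whitespace) are easy to check mechanically before pasting a grammar into the railroad tool. Below is a minimal sketch of such a check; `lint_ebnf` is a hypothetical helper written for this example, and whether the tool actually rejects trailing whitespace is not something this sketch establishes.

```python
import re

# Matches "Name := ..." where a single colon was typed instead of "::=".
BAD_DEFINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:=")


def lint_ebnf(text):
    """Return (line_number, message) pairs for the problems seen in the diff."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = BAD_DEFINE.match(line)
        if match:
            problems.append((number, f"{match.group(1)}: ':=' should be '::='"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems
```

A correctly written `Name ::= ...` line is not flagged, because the first colon is followed by another colon rather than by `=`.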

@titzer
Owner

titzer commented Apr 7, 2022

Thanks, fixed.

@mingodad
Contributor Author

mingodad commented Apr 7, 2022

Not at all!
Note that the EBNF dialect accepted at https://www.bottlecaps.de/rr/ui also has a parser generator, available at https://www.bottlecaps.de/rex/. If you add the missing tokens you can get a standalone parser in several programming languages (C++, C#, Haxe, Java, JavaScript, TypeScript, Scala, XQuery, XSLT, XML; tick the "config" check box to see the options) that outputs a parse tree in XML.

@mingodad
Contributor Author

mingodad commented Jul 2, 2022

I'm using and contributing to the development of chpeg (https://github.com/ChrisHixon/chpeg) and cpp-peglib (https://github.com/yhirose/cpp-peglib). As an exercise I'm trying to create a PEG parser based on https://github.com/titzer/virgil/blob/master/doc/virgil-grammar.ebnf, and I got one that, although it doesn't parse all *.v3 files so far (it's missing something to recognize >.new), does parse some of them, including this big one: https://github.com/titzer/virgil/blob/master/lib/asm/x86-64/X86_64Assembler.v3 .

I think it can help when developing or debugging new or existing grammar constructs, thanks to the interactive playgrounds of both tools:

CHPEG -> https://chrishixon.github.io/chpeg/playground/
CPP-PEGLIB -> https://yhirose.github.io/cpp-peglib/

Copy and paste the grammar shown below into the Grammar editor and any Virgil source code into the Source Code editor, then click the Parse button in the upper right corner to get an AST if it parses without errors.

Virgil PEG grammar:

# Adapted from https://github.com/titzer/virgil/blob/master/doc/virgil-grammar.ebnf

Virgil <-  _ ToplevelDecl* !.
ToplevelDecl <-  ClassDecl / ComponentDecl / VariantDecl / EnumDecl / ExportDecl / VarMember / DefMember

ClassDecl <- "class" WB IdentParam ( "(" _ VarParamDecls? ")" _ )? ( "extends" WB TypeRef TupleExpr? )? "{" _ Member* "}" _
ComponentDecl <- ("import" WB)? "component" WB IDENTIFIER "{" _ Member* "}" _
VariantDecl <-  "type" WB IdentParam ( "(" _ ParamDecls? ")" _ )? "{" _ VariantMember* "}" _
EnumDecl <-  "enum" WB IDENTIFIER ( "(" _ ParamDecls? ")" _ )? "{" _ EnumCases? "}" _

Member <-  DefMember / NewMember / VarMember
VariantMember <- DefMethod / VariantCase
VarMember <-  ("private" WB)? "var" WB VarDecls ";" _
DefMember <-  ("private" WB)? "def" WB ((("var" WB)? VarDecls ";" _) / IndexMethod / Method)
NewMember <-  "new" WB "(" _ NewParamDecls? ")" _ ( (":" _)? "super" WB TupleExpr )? BlockStmt
DefMethod <- ("private" WB)? "def" WB (IndexMethod / Method)
VariantCase <-  "case" WB IDENTIFIER ( "(" _ ParamDecls? ")" _ )? ( ";" _ / "{" _ DefMethod* "}" _)
EnumCase <-  IDENTIFIER ( "(" _ ( Expr ( "," _ Expr )* )? ")" _ )?
EnumCases <- EnumCase ( "," _ EnumCase )* (',' _)?

VarParamDecl <-  ("var" WB)? IDENTIFIER ":" _ TypeRef
VarParamDecls <- VarParamDecl ("," _ VarParamDecl)*
ParamDecl <-  IDENTIFIER ":" _ TypeRef
ParamDecls <- ParamDecl ("," _ ParamDecl)*
NewParamDecl <- IDENTIFIER (":" _ TypeRef)?
NewParamDecls <- NewParamDecl ("," _ NewParamDecl)*
IdentParam <-  (IDENTIFIERP TypeArgs ">" _) / IDENTIFIER
TypeRef <-  ( "(" _ TypeArgs? ")" _ / IdentParam ( "." _ IdentParam )* ) ( "->" _ TypeRef )*
TypeArgs <- TypeRef ("," _ TypeRef)*

VarDecl <- IDENTIFIER ((":" _ TypeRef ("=" _ Expr)?) / ("=" _ Expr))?
VarDecls <- VarDecl ( "," _ VarDecl )*
IndexMethod <- IdentParam? "[" _ VarParamDecls? "]" _ ( "=" _ ParamDecl / "->" _ TypeRef ) MethodBody
Method <-  IdentParam "(" _ VarParamDecls? ")" _ ( "->" _ ( "this" WB / TypeRef ) )? MethodBody

MethodBody <- ";" _ / BlockStmt
ExportDecl <-  "export" WB ( DefMethod / ( STRING / IDENTIFIER ) ( "=" _ SymbolParam )? ";" _ )
Symbol <- IDENTIFIER ("." _ IDENTIFIER)*
SymbolParam <- IdentParam ("." _ IdentParam)*

BlockStmt <- "{" _ Stmt* "}" _
Stmt <-  BlockStmt / EmptyStmt / IfStmt / WhileStmt / MatchStmt / VarStmt / DefStmt / BreakStmt / ContinueStmt / ReturnStmt / ForStmt / ExprStmt
EmptyStmt <-  ";" _
IfStmt <-  "if" WB "(" _ Expr ")" _ Stmt ( "else" WB Stmt )?
WhileStmt <-  "while" WB "(" _ Expr ")" _ Stmt
MatchStmt <-  "match" WB "(" _ Expr ")" _ "{" _ ( MatchCase MatchCase* )? "}" _ ( "else" WB Stmt )?
MatchCase <-  ("_" _ / (MatchPattern ( "," _ MatchPattern )*)) "=>" _ Stmt
MatchPattern <-  IdTypePattern / SymbolPattern / Const
IdTypePattern <-  IDENTIFIER ":" _ TypeRef
SymbolPattern <- Symbol ( "(" _ ( IDENTIFIER ( "," _ IDENTIFIER )* )? ")" _ )?
VarStmt <-  "var" WB VarDecls ";" _
DefStmt <-  "def" WB VarDecls ";" _
BreakStmt <-  "break" WB ";" _
ContinueStmt <-  "continue" WB ";" _
ReturnStmt <-  "return" WB Expr? ";" _
ForStmt <-  "for" WB "(" _ VarDecl ( "<" _ Expr / "in" WB Expr / ";" _ Expr ";" _ Expr ) ")" _ Stmt
ExprStmt <-  Expr ";" _

Expr <- SubExpr (Assign Expr)?
SubExpr <- InExpr (Infix InExpr)*
InExpr <- Term TermSuffix*
TermSuffix <- (MemberSuffix / ApplySuffix / IndexSuffix / IncOrDec)
MemberSuffix <-  "." _ ( IdentParam / INTEGER / Operator )
ApplySuffix <- "(" _ ExprList? ")" _
IndexSuffix <- "[" _ ExprList "]" _
Term <- (IncOrDec / ("-" / "!" / "~") _)? (ParamExpr / Literal / ArrayExpr / TupleExpr / IfExpr)
TupleExpr <-  "(" _ ExprList? ")" _
ArrayExpr <-  "[" _ ExprList? "]" _
ParamExpr <-  "_" _
IfExpr <- "if" WB "(" _ Expr "," _ Expr ( "," _ Expr )? ")" _
Literal <- Const / IdentParam / "this" WB
Const <- CHAR / STRING / FLOAT / INTEGER / ("true" / "false" / "null") WB
ExprList <- Expr ("," _ Expr)*

IncOrDec <-  ("++" / "--") _
Operator <- Infix / CastOrQuery / ("-" / "~" / "[]" / "[]=") _
CastOrQuery <- ("!" / "?") _ ( "<" _ TypeArgs ">" _ )?
Assign <- ("=" / "<<=" / ">>=" / "|=" / "&=" / "<<<=" / ">>>=" / "+=" / "-=" / "*=" / "/=" / "%=" / "^=") _
Infix <-  ("==" / "!=" / "||" / "&&" / "<=" / ">=" / "|" / "&" / "<<<" / "<<" / "<" / ">>>" / ">>" / ">" / "+" / "-" / "*" / "/" / "%" / "^") _

IDENT_START <- [a-zA-Z]
IDENT_CONT <- [a-zA-Z0-9_]
IDENTIFIER <- <IDENT_START IDENT_CONT*> WB
IDENTIFIERP <- <IDENT_START IDENT_CONT*> "<" _
CHAR <- <"'" (!"'" (HEXCHAR / ESCAPE / PRINTABLE / .))  "'"> _
INTEGER <- <(("0"[xX] [a-fA-F0-9_]+ / "0b" [01]+ / [0] / [-]? ([1-9][0-9_]*)) [uU]? [lL]?)> WB
FLOAT <- <([-]? ([0] / ([1-9][0-9_]*)) ('.' [0-9_]*)? ([eE] [+-]? ([0] / [1-9][0-9]*))? [fFdD]?)> WB
STRING <- <'"' (!'"' ( HEXCHAR / ESCAPE / PRINTABLE / . ))* '"'> _
HEXCHAR <- "\\"[xX][0-9A-Fa-f][0-9A-Fa-f]
PRINTABLE <- [A-Za-z0-9`~!@#$%^&*()-_=+\[{\]};:,<.>/?]
ESCAPE <- [\\][rnbt'"\\]

LINEBREAK <- '\n' '\r'? / '\r' '\n'?
COMMENTS <-
    "//" (!LINEBREAK .)* LINEBREAK?
    / "/*" (!"*/" (LINEBREAK / .))* "*/"

WS <- [ \t]+ / LINEBREAK

# For chpeg we use {I} to ignore/hide on the AST
#WB {I} <- !IDENT_CONT _
#_ {I} <- (WS / COMMENTS)*

# For cpp-peglib we use ~rule to ignore/hide on the AST
~WB  <- !IDENT_CONT _
~_  <- (WS / COMMENTS)*
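
Two details of the grammar above are worth spelling out. PEG's `/` is an ordered choice, so `"<<<"` has to be listed before `"<<"` and `"<"` in `Infix`. The `WB` rule (`!IDENT_CONT _`) stops a keyword such as `"class"` from matching the start of a longer identifier. The following Python sketch imitates both behaviours on plain strings; `ordered_choice` and `keyword` are hypothetical helpers written for this example, not part of chpeg or cpp-peglib.

```python
import re

IDENT_CONT = re.compile(r"[a-zA-Z0-9_]")


def ordered_choice(alternatives, text, pos=0):
    """Return the first alternative that matches at pos, as PEG's '/' does."""
    for literal in alternatives:
        if text.startswith(literal, pos):
            return literal
    return None


def keyword(word, text, pos=0):
    """Match word only if no identifier character follows, like `"class" WB`."""
    if not text.startswith(word, pos):
        return False
    end = pos + len(word)
    return end >= len(text) or IDENT_CONT.match(text[end]) is None


# Relative order used by the Infix rule above.
LONGEST_FIRST = ["<=", "<<<", "<<", "<"]
# A hypothetical reordering that puts the shortest operator first.
SHORTEST_FIRST = ["<", "<<", "<<<", "<="]

print(ordered_choice(LONGEST_FIRST, "<<< 2"))   # <<<
print(ordered_choice(SHORTEST_FIRST, "<<< 2"))  # <
print(keyword("class", "class Foo"))            # True
print(keyword("class", "classes"))              # False
```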

@mingodad
Contributor Author

mingodad commented Jul 2, 2022

Note the comments at the bottom of the grammar above: swap which pair of rules is commented out depending on whether you want to use the chpeg or the cpp-peglib playground (as it stands, it works on cpp-peglib).

@mingodad
Contributor Author

mingodad commented Jul 2, 2022

I just found the problem with parsing >.new and updated the grammar shown above; it can now parse several more *.v3 files.

@mingodad
Contributor Author

mingodad commented Jul 2, 2022

The same grammar is now also listed at ChrisHixon/chpeg#20 (comment), alongside the other grammars I've written, converted, or adapted so far.

Any feedback is welcome!

@mingodad
Contributor Author

mingodad commented Jul 2, 2022

I made several other small fixes to the grammar shown above (already updated again); it now parses almost all *.v3 files and even found some syntax problems, like the ones shown below (missing commas in enums).

virgil/wizard-engine/src/jawa/JawaOpcodes.v3:98:3 unexpected syntax
virgil/wizard-engine/src/engine/Interpreter.v3:74:3 unexpected syntax

Attached is the same grammar adapted to work with https://github.com/mingodad/peg, which generates a standalone parser that I used to test against all *.v3 files.

virgil-peg.zip
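
Running a generated parser over every *.v3 file in a checkout can be scripted along the following lines. This is a sketch under assumptions: `check_tree` is a hypothetical helper, the parser command is whatever executable you built from the attachment, and it is assumed to take a file path as its last argument and to exit with a non-zero status on a syntax error.

```python
import subprocess
from pathlib import Path


def check_tree(root, parser_cmd):
    """Run parser_cmd on every *.v3 file under root; return the files that fail."""
    failures = []
    for path in sorted(Path(root).rglob("*.v3")):
        result = subprocess.run(
            [*parser_cmd, str(path)],
            capture_output=True,
            text=True,
        )
        # Assumption: a non-zero exit status means the file did not parse.
        if result.returncode != 0:
            message = result.stderr.strip() or result.stdout.strip()
            failures.append((str(path), message))
    return failures
```

Calling `check_tree("virgil", ["./virgil-parser"])` (with a hypothetical executable name) would return a list of `(path, message)` pairs such as the two files reported above.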

@mingodad
Contributor Author

mingodad commented Jul 2, 2022

And here is an EBNF generated from the PEG grammar above (to be viewed at https://www.bottlecaps.de/rr/ui):

Virgil ::=
	 _ ToplevelDecl* _NOT_  .

ToplevelDecl ::=
	 ClassDecl
	| ComponentDecl
	| VariantDecl
	| EnumDecl
	| ExportDecl
	| VarMember
	| DefMember

ClassDecl ::=
	 "class" WB IdentParam ( "(" _ VarParamDecls? ")" _ )? ( "extends" WB TypeRef TupleExpr? )? "{" _ Member* "}" _

ComponentDecl ::=
	 ( "import" WB )? "component" WB IDENTIFIER "{" _ Member* "}" _

VariantDecl ::=
	 "type" WB IdentParam ( "(" _ ParamDecls? ")" _ )? "{" _ VariantMember* "}" _

EnumDecl ::=
	 "enum" WB IDENTIFIER ( "(" _ ParamDecls? ")" _ )? "{" _ EnumCases? "}" _

Member ::=
	 DefMember
	| NewMember
	| VarMember

VariantMember ::=
	 DefMethod
	| VariantCase

VarMember ::=
	 ( "private" WB )? "var" WB VarDecls ";" _

DefMember ::=
	 ( "private" WB )? "def" WB ( ( ( "var" WB )? VarDecls ";" _ ) | IndexMethod | Method )

NewMember ::=
	 "new" WB "(" _ NewParamDecls? ")" _ ( ( ":" _ )? "super" WB TupleExpr )? BlockStmt

DefMethod ::=
	 ( "private" WB )? "def" WB ( IndexMethod | Method )

VariantCase ::=
	 "case" WB IDENTIFIER ( "(" _ ParamDecls? ")" _ )? ( ( ";" _ ) | ( "{" _ DefMethod* "}" _ ) )

EnumCase ::=
	 IDENTIFIER ( "(" _ ( Expr ( "," _ Expr )* )? ")" _ )?

EnumCases ::=
	 EnumCase ( "," _ EnumCase )* ( ',' _ )?

VarParamDecl ::=
	 ( "var" WB )? IDENTIFIER ":" _ TypeRef

VarParamDecls ::=
	 VarParamDecl ( "," _ VarParamDecl )*

ParamDecl ::=
	 IDENTIFIER ":" _ TypeRef

ParamDecls ::=
	 ParamDecl ( "," _ ParamDecl )*

NewParamDecl ::=
	 IDENTIFIER ( ":" _ TypeRef )?

NewParamDecls ::=
	 NewParamDecl ( "," _ NewParamDecl )*

IdentParam ::=
	 ( IDENTIFIERP TypeArgs ">" _ )
	| IDENTIFIER

TypeRef ::=
	 ( ( "(" _ TypeArgs? ")" _ ) | ( IdentParam ( "." _ IdentParam )* ) ) ( "->" _ TypeRef )*

TypeArgs ::=
	 TypeRef ( "," _ TypeRef )*

VarDecl ::=
	 IDENTIFIER ( ( ":" _ TypeRef ( "=" _ Expr )? ) | ( "=" _ Expr ) )?

VarDecls ::=
	 VarDecl ( "," _ VarDecl )*

IndexMethod ::=
	 IdentParam? "[" _ VarParamDecls? "]" _ ( ( "=" _ ParamDecl ) | ( "->" _ TypeRef ) ) MethodBody

Method ::=
	 IdentParam "(" _ VarParamDecls? ")" _ ( "->" _ ( ( "this" WB ) | TypeRef ) )? MethodBody

MethodBody ::=
	 ( ";" _ )
	| BlockStmt

ExportDecl ::=
	 "export" WB ( DefMethod | ( ( STRING | IDENTIFIER ) ( "=" _ SymbolParam )? ";" _ ) )

Symbol ::=
	 IDENTIFIER ( "." _ IDENTIFIER )*

SymbolParam ::=
	 IdentParam ( "." _ IdentParam )*

BlockStmt ::=
	 "{" _ Stmt* "}" _

Stmt ::=
	 BlockStmt
	| EmptyStmt
	| IfStmt
	| WhileStmt
	| MatchStmt
	| VarStmt
	| DefStmt
	| BreakStmt
	| ContinueStmt
	| ReturnStmt
	| ForStmt
	| ExprStmt

EmptyStmt ::=
	 ";" _

IfStmt ::=
	 "if" WB "(" _ Expr ")" _ Stmt ( "else" WB Stmt )?

WhileStmt ::=
	 "while" WB "(" _ Expr ")" _ Stmt

MatchStmt ::=
	 "match" WB "(" _ Expr ")" _ "{" _ ( MatchCase MatchCase* )? "}" _ ( "else" WB Stmt )?

MatchCase ::=
	 ( ( "_" _ ) | ( MatchPattern ( "," _ MatchPattern )* ) ) "=>" _ Stmt

MatchPattern ::=
	 IdTypePattern
	| SymbolPattern
	| Const

IdTypePattern ::=
	 IDENTIFIER ":" _ TypeRef

SymbolPattern ::=
	 Symbol ( "(" _ ( IDENTIFIER ( "," _ IDENTIFIER )* )? ")" _ )?

VarStmt ::=
	 "var" WB VarDecls ";" _

DefStmt ::=
	 "def" WB VarDecls ";" _

BreakStmt ::=
	 "break" WB ";" _

ContinueStmt ::=
	 "continue" WB ";" _

ReturnStmt ::=
	 "return" WB Expr? ";" _

ForStmt ::=
	 "for" WB "(" _ VarDecl ( ( "<" _ Expr ) | ( "in" WB Expr ) | ( ";" _ Expr ";" _ Expr ) ) ")" _ Stmt

ExprStmt ::=
	 Expr ";" _

Expr ::=
	 SubExpr ( Assign Expr )?

SubExpr ::=
	 InExpr ( Infix InExpr )*

InExpr ::=
	 Term TermSuffix*

TermSuffix ::=
	 MemberSuffix
	| ApplySuffix
	| IndexSuffix
	| IncOrDec

MemberSuffix ::=
	 "." _ ( IdentParam | INTEGER | Operator )

ApplySuffix ::=
	 "(" _ ExprList? ")" _

IndexSuffix ::=
	 "[" _ ExprList "]" _

Term ::=
	 ( IncOrDec | ( ( "-" | "!" | "~" ) _ ) )? ( ParamExpr | Literal | ArrayExpr | TupleExpr | IfExpr )

TupleExpr ::=
	 "(" _ ExprList? ")" _

ArrayExpr ::=
	 "[" _ ExprList? "]" _

ParamExpr ::=
	 "_" _

IfExpr ::=
	 "if" WB "(" _ Expr "," _ Expr ( "," _ Expr )? ")" _

Literal ::=
	 Const
	| IdentParam
	| ( "this" WB )

Const ::=
	 CHAR
	| STRING
	| FLOAT
	| INTEGER
	| ( ( "true" | "false" | "null" ) WB )

ExprList ::=
	 Expr ( "," _ Expr )*

IncOrDec ::=
	 ( "++" | "--" ) _

Operator ::=
	 Infix
	| CastOrQuery
	| ( ( "-" | "~" | "[]" | "[]=" ) _ )

CastOrQuery ::=
	 ( "!" | "?" ) _ ( "<" _ TypeArgs ">" _ )?

Assign ::=
	 ( "=" | "<<=" | ">>=" | "|=" | "&=" | "<<<=" | ">>>=" | "+=" | "-=" | "*=" | "/=" | "%=" | "^=" ) _

Infix ::=
	 ( "==" | "!=" | "||" | "&&" | "<=" | ">=" | "|" | "&" | "<<<" | "<<" | "<" | ">>>" | ">>" | ">" | "+" | "-" | "*" | "/" | "%" | "^" ) _

IDENT_START ::=
	 [a-zA-Z]

IDENT_CONT ::=
	 [a-zA-Z0-9_]

IDENTIFIER ::=
	 IDENT_START IDENT_CONT* WB

IDENTIFIERP ::=
	 IDENT_START IDENT_CONT* "<" _

CHAR ::=
	 "'" ( _NOT_  "'" ( HEXCHAR | ESCAPE | PRINTABLE | . ) ) "'" _

INTEGER ::=
	 ( ( ( "0" [xX] [a-fA-F0-9_]+ ) | ( "0b" [01]+ ) | [0] | ( [-]? ( [1-9] [0-9_]* ) ) ) [uU]? [lL]? ) WB

FLOAT ::=
	 ( [-]? ( [0] | ( [1-9] [0-9_]* ) ) ( '.' [0-9_]* )? ( [eE] [+-]? ( [0] | ( [1-9] [0-9]* ) ) )? [fFdD]? ) WB

STRING ::=
	 '"' ( _NOT_  '"' ( HEXCHAR | ESCAPE | PRINTABLE | . ) )* '"' _

HEXCHAR ::=
	 "\\" [xX] [0-9A-Fa-f] [0-9A-Fa-f]

PRINTABLE ::=
	 [A-Za-z0-9`~!@#$%^&*()-_=+#x5b{#x5d};:,<.>/?]

ESCAPE ::=
	 [\\] [rnbt'"\\]

LINEBREAK ::=
	 ( ( '\n' '\r'? ) | ( '\r' '\n'? ) )

COMMENTS ::=
	 ( "//" ( _NOT_  LINEBREAK . )* LINEBREAK? )
	| ( "/*" ( _NOT_  "*/" ( LINEBREAK | . ) )* "*/" )

WS ::=
	 [ \t]+
	| LINEBREAK

WB ::=
	 _NOT_  IDENT_CONT _

_ ::=
	 ( WS | COMMENTS )*


//Added tokens for railroad generation
_NOT_ ::= '!'
_AND_ ::= '&'
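
The operator mapping visible in the listing above (`<-` becomes `::=`, `/` becomes `|`, the `!` and `&` predicates become the placeholder tokens `_NOT_` and `_AND_`, and the `< >` token markers are dropped) can be sketched in a few lines of Python. This is not the converter that produced the listing: `peg_rule_to_ebnf` is a hypothetical helper that only rewrites operators, and it does not add grouping parentheses, reflow lines, or re-escape character classes.

```python
import re

# One token per match, so operators inside literals and classes are left alone.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # double-quoted literal
    r"|'(?:\\.|[^'\\])*'"     # single-quoted literal
    r"|\[(?:\\.|[^\]\\])*\]"  # character class
    r"|<-"                    # PEG definition arrow
    r"|.",                    # any other single character
    re.S,
)


def peg_rule_to_ebnf(rule):
    """Rewrite the PEG operators of one rule into the EBNF spelling."""
    out = []
    for tok in TOKEN.findall(rule):
        if tok == "<-":
            out.append("::=")
        elif tok == "/":
            out.append("|")
        elif tok == "!":
            out.append("_NOT_ ")
        elif tok == "&":
            out.append("_AND_ ")
        elif tok in ("<", ">"):
            continue  # token-boundary markers have no EBNF counterpart
        else:
            out.append(tok)
    return "".join(out)


print(peg_rule_to_ebnf('IDENTIFIER <- <IDENT_START IDENT_CONT*> WB'))
# IDENTIFIER ::= IDENT_START IDENT_CONT* WB
```

Tokenizing first is what keeps a quoted `"!"` or `"<"` (as in `CastOrQuery`) from being mistaken for a predicate or a token marker.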

@titzer
Owner

titzer commented Jul 2, 2022

This is great! When I have a chance to take a look in more detail I'll try to check in a version of the above.

@mingodad
Contributor Author

I also just ported CocoR to TypeScript/JavaScript and added a grammar for Virgil as an example. It can be tried in the online playground at https://mingodad.github.io/CocoR-Typescript/playground (use the drop-down at the top middle of the screen to load it).
