ONLY KEPT FOR ARTIC CODE VAULT ACHIEVEMENT CODE IS A MESS / C-like language parser (1328 lines) and simple non-fully functional code generator that spits out NASM assembly written in C++.

ncortiz/Compiler-Test

Single-File-Compiler-From-Scratch-Recursive-Descent-LR-and-NASM-asm-gen

Note: This is a very rough implementation of an RD parser. I've only kept it because it was added to the Arctic Code Vault 2020. For a better code base, see my repository "Calc-".

A C-like to NASM-like assembly compiler (kinda works? I haven't tested the generated code) in one file, with a combined Lexer -> Recursive Descent Parser -> Code Gen pipeline. It is written in C++ with only the standard libraries and includes the parsers and a nice command-line coloring system! All in only 1328 lines of code!

Grammar:

NUM-LIT: [0-9]+ 'u'? ('b' | 'w' | 'q')? 'p'?

CHAR-LIT: ''' [\0-\255] '''

typespec: 'unsigned'? ('byte' | 'word' | 'dword' | 'qword' | 'void' ) 'ptr'?

typespec-list: typespec (',' typespec)*

expr-list: expr (',' expr)*

param-list : IDENT ':' typespec (',' IDENT ':' typespec)*

atom-expr: NUM-LIT

	  : CHAR-LIT
	  
	  : IDENT ('++'| '--' | '(' expr-list? ')' | '<' typespec-list '>' )?
	  
	  : ('++' | '--') IDENT
	  
	  : ('-' atom | '~' atom | '!' atom | '&' atom | '*' atom)
	  
	  : '(' expr ')'
	  


mul-div-mod-expr:   atom-expr (('*' | '/' | '%') atom-expr)*

add-sub-expr:       mul-div-mod-expr (('+' | '-') mul-div-mod-expr)*

shift-expr:         add-sub-expr (('<<'|'>>') add-sub-expr)*

lt-gt-lte-gte-expr: shift-expr(('<?' | '>?' | '<=' | '>=') shift-expr)*

ee-ne-expr:         lt-gt-lte-gte-expr(('=' | '!=')  lt-gt-lte-gte-expr)*

bit-and-expr:       ee-ne-expr('&' ee-ne-expr)*

bit-xor-expr:       bit-and-expr('^' bit-and-expr)*

bit-or-expr:        bit-xor-expr('|' bit-xor-expr)*

log-and-expr:       bit-or-expr('&&' bit-or-expr)*

log-or-expr:        log-and-expr('||' log-and-expr)*

stmt_expr		   : 'ref' expr '=' expr

				   : 'let' IDENT ('='|'=>') expr
				   
				   : 'def' IDENT '(' param-list? ')' ':' typespec '{' expr_block '}'
				   
				   : 'call' expr '(' param-list? ')'
				   
				   : 'ret' expr
				   
				   : 'if' '(' expr ')' '{' expr_block '}' ('else' '{' expr_block '}' )?
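The layered expression rules above map naturally onto recursive descent: one function per rule, each looping over its own operators and delegating to the next-tighter rule. The following is only a minimal sketch of that idea, not the code from this repository. It covers just a subset (number literals, unary '-', parentheses, mul-div-mod-expr and add-sub-expr) and evaluates directly instead of emitting assembly. The names `ExprParser` and `evaluate` are made up for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical mini-parser: one member function per grammar rule.
struct ExprParser {
    std::string src;
    std::size_t pos = 0;

    explicit ExprParser(std::string text) : src(std::move(text)) {}

    void skipSpaces() {
        while (pos < src.size() &&
               std::isspace(static_cast<unsigned char>(src[pos]))) {
            ++pos;
        }
    }

    // Consumes `c` if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos < src.size() && src[pos] == c) { ++pos; return true; }
        return false;
    }

    bool atDigit() const {
        return pos < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[pos]));
    }

    // atom-expr (subset): NUM-LIT | '-' atom | '(' expr ')'
    long parseAtom() {
        if (accept('-')) return -parseAtom();
        if (accept('(')) {
            long inner = parseAddSub();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return inner;
        }
        // accept() has already skipped leading spaces at this point.
        if (!atDigit()) throw std::runtime_error("expected a number");
        long value = 0;
        while (atDigit()) value = value * 10 + (src[pos++] - '0');
        return value;
    }

    // mul-div-mod-expr: atom-expr (('*' | '/' | '%') atom-expr)*
    long parseMulDivMod() {
        long lhs = parseAtom();
        for (;;) {
            if (accept('*')) {
                lhs *= parseAtom();
            } else if (accept('/')) {
                long rhs = parseAtom();
                if (rhs == 0) throw std::runtime_error("division by zero");
                lhs /= rhs;
            } else if (accept('%')) {
                long rhs = parseAtom();
                if (rhs == 0) throw std::runtime_error("modulo by zero");
                lhs %= rhs;
            } else {
                return lhs;
            }
        }
    }

    // add-sub-expr: mul-div-mod-expr (('+' | '-') mul-div-mod-expr)*
    long parseAddSub() {
        long lhs = parseMulDivMod();
        for (;;) {
            if (accept('+')) lhs += parseMulDivMod();
            else if (accept('-')) lhs -= parseMulDivMod();
            else return lhs;
        }
    }
};

// Parses the whole string as an add-sub-expr and returns its value.
long evaluate(const std::string& text) {
    ExprParser parser(text);
    long value = parser.parseAddSub();
    parser.skipSpaces();
    if (parser.pos != parser.src.size()) {
        throw std::runtime_error("unexpected trailing input");
    }
    return value;
}
```

Because parseAddSub only ever calls parseMulDivMod for its operands, '*', '/' and '%' bind tighter than '+' and '-' without any precedence table. The remaining levels (shift, comparison, bitwise, logical) would follow the same pattern.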

Summary:

Types are:

byte, word, dword, qword, void, signed/unsigned versions of these and pointer versions of these.

A value with 'u' at the end, e.g. 18u, is unsigned. If it then has 'b', 'w' or 'q', it is a byte, word or qword value respectively; if no size is specified, it is a dword. Lastly, if it ends in 'p', it is a constant pointer (a memory address).

If a value is entered as 'x' with x being an ascii character between 0-255 then the corresponding ascii value will be read into a byte value.

For example:

15u : unsigned dword, 15ub : unsigned byte, 15ubp : unsigned byte pointer,

15b : signed byte, 15bp : signed byte ptr, 'a' : signed byte, '@' : signed byte, ....
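The suffix rules can be read as a small left-to-right scan: digits, then an optional 'u', then an optional size letter, then an optional 'p'. Here is a rough sketch of such a scan, assuming exactly that order (taken from the examples above). `NumLiteral`, `Width` and `parseNumLiteral` are hypothetical names, not the ones used in the source file, and the sketch does not check for overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

enum class Width { Byte, Word, Dword, Qword };

// Hypothetical result type describing a classified numeric literal.
struct NumLiteral {
    std::uint64_t value = 0;
    bool isUnsigned = false;
    Width width = Width::Dword;  // dword when no size suffix is given
    bool isPointer = false;
};

// Assumed suffix order: digits, 'u'?, ('b' | 'w' | 'q')?, 'p'?
// Returns std::nullopt if the text does not match that shape.
std::optional<NumLiteral> parseNumLiteral(const std::string& text) {
    std::size_t i = 0;
    NumLiteral lit;

    auto isDigitAt = [&](std::size_t k) {
        return k < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[k]));
    };

    if (!isDigitAt(0)) return std::nullopt;
    while (isDigitAt(i)) {
        lit.value = lit.value * 10 + static_cast<std::uint64_t>(text[i] - '0');
        ++i;
    }

    if (i < text.size() && text[i] == 'u') {
        lit.isUnsigned = true;
        ++i;
    }

    if (i < text.size()) {
        switch (text[i]) {
            case 'b': lit.width = Width::Byte;  ++i; break;
            case 'w': lit.width = Width::Word;  ++i; break;
            case 'q': lit.width = Width::Qword; ++i; break;
            default: break;  // no size suffix: stays a dword
        }
    }

    if (i < text.size() && text[i] == 'p') {
        lit.isPointer = true;
        ++i;
    }

    // Anything left over means the literal is malformed.
    if (i != text.size()) return std::nullopt;
    return lit;
}
```

With this sketch, "15ubp" comes back as an unsigned byte pointer with value 15, while "15" is a signed dword.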

Statements:

'if' and 'else' do what you would expect. 'let' sets or defines variables: '=' sets an existing variable and '=>' defines a new one. 'ref' writes through a pointer, basically this in C++: *(ptr) = x; '&' gets the address of its operand (returns a ptr value). 'def' defines a function.
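To make the difference between the two 'let' forms concrete, here is a tiny sketch of a variable table with separate define and assign operations. This is an illustration only. `VariableScope` is a hypothetical name, and the rules that redefining an existing name fails and that assigning to an undefined name fails are assumptions of the sketch, not documented behaviour of the compiler.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical single-scope variable table.
class VariableScope {
    std::unordered_map<std::string, long> values_;

public:
    // Models `let name => value;` (define a new variable).
    // Assumption: defining a name twice is rejected.
    bool define(const std::string& name, long value) {
        return values_.emplace(name, value).second;
    }

    // Models `let name = value;` (set an existing variable).
    // Assumption: setting an undefined name is rejected.
    bool assign(const std::string& name, long value) {
        auto it = values_.find(name);
        if (it == values_.end()) return false;
        it->second = value;
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if undefined.
    const long* find(const std::string& name) const {
        auto it = values_.find(name);
        return it == values_.end() ? nullptr : &it->second;
    }
};
```

In these terms, `let x => 5;` followed by `let x = 67;` is a define followed by an assign on the same entry.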

Syntax:

if ( expr ) { body }

if ( expr ) { body } else { body }

let identifier = expr

let identifier => expr

ref expr = expr

def identifier ( (param-name : param-type)* ) : type { body }

& expr

Function overloads

When you use 'def' to define a function, the function is not indexed in the symbol table by its identifier alone but by its complete signature, that is, a name composed of its identifier plus the list of its parameter types in order. For example, a function called 'test' that takes two bytes and a char would be 'test(byte,byte,char)' in the symbol table. Therefore you can have as many functions with the same name as you'd like, as long as they have different parameter types or combinations thereof.

Then, in order to call a specific overload you use '<' and '>' as follows:

test<byte>(x) : calls test's byte overload with x,

test<char>(x) : calls test's char overload with x,

test<char, byte, dword>(x) : calls test's (char, byte, dword) overload with x

Syntax:

function-name < param-type-list >(args);
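The signature-as-key scheme described in this section could look roughly like the sketch below. It is not the repository's actual symbol table. `makeSignature`, `FunctionInfo` and `SymbolTable` are hypothetical names, and only the key format 'test(byte,byte,char)' comes from the text above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Builds a key such as "test(byte,byte,char)" from a name and its
// parameter types in order.
std::string makeSignature(const std::string& name,
                          const std::vector<std::string>& paramTypes) {
    std::string key = name + "(";
    for (std::size_t i = 0; i < paramTypes.size(); ++i) {
        if (i > 0) key += ",";
        key += paramTypes[i];
    }
    key += ")";
    return key;
}

// Hypothetical per-function record; a real table would hold more.
struct FunctionInfo {
    std::string returnType;
};

// Hypothetical symbol table keyed by full signature, so overloads with
// the same name but different parameter types do not collide.
class SymbolTable {
    std::map<std::string, FunctionInfo> functions_;

public:
    // Returns false if this exact signature is already defined.
    bool define(const std::string& name,
                const std::vector<std::string>& paramTypes,
                const std::string& returnType) {
        return functions_
            .emplace(makeSignature(name, paramTypes), FunctionInfo{returnType})
            .second;
    }

    // Looks up the overload selected by the explicit type list, as in
    // function-name < param-type-list >(args). Returns nullptr if absent.
    const FunctionInfo* lookup(const std::string& name,
                               const std::vector<std::string>& paramTypes) const {
        auto it = functions_.find(makeSignature(name, paramTypes));
        return it == functions_.end() ? nullptr : &it->second;
    }
};
```

A call written with an explicit type list then resolves with a single map lookup on the rebuilt key, with no overload ranking involved.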

Finally

Always end a statement with a ';'. It doesn't matter if two or more statements are on the same line, as long as each one ends in ';' (kinda free-form).

Examples:

This works:

let x => 5; let y => x * 2; let z => x + y * (x * 4 + (2 + x)); let x = 67; def test(x : dword) : dword { let a => x * x; ret a + x; } test(x + z);

if (z >= 2) { let z = 4; }; else { let z = 6; };

def test(x : dword) : dword { if (x >= 2) { ret 1; }; else { ret -1; }; }

test(55);

Kinda works but not entirely (function overloading):

def test(x : byte) : byte { let a => x * x; ret a + x; } call test(x + z); call test(x + z);

Doesn't compile properly (type-checking doesn't work but I'm too lazy to fix it):

def pt => &x; ref pt => 6;

Basically, most features work except for pointers (these don't compile/transpile properly to NASM assembly) and function overloading. This is a little personal project that I don't intend to finish, but it's interesting nonetheless, so I'm posting the full source code.

Note

I know that the source code is organized terribly; that is because this was a one-day project. It's anything but a serious project. It's meant to be organized kinda like a C program, but using C++ for strings and other features found there. There's tons of stuff you would never do in production code because it leads to very hard-to-debug code (like this!)
