inofficialamanjha/Lexical-And-Syntax-Analyser


Compiler-Design

A compiler broadly has six phases : Lexical Analyser, Syntax Analyser, Semantic Analyser, Intermediate Code Generator, Code Optimizer, and Code Generator.

Compiler Flow Chart

Lexical Analyser

Lexical Analyser : responsible for generating a stream of tokens <name,value>, one issued for each lexeme ( a chunk of text ). It is also responsible for constructing the Symbol Table.

Tokens ( recognised by NFAs ) : <name,value>

e.g. <RELOP,EQ>, <num,80>
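As a minimal sketch of the <name,value> idea, the toy function below maps pre-split lexemes to tokens; the category names and the relational-operator table are illustrative assumptions, not the repository's actual ones.

```python
# Hypothetical mapping from lexemes to <name, value> tokens.
def tokenize(lexemes):
    """Map each lexeme (chunk of text) to a <name, value> pair."""
    relops = {"==": "EQ", "<": "LT", ">": "GT",
              "<=": "LE", ">=": "GE", "!=": "NE"}
    tokens = []
    for lex in lexemes:
        if lex in relops:
            tokens.append(("RELOP", relops[lex]))   # e.g. <RELOP, EQ>
        elif lex.isdigit():
            tokens.append(("num", int(lex)))        # e.g. <num, 80>
        else:
            tokens.append(("id", lex))              # everything else: a variable
    return tokens

print(tokenize(["x", "==", "80"]))
```

A real lexer would also split the raw source text into lexemes itself; here that step is assumed done.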

For our purposes, we have considered 8 lexical categories. The actual number of lexical categories may vary and can be quite large.

Lexical Categories

Lexical categories in list order, with the types of tokens issued by each :

Order and Type of Tokens

Here I have implemented the NFAs using switch-case statements. However, the NFA for keywords is implemented using a Trie data structure.
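The trie idea can be sketched as follows; the keyword list is a placeholder assumption, and nested dicts stand in for the trie nodes.

```python
# Sketch of keyword recognition via a trie; the keyword set is illustrative.
KEYWORDS = ["if", "int", "else", "while"]

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})   # descend, creating nodes as needed
        node["$"] = True                     # end-of-word marker
    return root

def is_keyword(trie, lexeme):
    node = trie
    for ch in lexeme:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node                       # must end exactly at a marked node

trie = build_trie(KEYWORDS)
print(is_keyword(trie, "int"))    # a keyword
print(is_keyword(trie, "in"))     # only a prefix of "int", not a keyword
```

The end-of-word marker is what distinguishes a full keyword from a mere prefix such as "in".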

I have also implemented a Symbol Table using a class and a hash map. The symbol table stores the variables and issues a token number for each variable depending on its row number.
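A minimal sketch of such a symbol table, assuming row numbers start at 1 and follow insertion order (the class and method names are hypothetical):

```python
# Symbol table backed by a hash map; the token value issued for a
# variable is its row number, as described above.
class SymbolTable:
    def __init__(self):
        self.rows = {}                        # variable name -> row number

    def issue_token(self, name):
        """Return <id, row>; insert the variable first if it is new."""
        if name not in self.rows:
            self.rows[name] = len(self.rows) + 1
        return ("id", self.rows[name])

table = SymbolTable()
print(table.issue_token("x"))     # ('id', 1)
print(table.issue_token("y"))     # ('id', 2)
print(table.issue_token("x"))     # seen before: same row, ('id', 1)
```

Re-looking up a known variable returns the same token, which is what lets later compiler phases refer to one variable consistently.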

Also, for generating tokens and navigating through the source text file, we have two techniques : Serial and Parallel.

If no token is issued for a lexeme, a Lexical Error is declared !

NFA's Example

  • Relational Operators category : RELOP

RELOP NFA

  • Variable

VAR NFA

  • Keywords

KEYWORD NFA

  • Numbers

NUMBERS NFA

  • Whitespaces

WS NFA

Pseudo-code for NFAs

NFA Pseudo Code
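The switch-case style of NFA simulation can be sketched like this, here for a hypothetical NUM category (digits with an optional decimal part); the states and transitions are an assumption, not the repository's exact automaton.

```python
# Switch-case style NFA simulation for NUM; states are plain integers.
def match_num(text, pos):
    """Return the end index of the longest NUM lexeme starting at pos
    (pos itself if nothing matches)."""
    state, i, last_accept = 0, pos, pos
    while i < len(text):
        ch = text[i]
        if state == 0:                    # start: expect a digit
            if ch.isdigit():
                state = 1
            else:
                break
        elif state == 1:                  # accepting: integer part
            if ch.isdigit():
                pass
            elif ch == ".":
                state = 2
            else:
                break
        elif state == 2:                  # saw '.': need at least one digit
            if ch.isdigit():
                state = 3
            else:
                break
        elif state == 3:                  # accepting: fractional part
            if not ch.isdigit():
                break
        i += 1
        if state in (1, 3):               # remember last accepting position
            last_accept = i
    return last_accept

print(match_num("80+x", 0))   # 2 -> lexeme "80"
print(match_num("3.14;", 0))  # 4 -> lexeme "3.14"
```

Tracking `last_accept` is what lets the simulation back up to the longest accepted prefix when a dead state is reached.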

Serial Navigation

The NFA which says yes first wins ( NFAs are applied in decreasing order of priority ).

NFA Order
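The serial scheme can be sketched as a loop over recognisers in priority order; the recognisers below are simplified stand-ins for the actual NFAs, and the priority order is an assumption.

```python
# Serial navigation: first recogniser that says yes wins.
def match_keyword(s):
    return s in ("if", "else", "while", "int")

def match_num(s):
    return s.isdigit()

def match_var(s):
    # Note: str.isidentifier() also accepts keywords like "if",
    # which is exactly why KEYWORD must come earlier in the order.
    return s.isidentifier()

NFAS = [("KEYWORD", match_keyword), ("num", match_num), ("id", match_var)]

def serial_lex(lexeme):
    for name, nfa in NFAS:
        if nfa(lexeme):
            return name                   # first yes wins
    raise ValueError("Lexical Error: " + lexeme)

print(serial_lex("if"))    # KEYWORD, even though it is also a valid id
print(serial_lex("foo"))   # id
```

Placing KEYWORD before id in the order is what resolves the overlap between the two categories.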

Parallel Navigation

Combine all NFAs into one common diagram.

In case of multiple Yes :

  • Go for the longest lexeme
  • Go by order, if the lengths are also the same
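Both tie-breaking rules can be sketched with regular expressions standing in for the combined NFA; the category patterns are illustrative assumptions.

```python
# Parallel navigation (maximal munch): longest lexeme wins, category
# order breaks ties of equal length.
import re

# List position = priority on equal length (patterns are assumptions).
CATEGORIES = [("KEYWORD", r"if|else|while|int"),
              ("id", r"[A-Za-z_]\w*"),
              ("num", r"\d+")]

def longest_match(text, pos):
    best = None                        # (-length, order, name, lexeme)
    for order, (name, pat) in enumerate(CATEGORIES):
        m = re.match(pat, text[pos:])  # re.match anchors at the start
        if m:
            cand = (-len(m.group()), order, name, m.group())
            if best is None or cand < best:
                best = cand
    if best is None:
        raise ValueError("Lexical Error at position %d" % pos)
    return best[2], best[3]

# "if" matches KEYWORD and id at equal length: order picks KEYWORD.
print(longest_match("if", 0))
# "ifx" matches id with a longer lexeme: longest match wins.
print(longest_match("ifx", 0))
```

Sorting by (-length, order) encodes both rules in one comparison: longer lexemes first, then earlier categories.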

Syntax Analyser

Parses the string to check whether the rules of the grammar were followed and, if followed, creates a parse tree for the input string.

Parsing : constructing a tree for an input string using the CFG ( Context Free Grammar ) rules. There are two types of parsing : top-down parsing and bottom-up parsing.

First

To Compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε symbols can be added to any FIRST set.

First
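The "apply until nothing can be added" rule is a fixed-point iteration; below is a sketch for the expression grammar used later in this README (after left-recursion removal), with `"eps"` standing for ε and an empty list standing for an ε-production.

```python
# Fixed-point computation of FIRST sets.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is the epsilon production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                     # iterate until nothing can be added
        changed = False
        for A, prods in grammar.items():
            for prod in prods:
                nullable = True
                for X in prod:
                    fx = first[X] if X in grammar else {X}   # terminal: {X}
                    add = fx - {"eps"}
                    if not add <= first[A]:
                        first[A] |= add
                        changed = True
                    if X not in grammar or "eps" not in fx:
                        nullable = False   # cannot see past X
                        break
                if nullable and "eps" not in first[A]:
                    first[A].add("eps")    # every symbol was nullable
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
print(FIRST["E"])    # {'(', 'id'}
print(FIRST["E'"])   # {'+', 'eps'}
```

The inner loop walks each production left to right, adding FIRST of each symbol and stopping at the first non-nullable one.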

Follow

To Compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set.

Follow
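FOLLOW is computed by the same kind of fixed-point iteration over the productions; the sketch below takes the FIRST sets of the same expression grammar as given.

```python
# Fixed-point computation of FOLLOW sets; FIRST sets are taken as given.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is the epsilon production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"},
         "T": {"(", "id"}, "T'": {"*", "eps"}, "F": {"(", "id"}}

def first_of_seq(seq, first, grammar):
    """FIRST of a string of grammar symbols."""
    out = set()
    for X in seq:
        fx = first[X] if X in grammar else {X}
        out |= fx - {"eps"}
        if "eps" not in fx:
            return out
    out.add("eps")                     # every symbol was nullable
    return out

def follow_sets(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")             # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for prod in prods:
                for i, B in enumerate(prod):
                    if B not in grammar:
                        continue       # FOLLOW is only for nonterminals
                    beta = first_of_seq(prod[i + 1:], first, grammar)
                    add = beta - {"eps"}
                    if "eps" in beta:              # B can end the production
                        add |= follow[A]
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

FOLLOW = follow_sets(GRAMMAR, FIRST, "E")
print(FOLLOW["E"])   # {')', '$'}
print(FOLLOW["F"])   # {'*', '+', ')', '$'}
```

For each occurrence of a nonterminal B in a production A → αBβ, everything in FIRST(β) except ε goes into FOLLOW(B), and FOLLOW(A) is added when β can derive ε.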

LL(1) Parser or Predictive Parser.

LL(1) is a top-down parser ; the name is short for Left-to-right scanning, Leftmost derivation, with one lookahead symbol.

It starts from the root, the start variable S, and constructs the tree. Before applying it, make sure that there is no Left Recursion in the grammar.

It constructs a parsing table, and then performs a sequence of moves on the input string. E.g.

  • Grammar

Grammar

  • Grammar After Removing Left-Recursion

Left recursion is such that A --> Aα.

Replace the left-recursive productions by non-left-recursive productions.

Left Recursion Removal Rule

For our Grammar :

Grammar without LR
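The removal rule A → Aα | β ⇒ A → βA', A' → αA' | ε can be sketched directly; the representation (symbol lists, `[]` for ε, a primed name for the new nonterminal) is a notational assumption.

```python
# Immediate left-recursion removal: A -> A a | b  becomes
# A -> b A'  and  A' -> a A' | eps.
def remove_left_recursion(nonterm, prods):
    """prods: list of symbol lists for nonterm; returns a new grammar dict."""
    recursive = [p[1:] for p in prods if p and p[0] == nonterm]   # the alphas
    others = [p for p in prods if not p or p[0] != nonterm]       # the betas
    if not recursive:
        return {nonterm: prods}        # nothing to do
    new = nonterm + "'"
    return {
        nonterm: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],       # [] = eps
    }

# E -> E + T | T   becomes   E -> T E',   E' -> + T E' | eps
g = remove_left_recursion("E", [["E", "+", "T"], ["T"]])
print(g)
```

Applying the same step to T → T * F | F yields T → F T', T' → * F T' | ε, giving the grammar shown above.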

  • Constructing Parsing Table from the Given Grammar

Then, we construct a Predictive Parsing Table :

Parsing Table

  • Sequence of Moves for input id + id * id

Moves made by the predictive parser on our input :

Parse

Outcome : ( $, $ ) : Yes or Accept ( no Syntax Error ).
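The whole sequence of moves can be sketched as a stack machine driven by the parsing table; the table below is the standard LL(1) table for this expression grammar, with `[]` denoting an ε entry.

```python
# Table-driven predictive parse of: id + id * id
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    tokens = tokens + ["$"]               # append end marker
    stack = ["$", "E"]                    # start symbol on top
    i = 0
    while stack:
        top = stack.pop()
        a = tokens[i]                     # one-symbol lookahead
        if top in NONTERMS:
            prod = TABLE.get((top, a))
            if prod is None:
                return False              # blank table entry: Syntax Error
            stack.extend(reversed(prod))  # push production right-to-left
        elif top == a:
            i += 1                        # match terminal (or $)
        else:
            return False                  # terminal mismatch: Syntax Error
    return i == len(tokens)               # ( $, $ ) reached: Accept

print(ll1_parse(["id", "+", "id", "*", "id"]))   # True
```

The parse accepts exactly when the stack and the input are exhausted together, i.e. the ( $, $ ) configuration above.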

References

Compilers : Principles, Techniques & Tools - Second Edition : Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman