Parser

Parses CFG of the language.

Only one way to derive a thing!

LL(1) Parsing

First and Follow Sets

To construct parser for LL(1) grammar, first workout First and Follow sets.

Generate First set
- $First (a) = {a}$
- For $X \to Y_{1} Y_{2} \dots Y_{k}$
  - Add terminals in $First (Y_{1})$ to $First (X)$ .
  - If $Y_{1} \dots Y_{n - 1} \Rightarrow ϵ$ , add terminals in $First (Y_{n})$ to $First (X)$ .
  - If $Y_{1} \dots Y_{k} \Rightarrow ϵ$ , add $ϵ$ to $First (X)$ .
Generate Follow set
- $\text{Follow}(S) = \$$ if$ S$ is the start symbol.
- For $A \to α Bβ$ , add $First (β)$ to $Follow (B)$ , excluding $ϵ$ .
- For $A \to α B$ , or if $ϵ \in First (β)$ , add non-terminals in $Follow (A)$ to $Follow (B)$ .

Warning

In deriving First set, be careful with non-terminals that could potentially produce $ϵ$ !

In deriving Follow set, add Follow set of the LHS to the RHS, not vice versa. CFG goes from left to right only. (Think of the things in the right have no “control” on where it’s placed.)

In deriving First set, go bottom-up; in deriving Follow set, go

top-down.

CFG → LL(1) → Recursive Descent or Table Driven

Recursive Descent Parsing

One simple function for each non-terminal.
Function body follows the RHS of the rule:
- Non-terminals result in a call to another parse function
- Terminals result in considering the next token.
- Expect what’s in the follow set.

int parse_P() { return parse_E() && expect_token(TOKEN_EOF); }
int parse_E() { return parse_T() && parse_E_prime(); }

Table-Driven Parsing

This approach is top-down. Using First set to determine which rule will be applied.

For each rule $A \to α$ (denoted by the number):
- $\forall α \in First (α)$ , add this rule to $T [A, a]$ .
- If $ϵ \in First (α)$ , $\forall b \in Follow (A)$ , including $\$$, add this rule to$ T[A, b]$.

Stack	Input	Action
`P$`	`int * int $`	apply 1: `P => E`
…	…	…
`int T' E' $`	`int * int $`	match `int`
`T' E' $`	`* int $`

Parsing process
- Push top symbol and EOF into stack.
- Each time, look at the left-most token in the Input column.
- Based on the table, decide which rule to apply. Pop and push symbols into the stack accordingly.
- Some times token is matched without rule applied.

LR Parsing

Shift-Reduce Parsing

Stack	input	Action
	`id ( id + id ) $`	shift
`id`	`( id + id ) $`	shift
`id (`	`id + id ) $`	shift
`id ( id`	`+ id ) $`	reduce `T -> id`
`T ( id`	`+ id ) $`	reduce `E -> T`

Shift = consume a token and push it onto the stack
Reduce = apply a rule to the top of the stack, pop tokens from and push non-terminals onto the stack.

In this process, conflicts may occur!

LR(0) Automaton

Represents all the possible rules that are currently under consideration by a shift-reduce parser.
Also known as canonical collection or compact finite state machine of the grammar.
Each state:
- Kernel, for state 0: P -> . E.
- Closure, add all rules on the right side of the dot: E -> . E + T, E -> . T.
- Takes non-terminals or terminals, and transition to other state. Move the dot accordingly.
Conflicts
- Shift-reduce: T -> id . ( E ) versus T -> id .
- Reduce-reduce: S -> id ( E ) . versus E -> id ( E ) .
LR(0) automaton tells us which rules are available, but we don’t know which rules to apply exactly!

SLR Parsing

TL;DR SLR = LR(0) automaton + Follow Set

S stands for Simple. SLR parsing utilizes the Follow set to resolve conflicts.
In short, take reduction to A only when the next token is in Follow(A).
Grammar can be parsed by SLR ⇒ SLR Grammar
SLR Parse Table has Goto and Action sections.

State	`E`	`T`	`+`	`int`
0
1	GOTO	GOTO	ACTION	…
…

To construct the table, if the DOT is followed by:
- Terminal, simply shift to another state according to LR(0) automaton.
- Non-Terminal, goto another state according to LR(0) automaton.
- Nothing, reduce according to the rule.

Stack	Symbols	Input	Action
0		`id ( id + id ) $`	shift 4
0 4	`id`	`( id + id ) $`	shift 5
0 4 5	`id (`	`id + id ) $`	shift 4
0 4 5 4	`id ( id`	`+ id ) $`	reduce `T -> id`
0 4 5 8	`id ( T`	`+ id ) $`

Parsing process
- Shift tokens
  - Push states into stack 1.
  - Push symbols into stack 2.
- If reduce
  - Pop the state at top.
  - Since a non-terminal is created, now use the non-terminal and the state at the top of the stack to perform goto
  - Push the new state to the stack.

Distinction between LL(1) and LR(1) Parsing Each LL(1) parsing state

considers only a single non-terminal, while each LR(1) parsing state considers a large number of possible configurations.

However, SLR is only a proper subset of LR(1)!

LR(1) Parsing

LR(1) automaton augments LR(0) automaton by annotating the lookahead.
Lookahead = set of tokens that can follow the non-terminal, in the current parsing state. — I'm looking for this after this non-terminal.
- For $A \to α \cdot B$ , add new rules like $B \to \cdot γ$ with the same lookahead.
- For $A \to \cdot Bβ$
  - If $β$ cannot be $ϵ$ , the lookahead is $First (β)$ .
  - If $β$ can be $ϵ$ , the lookahead is $First (β)$ plus the lookahead of $A$ .
- When constructing the automaton, we bring the lookahead from one state to the next.

LALR Parsing

LA = Lookahead.
The aforementioned automaton could be too big. In LALR parsing, we can merge states with the same core, by merging their lookahead.

My Vault

Explorer

parser

Parser

LL(1) Parsing

First and Follow Sets

Recursive Descent Parsing

Table-Driven Parsing

LR Parsing

Shift-Reduce Parsing

LR(0) Automaton

SLR Parsing

LR(1) Parsing

LALR Parsing

Graph View

Table of Contents

Backlinks