Scanner and Encoder
Definition: Scanning is the process of identifying tokens in the
raw source text of a program.
- Go through the source code and put each token into its category (its “bracket”).
- Sample tokens:
  - Keywords
  - Identifiers
  - Numbers
  - Strings
  - Comments and whitespace
- Remember to create an enum type covering all token kinds.
- Narrow down the set of possible tokens (keyword, identifier, ...) as each character is read.
- Backtracking: when the scanner has read one character too far, an unputc function pushes it back onto the input.
- We have to be rigorous in defining tokens.
- Example: JSON
- To define tokens rigorously, we need regular expressions.
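A minimal Python sketch of the scanning process above, assuming a toy language: the keyword set, the OPERATOR kind, the Scanner class and its helpers are hypothetical. unputc mirrors the function named in the notes but is only an illustration here. Strings have no escape sequences, and the only comment form is /* ... */.

```python
from enum import Enum, auto

# One enum member per token kind; whitespace and comments are skipped, not emitted.
class TokenType(Enum):
    KEYWORD = auto()
    IDENTIFIER = auto()
    NUMBER = auto()
    STRING = auto()
    OPERATOR = auto()
    EOF = auto()

KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set

class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def getc(self):
        # Return the next character, or "" once the input is exhausted.
        ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1
        return ch

    def unputc(self):
        # Backtrack: give the last character read back to the input.
        self.pos -= 1

    def read_while(self, first, pred):
        chars = [first]
        while True:
            ch = self.getc()
            if not pred(ch):
                self.unputc()  # read one character too far, so push it back
                return "".join(chars)
            chars.append(ch)

    def read_string(self):
        chars = []
        while True:
            ch = self.getc()
            if ch == "":
                raise SyntaxError("unterminated string")
            if ch == '"':
                return "".join(chars)
            chars.append(ch)

    def skip_block_comment(self):
        prev = ""
        while True:
            ch = self.getc()
            if ch == "":
                raise SyntaxError("unterminated comment")
            if prev == "*" and ch == "/":
                return
            prev = ch

    def tokens(self):
        while True:
            ch = self.getc()
            if ch == "":
                yield (TokenType.EOF, "")
                return
            if ch.isspace():
                continue
            if ch == "/":
                if self.getc() == "*":
                    self.skip_block_comment()
                    continue
                self.unputc()  # not a comment: return the lookahead character
                yield (TokenType.OPERATOR, "/")
            elif ch.isalpha() or ch == "_":
                word = self.read_while(ch, lambda c: c.isalnum() or c == "_")
                kind = TokenType.KEYWORD if word in KEYWORDS else TokenType.IDENTIFIER
                yield (kind, word)
            elif ch.isdigit():
                yield (TokenType.NUMBER, self.read_while(ch, str.isdigit))
            elif ch == '"':
                yield (TokenType.STRING, self.read_string())
            else:
                yield (TokenType.OPERATOR, ch)
```

For example, `list(Scanner('if x1 /* note */ = 42').tokens())` yields a keyword, an identifier, an operator, a number and EOF. Reading a whole identifier-shaped word before checking the keyword set is one simple way to decide between keyword and identifier.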
Regular Expression
- A finite RE can describe an infinite language whose strings can still be enumerated (e.g. a* describes ε, a, aa, aaa, ...).
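  Example, a C-style /* ... */ comment (+ is concatenation, | is alternation, * is repetition):
  SLASH + STAR + (NOT STAR | STAR + STAR* + NOT (STAR | SLASH))* + STAR + STAR* + SLASH
  (A single STAR + NOT STAR alternative is not enough: it accepts */ inside the comment and rejects comments ending in **/.)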
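The comment RE can be checked with Python's standard `re` module. This is a sketch: `COMMENT` is the corrected expression, while `NAIVE` (a hypothetical name) encodes the simpler STAR + NOT STAR form for comparison.

```python
import re

# SLASH STAR (NOT STAR | STAR STAR* NOT (STAR | SLASH))* STAR STAR* SLASH
COMMENT = re.compile(r"/\*([^*]|\*+[^*/])*\*+/")

# Simpler form: SLASH STAR (NOT STAR | STAR NOT STAR)* STAR SLASH
NAIVE = re.compile(r"/\*([^*]|\*[^*])*\*/")

def is_comment(s):
    # The whole string must be exactly one comment.
    return COMMENT.fullmatch(s) is not None
```

`is_comment("/***/")` is true, while `NAIVE.fullmatch("/***/")` fails. `NAIVE` also accepts `"/* a */ b */"` as a single comment, because its `\*[^*]` branch can consume the inner `*/`.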
Finite Automaton
- DFA (deterministic finite automaton): exactly one transition for each state and input symbol.
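- NFA (nondeterministic finite automaton)
  - There can be multiple choices at each step, including ε-moves that consume no input.
  - Assign priorities to the different accepting states (tokens) so that a lexeme matching several token patterns resolves to one token (e.g. a keyword over an identifier).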
- RE to NFA: Thompson’s Construction
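A minimal Python sketch of Thompson's construction with an ε-closure simulation, assuming edges are labeled by character predicates so that NOT STAR fits on a single edge. The names `State`, `Frag`, `sym`, `lit`, `not_in`, `concat`, `union`, `star` and `matches` are hypothetical and are not taken from the notes. As a check, it builds the comment RE from the Regular Expression section.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False keeps identity hashing, so states can go in sets
class State:
    # Each edge is (label, target): label None is an ε-move,
    # otherwise a predicate on one input character.
    edges: list = field(default_factory=list)

@dataclass
class Frag:  # NFA fragment with one start and one accept state
    start: State
    accept: State

def sym(pred):
    s, a = State(), State()
    s.edges.append((pred, a))
    return Frag(s, a)

def lit(c):
    return sym(lambda ch: ch == c)

def not_in(chars):
    return sym(lambda ch: ch not in chars)

# Combinators mutate their inputs, so build a fresh fragment for each use.
def concat(x, y):
    x.accept.edges.append((None, y.start))
    return Frag(x.start, y.accept)

def union(x, y):
    s, a = State(), State()
    s.edges += [(None, x.start), (None, y.start)]
    x.accept.edges.append((None, a))
    y.accept.edges.append((None, a))
    return Frag(s, a)

def star(x):
    s, a = State(), State()
    s.edges += [(None, x.start), (None, a)]  # enter the loop or skip it
    x.accept.edges += [(None, x.start), (None, a)]  # repeat or leave
    return Frag(s, a)

def closure(states):
    # All states reachable through ε-moves alone.
    stack, seen = list(states), set(states)
    while stack:
        for label, nxt in stack.pop().edges:
            if label is None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def matches(frag, text):
    # Track every state the NFA could be in (all choices at once).
    current = closure({frag.start})
    for ch in text:
        current = closure({nxt for st in current for label, nxt in st.edges
                           if label is not None and label(ch)})
    return frag.accept in current

def stars():  # STAR STAR*
    return concat(lit("*"), star(lit("*")))

comment = concat(lit("/"), concat(lit("*"), concat(
    star(union(not_in("*"), concat(stars(), not_in("*/")))),
    concat(stars(), lit("/")))))
```

Here `matches(comment, "/* a **/")` accepts and `matches(comment, "/* a */ b */")` rejects. Tracking the set of current states is what makes the NFA's multiple choices manageable; subset construction turns those sets into single DFA states.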