Scanner and Encoder

Definition: Scanning is the process of identifying tokens from the raw text source code of a program.

  • Go through the source code and sort each token into its “bracket” (category).
  • Sample tokens:
    • Keywords
    • Identifiers
    • Numbers
    • Strings
    • Comments and whitespace
  • Remember to create an enum type for all tokens.
  • Narrow down the scope of possible identifiers as we go.
  • Backtracking: the unputc function pushes a character back onto the input when the scanner has read one too far.
  • We have to be rigorous in defining tokens.
  • Example: JSON
  • To define tokens rigorously, we need regular expressions.
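The points above can be sketched as a small hand-written scanner. This is only an illustration under assumptions that are not in the notes: the keyword set, the `#` line-comment syntax, and the names `TokenType`, `Scanner`, and `getc` are hypothetical, and `unputc` is modeled as simply moving the read position back by one character.

```python
from enum import Enum, auto


class TokenType(Enum):
    """One enum member per token category."""
    KEYWORD = auto()
    IDENTIFIER = auto()
    NUMBER = auto()
    STRING = auto()


KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def getc(self):
        """Return the next character, or '' at end of input."""
        if self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            return ch
        return ""

    def unputc(self, ch):
        """Push one character back; a no-op at end of input."""
        if ch:
            self.pos -= 1

    def _take_while(self, first, pred):
        """Extend a token while pred holds, then back up by one character."""
        chars = [first]
        while True:
            ch = self.getc()
            if ch and pred(ch):
                chars.append(ch)
            else:
                self.unputc(ch)  # read one character too far: back up
                return "".join(chars)

    def tokens(self):
        while True:
            ch = self.getc()
            if ch == "":
                return
            if ch.isspace():
                continue  # whitespace separates tokens but is not emitted
            if ch == "#":
                # Line comment (assumed syntax): skip to end of line.
                while ch and ch != "\n":
                    ch = self.getc()
                continue
            if ch.isalpha() or ch == "_":
                word = self._take_while(ch, lambda c: c.isalnum() or c == "_")
                # Narrow down: a word is a keyword only if it is in the set.
                kind = TokenType.KEYWORD if word in KEYWORDS else TokenType.IDENTIFIER
                yield kind, word
            elif ch.isdigit():
                yield TokenType.NUMBER, self._take_while(ch, str.isdigit)
            elif ch == '"':
                chars = []
                ch = self.getc()
                while ch and ch != '"':
                    chars.append(ch)
                    ch = self.getc()
                if ch == "":
                    raise SyntaxError("unterminated string")
                yield TokenType.STRING, "".join(chars)
            else:
                raise SyntaxError(f"unexpected character {ch!r}")
```

For example, `list(Scanner('while x1 42').tokens())` yields a keyword, an identifier, and a number, each paired with its text. The pushback in `_take_while` is what lets the scanner notice that a token has ended without losing the character that ended it.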

Regular Expression

  • A finite RE can describe an infinite (but enumerable) language.
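  • Example (C-style block comment): SLASH + STAR + (NOT STAR | STAR + STAR* + NOT (STAR | SLASH))* + STAR + STAR* + SLASH
    • Allowing a run of STARs before the closing SLASH lets comments such as /***/ match, and excluding SLASH after a STAR stops the match at the first */.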
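As a quick check of a pattern of this shape, here is a sketch in Python's `re` syntax. It assumes the expression is meant to describe C-style block comments and uses a variant that allows a run of stars before the closing slash. The helper name `match_comment` is hypothetical.

```python
import re

# SLASH STAR, then a body that never consumes "*/",
# then one or more STARs followed by SLASH.
BLOCK_COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")


def match_comment(text):
    """Return the block comment at the start of text, or None."""
    m = BLOCK_COMMENT.match(text)
    return m.group(0) if m else None
```

With this pattern, `match_comment("/* a */ b */")` stops at the first `*/`, and an unterminated comment such as `"/* oops"` does not match at all.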

Finite Automaton

  • DFA: exactly one transition for each state and input symbol.
  • NFA
    • There may be multiple possible transitions at each step.
    • Assign priorities to the accepting states (tokens) to decide which token wins when more than one matches.
  • RE to NFA: Thompson’s Construction
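A minimal sketch of Thompson's construction, written as combinators rather than a regex parser. All names here (`State`, `lit`, `cat`, `alt`, `star`, `closure`, `accepts`) are hypothetical, and concatenation uses an epsilon edge between fragments, a common variant of merging the two states. Each fragment is a `(start, accept)` pair and should be used only once, since fragments share their states.

```python
class State:
    """NFA state; edges are (symbol, target) pairs, symbol None means epsilon."""
    def __init__(self):
        self.edges = []


def lit(ch):
    """Fragment accepting the single character ch."""
    s, a = State(), State()
    s.edges.append((ch, a))
    return s, a


def cat(x, y):
    """Fragment for x followed by y."""
    (xs, xa), (ys, ya) = x, y
    xa.edges.append((None, ys))
    return xs, ya


def alt(x, y):
    """Fragment for x | y."""
    (xs, xa), (ys, ya) = x, y
    s, a = State(), State()
    s.edges += [(None, xs), (None, ys)]
    xa.edges.append((None, a))
    ya.edges.append((None, a))
    return s, a


def star(x):
    """Fragment for zero or more repetitions of x."""
    xs, xa = x
    s, a = State(), State()
    s.edges += [(None, xs), (None, a)]
    xa.edges += [(None, xs), (None, a)]
    return s, a


def closure(states):
    """Epsilon closure: every state reachable through epsilon edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        for sym, target in stack.pop().edges:
            if sym is None and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(nfa, text):
    """Simulate the NFA by tracking the set of all states it could be in."""
    start, accept = nfa
    current = closure({start})
    for ch in text:
        moved = {t for s in current for sym, t in s.edges if sym == ch}
        current = closure(moved)
    return accept in current
```

For instance, `cat(star(alt(lit("a"), lit("b"))), lit("c"))` builds an NFA for `(a|b)*c`. The set `current` in `accepts` is where the "multiple choices at each step" show up: the simulation follows all of them at once instead of guessing.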