Regular Expression

The Power of Regular Expression

  • Simple to construct.
  • Can express a wide range of languages.
  • Can be converted into Finite Automaton, which leads to efficient execution.

Definition

  • Regular expression denotes — a language of — a string drawn from alphabet .
  • Base case: , then is a regular expression and .
  • Alternation: .
  • Concatenation: is a string in followed by a string in .
  • Kleene closure: is string in concatenated zero or more times.

There are hundreds of notations to help us build a regular expression,

but all we need to construct them is one base case and three inductive rules!

Cases

  • C-Style comments
    • /\* [^*]* \* ( \* | [^*/] [^*]* \* )* /
    • Spaces added for readability.
    • A way to construct: work out NFA first, then use
  • Another example given by regex101
    • (\/\*.*?\*\/?)|(([ ]+)?\/\/.*?$)

Regular Expression to FA

  • Using Thompson’s construction to convert REs to NFAs.
    • Any RE has an start state and accept state, takes characters in between.
    • Connecting the start state and accept state between REs using transitions.
  • Convert NFAs to DFAs using subset construction.
---
title: Concatenation to NFA
---

stateDiagram-v2
    direction LR
    [*] --> A   : epsilon
      A --> B   : epsilon
      B --> [*] : epsilon
    state A {
        direction LR
        [*] --> [*] : A
    }
    state B {
        direction LR
        [*] --> [*] : B
    }
---
title: Alternation to NFA
---

stateDiagram-v2
    direction LR
    state fork <<fork>>
    state join <<join>>
    [*] --> fork
    fork --> A : epsilon
    fork --> B : epsilon
    A --> join : epsilon
    B --> join : epsilon
    join --> [*]
    state A {
        direction LR
        [*] --> [*] : A
    }
    state B {
        direction LR
        [*] --> [*] : B
    }
---
title: Kleene Closure to NFA
---

stateDiagram-v2
    direction LR
    state fork <<fork>>
    state join <<join>>
    [*] --> fork
    fork --> A : epsilon
    A --> join : epsilon
    join --> [*]
    fork --> join : epsilon
    join --> fork : epsilon
    state A {
        direction LR
        [*] --> [*] : A
    }