graph LR
i[Inputs]
p[Program]
o[Observer \n Agent]
a[Analysis \n Agent]
i --> p --> o --> a

Design Decisions

  • How much input to generate?
    • Inputs dictate the expense (time and space) and accuracy
  • How to run the program
    • Runtime environment
  • On-line vs. off-line analysis
  • What to instrument?
  • When to instrument?

Program Instrumentation

  • Adding code to the program to capture runtime information.
  • Python
    • sys module, sys.settrace(tracefunc)
    • tracefunc should accept frame, event, and arg
    • tracefunc should return the callback function for the next instruction.

Input Generation

  • Rely on symbolic information.
  • Symbolic state =
    • next instruction to evaluate
    • symbolic store, aka the variables.
    • path constraints, a boolean expression, True if always executed
  • Execute the program while carrying the symbolic state, fork the states if conditions encountered.
  • Implementation
    • ast module in Python, ast.parse() to parse source obtained from inspect.getsource().
import inspect
import ast
import z3
 
def traverse_if_children(children, context, cond):
    previous_len = len(paths)
    for child in children:
        traverse(child, context + [cond])
    if len(paths) == previous_len:
        paths.append(context + [cond])
 
def traverse(node, context):
    if isinstance(node, ast.If):
        cond = ast.unparse(node.test).strip()
        not_cond = f"z3.Not({cond})"
        traverse_if_children(node.body, context, cond)
        traverse_if_children(node.orelse, context, not_cond)
    else:
        for child in ast.iter_child_nodes(node):
            traverse(child, context)
 
 

Dynamic Taint Analysis