graph LR
i[Inputs]
p[Program]
o[Observer \n Agent]
a[Analysis \n Agent]
i --> p --> o --> a
Design Decisions
- How much input to generate?
- Inputs dictate the expense (time and space) and accuracy
- How to run the program
- On-line vs. off-line analysis
- What to instrument?
- When to instrument?
Program Instrumentation
- Adding code to the program to capture runtime information.
- Python
sys module, sys.settrace(tracefunc)
tracefunc should accept frame, event, and arg
tracefunc should return the callback function for the next instruction.
- Rely on symbolic information.
- Symbolic state = (stmt,σs,π)
- next instruction to evaluate
- symbolic store, aka the variables.
- path constraints, a boolean expression,
True if always executed
- Execute the program while carrying the symbolic state, fork the states if
conditions encountered.
- Implementation
ast module in Python, ast.parse() to parse source obtained
from inspect.getsource().
import inspect
import ast
import z3
def traverse_if_children(children, context, cond):
previous_len = len(paths)
for child in children:
traverse(child, context + [cond])
if len(paths) == previous_len:
paths.append(context + [cond])
def traverse(node, context):
if isinstance(node, ast.If):
cond = ast.unparse(node.test).strip()
not_cond = f"z3.Not({cond})"
traverse_if_children(node.body, context, cond)
traverse_if_children(node.orelse, context, not_cond)
else:
for child in ast.iter_child_nodes(node):
traverse(child, context)
Dynamic Taint Analysis