Support Vector Machine

  • Finding a separating hyperplane
    • Must correctly classify the data
    • Among all such hyperplanes, choose the one whose closest points are farthest from it — maximize the margin width
  • Advantages
    • Effective in high-dimensional spaces
    • Relatively insensitive to class imbalance, since the boundary depends only on the support vectors
  • Disadvantages
    • Data may not have a clear boundary
    • A separating linear hyperplane may not exist when the data are not linearly separable.
  • sklearn.svm.LinearSVC: the parameter C dictates the tradeoff between margin width and correct classification. The larger C is, the more heavily misclassification is penalized and the narrower the margin.
  • Use a kernel function to map data points into a higher-dimensional space where they become linearly separable. The kernel performs this feature transformation implicitly, without computing the mapped coordinates.
    • Polynomial
    • Radial basis function (RBF), or Gaussian kernel
    • Sigmoid kernel
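The C tradeoff and the kernel options above can be sketched with scikit-learn. The toy datasets and the specific C values below are illustrative assumptions, not from the notes:

```python
import numpy as np
from sklearn.svm import LinearSVC, SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Linearly separable labels: sign of x0 + x1 (an assumed toy target).
y_lin = (X[:, 0] + X[:, 1] > 0).astype(int)

# Larger C penalizes margin violations more heavily -> narrower margin.
for C in (0.01, 1.0, 100.0):
    clf = LinearSVC(C=C).fit(X, y_lin)
    print(f"LinearSVC C={C}: train accuracy {clf.score(X, y_lin):.2f}")

# Non-linear labels (points inside the unit circle) need a kernel.
y_ring = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)
rbf = SVC(kernel="rbf").fit(X, y_ring)              # Gaussian kernel
poly = SVC(kernel="poly", degree=2).fit(X, y_ring)  # polynomial kernel
print(f"RBF  kernel: train accuracy {rbf.score(X, y_ring):.2f}")
print(f"Poly kernel: train accuracy {poly.score(X, y_ring):.2f}")
```

A linear model cannot separate the circular `y_ring` labels, but the RBF and degree-2 polynomial kernels can, since the circle boundary is quadratic in the inputs.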

Problem Definition

For a binary classification task

  • Notations
    • Training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $y_i \in \{-1, +1\}$
    • Derive a function $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$.
    • The hyperplane is given as $\mathbf{w}^\top \mathbf{x} + b = 0$
    • where $\mathbf{w}$ is the weight vector and $b$ is the bias.
  • To use the SVM, we have
    • Boundaries are given by $\mathbf{w}^\top \mathbf{x} + b = \pm 1$
    • If $y_i = +1$, then $\mathbf{w}^\top \mathbf{x}_i + b \ge +1$
    • If $y_i = -1$, then $\mathbf{w}^\top \mathbf{x}_i + b \le -1$
    • i.e. $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1$ for all $i$
  • Margin width is given by $\frac{2}{\|\mathbf{w}\|}$
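The margin formula can be checked numerically with a hard-margin linear SVM. The four points below and the use of a very large C to approximate a hard margin are assumptions of this sketch:

```python
import numpy as np
from sklearn.svm import SVC

# Two classes separated by the vertical line x0 = 1 (assumed toy data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Every point satisfies y_i (w^T x_i + b) >= 1, up to solver tolerance.
print(y * (X @ w + b))

# Margin width 2 / ||w||; the classes are 2 units apart, so width ≈ 2.
print(2.0 / np.linalg.norm(w))
```

Here all four points lie exactly on the boundaries $\mathbf{w}^\top \mathbf{x} + b = \pm 1$, so all of them are support vectors.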