Linear Regression

  • Ordinary Least Squares (OLS) method
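    • By setting the partial derivatives of the sum of squared residuals with respect to the coefficients to zero, the coefficient estimates are obtained in closed form.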
    • For a straight-line fit, the residuals should be randomly scattered around zero.
    • The coefficient of determination (R²) measures how much of the variability of the dependent variable is accounted for by the model.
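
The OLS points above can be sketched with NumPy. This is a minimal illustration, not a reference implementation: the data are synthetic (an assumed true line y = 2 + 3x plus noise), and the variable names are chosen here for the example. It solves the normal equations that result from setting the partial derivatives to zero, then checks that the residuals average to zero and computes R².

```python
import numpy as np

# Synthetic data for illustration only (assumption): y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=200)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Setting the partial derivatives of the sum of squared residuals to zero
# gives the normal equations (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals of the fitted line; with an intercept they average to zero
residuals = y - X @ beta

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("intercept, slope:", beta)
print("mean residual:", residuals.mean())
print("R^2:", r_squared)
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit matrix inverse; for ill-conditioned problems `np.linalg.lstsq` is usually the safer choice.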
  • Assumptions
    • Normality: residuals are normally distributed; check with a Q-Q plot of the residuals
    • Homoscedasticity: residuals have constant variance; check with a plot of residuals against fitted values
    • Independence: residuals are independent of one another
    • No outliers
  • Box-Cox transformation: a power transformation of a strictly positive dependent variable that stabilizes its variance and brings its distribution closer to normal.
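
A short sketch of the Box-Cox transformation using `scipy.stats.boxcox`. The log-normal sample is an assumption made purely for illustration; for such data the estimated exponent should come out close to zero, which corresponds to a log transform.

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed, strictly positive data (illustrative assumption)
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# With lmbda omitted, boxcox returns the transformed data together with
# the lambda that maximizes the log-likelihood.
# The transform is (y**lam - 1) / lam for lam != 0, and log(y) for lam == 0.
y_bc, lam = stats.boxcox(y)

print("estimated lambda:", lam)
print("skewness before:", stats.skew(y))
print("skewness after:", stats.skew(y_bc))
```

Box-Cox requires all values to be positive; data containing zeros or negatives need a shift or a different transformation first.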
  • Multicollinearity
    • Measured by Variance Inflation Factor (VIF)
    • Can be mitigated with Ridge and Lasso regression, which penalize large coefficients and thereby reduce overfitting.
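
The multicollinearity points can be illustrated with a small NumPy sketch. The helper names `vif` and `ridge` are hypothetical, written for this example rather than taken from a library, and the data are synthetic with one predictor made nearly a copy of another. The VIF of column j is computed as 1 / (1 - R_j²), where R_j² comes from regressing that column on the others. Ridge is shown in its closed form; Lasso has no closed form and is left out here.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R_j^2), with R_j^2 from regressing
    column j on the remaining columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def ridge(X, y, alpha):
    """Closed-form ridge slopes on centered data (intercept not penalized):
    beta = (Xc^T Xc + alpha * I)^(-1) Xc^T yc."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)

# Synthetic data (assumption): x2 is almost identical to x1
rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 0.5 * x3 + rng.normal(scale=1.0, size=n)

vifs = vif(X)
beta_ols = ridge(X, y, alpha=0.0)     # alpha = 0 reduces to OLS slopes
beta_ridge = ridge(X, y, alpha=10.0)  # penalty shrinks the coefficients

print("VIF:", vifs)
print("OLS slopes:  ", beta_ols)
print("Ridge slopes:", beta_ridge)
```

In this setup the two collinear columns should show very large VIFs while the independent column stays near 1, and the ridge coefficient vector has a smaller norm than the OLS one. A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity.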