-Meas

  • Minimize sum of squared errors (against representative point in cluster)
  • Representative points
    • Centroid = average of points in cluster
    • Medoid = similar to centroid but exists in the dataset
    • Median (take median on different axis to create new object)
    • Mode (similar to above)

Algorithm

  • Algorithms
    • Global optimum: Minimize sum of squared errors (SSE)
    • -Means, Medoids, Medians, Modes
  • Steps
    • Randomly select points as initial centroids
    • Form clusters by assigning points to nearest centroid
    • Move each centroid to the mean of the cluster
    • Repeat until convergence

Validation

  • The Elbow Criterion - determine number of clusters
    • Run -means with different
    • Choose the smallest with a low SSE
    • Plotting the graph and find the elbow

Pros and Cons

  • Pros
    • Easy to compute
    • Easy to interpret
    • Efficient - , where is num of iterations
  • Cons
    • Results sensitive to distance measures
    • Sensitive to initialized centroids
    • Need to specify num of clusters
    • Results sensitive to outliers and noise
    • Not suitable to discover clusters with arbitrary shapes