IMDB Movie Rating Clustering (K-Means)

Created By: Muhaddis Farooq

A from-scratch K-Means clustering pipeline (NumPy) that groups IMDb movies by rating patterns and explores genre correlations via unsupervised learning. Preprocesses ratings, runs vectorized assignment/update steps until convergence, and evaluates cluster quality to surface meaningful segments.

The project offerz features such:

Data prep: handling sparsity, normalization/standardization, and one-hot genre encoding.
Custom K-Means core: k-means++ initialization, vectorized distances, inertia/centroid-shift stopping.
Model selection: elbow and silhouette analyses with multiple random restarts.
Cluster interpretation: top genres/titles per cluster and centroid summaries.
Visualization: PCA/t-SNE projections, heatmaps, and cluster distribution charts.
Reproducibility: seeded runs, CSV/NPY exports, and a simple CLI/notebook workflow.

Implements from-scratch K-Means in NumPy to cluster IMDb movies by normalized rating patterns and vectors. Uses k-means++ initialization, vectorized assign/update steps, and elbow/silhouette checks for a stable K. Interprets clusters via top genres/titles and simple visuals (PCA/heatmaps) to reveal genre–rating correlations

Tags: Web