A from-scratch K-Means clustering pipeline (NumPy) that groups IMDb movies by rating patterns and explores genre correlations via unsupervised learning. Preprocesses ratings, runs vectorized assignment/update steps until convergence, and evaluates cluster quality to surface meaningful segments.
The project offerz features such:
- Data prep: handling sparsity, normalization/standardization, and one-hot genre encoding.
- Custom K-Means core: k-means++ initialization, vectorized distances, inertia/centroid-shift stopping.
- Model selection: elbow and silhouette analyses with multiple random restarts.
- Cluster interpretation: top genres/titles per cluster and centroid summaries.
- Visualization: PCA/t-SNE projections, heatmaps, and cluster distribution charts.
- Reproducibility: seeded runs, CSV/NPY exports, and a simple CLI/notebook workflow.
Implements from-scratch K-Means in NumPy to cluster IMDb movies by normalized rating patterns and vectors. Uses k-means++ initialization, vectorized assign/update steps, and elbow/silhouette checks for a stable K. Interprets clusters via top genres/titles and simple visuals (PCA/heatmaps) to reveal genre–rating correlations
Tags: Web


