Empirical Evaluation of EM, K-Means, and BIRCH Algorithms for High-Dimensional Educational Datasets
DOI: https://doi.org/10.63001/tbs.2026.v21.i02.S.I(2).pp560-565
Keywords: Clustering Algorithms, Expectation-Maximization (EM), K-Means Clustering, BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), Comparative Analysis, High-Dimensional Data, Dimensionality Reduction, Curse of Dimensionality, Scalability, Educational Data Mining (EDM), Learning Analytics, Student Profiling
Abstract
The rapid digitalization of higher education has produced vast repositories of learner data, necessitating
advanced analytical frameworks to identify latent student performance patterns. This research presents
a comparative investigation into four distinct clustering paradigms—partition-based (K-Means), density-
based (DBSCAN), hierarchical (BIRCH), and probabilistic (Expectation-Maximization)—applied to a
multi-dimensional dataset of academic and engagement metrics. Utilizing internal validity indices
including the Silhouette Coefficient and Normalized Information Gain, we demonstrate that the
Expectation-Maximization (EM) algorithm yields the most refined student segments, achieving a Group
Purity (GP) of 0.71 [1]. This paper delineates the theoretical trade-offs between "hard" and "soft" clustering
assignments and provides a strategic roadmap for institutions to deploy data-driven at-risk intervention
systems.
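
The distinction between "hard" and "soft" assignments can be illustrated with a minimal sketch. It assumes a one-dimensional score and two Gaussian components whose means, standard deviations, and weights are hypothetical values chosen for illustration; none of them come from the study's dataset, and the helper names are not the authors' implementation. A K-Means-style rule returns a single label per student, while the EM E-step returns a probability for each segment.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hard_assign(x, means):
    """K-Means-style assignment: index of the nearest centroid."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


def soft_assign(x, means, stds, weights):
    """EM E-step: posterior probability of each component given x."""
    joint = [w * gaussian_pdf(x, m, s) for m, s, w in zip(means, stds, weights)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical "lower" and "higher" performance segments on a 0-100 scale.
means = [50.0, 80.0]
stds = [10.0, 10.0]
weights = [0.5, 0.5]

for score in (45.0, 64.0, 85.0):
    label = hard_assign(score, means)
    resp = soft_assign(score, means, stds, weights)
    print(score, label, [round(r, 3) for r in resp])
```

For a score near the boundary between the two segments, the hard rule still commits to one label, whereas the soft responsibilities stay close to an even split. That retained uncertainty is the kind of information an at-risk intervention system could use to flag borderline students rather than silently placing them in one group.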



















