Optimization of the K-Means Algorithm Using PCA Dimensionality Reduction for E-Commerce Customer Segmentation


Mahara Bengi
Syarifah Atika
Chici Rizka Gunawan
Chica Rizka Gunawan

Abstract

The rapid growth of the e-commerce industry in recent years has produced increasingly large and complex volumes of customer data. Analyzed properly, these data can reveal customer behavior patterns and support data-driven decision-making. This study segments customers with an unsupervised learning approach that combines Principal Component Analysis (PCA) and the K-Means algorithm. The dataset used is of good quality and contains no missing values, which makes it suitable for further analysis. Initial exploratory analysis indicates that Total Spending, Number of Items Purchased, and Average Rating are the most important variables for representing customer characteristics. PCA reduced the dimensionality of the data while retaining 79.41% of the total variance, producing a more compact representation without losing essential information. K-Means clustering then grouped customers into three clearly distinguishable clusters: the first contains customers with high activity levels, the second customers with moderate activity, and the third customers with low engagement. Both the Elbow Method and the Silhouette Score confirmed k = 3 as the optimal number of clusters. Cluster visualizations show strong separation between groups and consistent relationships among variables. These results show that combining PCA with K-Means produces informative and interpretable customer segmentation, providing a foundation for subsequent analyses and for data-driven customer management in e-commerce.
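The abstract describes a pipeline of standardization, PCA, and K-Means, validated with the Elbow Method and the Silhouette Score, but does not show an implementation. The sketch below is one minimal way to wire those steps together in Python using NumPy only (no scikit-learn), so each step stays visible. Everything in it is an assumption for illustration. The data are synthetic, generated from hypothetical group means for three columns named after the variables highlighted above (total_spending, items_purchased, average_rating). The choice of two principal components, the candidate range k = 2 to 6, and the K-Means settings are also illustrative. The output does not reproduce the study's 79.41% figure or its clusters.

```python
import numpy as np


def make_synthetic_customers(n_per_group=100, seed=42):
    """HYPOTHETICAL data, not the study's dataset: three customer groups
    (high, moderate, low activity). Columns: total_spending,
    items_purchased, average_rating."""
    rng = np.random.default_rng(seed)
    means = np.array([[1500.0, 20.0, 4.6],   # high activity (assumed values)
                      [800.0, 12.0, 4.0],    # moderate activity
                      [300.0, 5.0, 3.3]])    # low engagement
    sds = np.array([60.0, 1.2, 0.12])
    X = np.vstack([rng.normal(m, sds, size=(n_per_group, 3)) for m in means])
    X[:, 2] = np.clip(X[:, 2], 1.0, 5.0)  # keep ratings on a 1-5 scale
    return X


def standardize(X):
    """Z-score each column so large-scale features (spending) do not dominate PCA."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (X - X.mean(axis=0)) / sd


def pca(X, n_components=2):
    """PCA via SVD of the centred data. Returns the component scores and the
    explained-variance ratio of each kept component."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    return Xc @ Vt[:n_components].T, ratio[:n_components]


def sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)


def kmeans(X, k, n_init=10, max_iter=300, seed=0):
    """Lloyd's K-Means with k-means++ seeding; returns (labels, inertia)
    of the restart with the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        C = X[[rng.integers(len(X))]]  # first centre: a random point
        while len(C) < k:  # next centres: sampled proportional to D(x)^2
            d2 = sq_dists(X, C).min(axis=1)
            C = np.vstack([C, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(max_iter):
            labels = sq_dists(X, C).argmin(axis=1)
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else C[j] for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        D = sq_dists(X, C)
        if D.min(axis=1).sum() < best_inertia:
            best_labels, best_inertia = D.argmin(axis=1), D.min(axis=1).sum()
    return best_labels, best_inertia


def silhouette(X, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    D = np.sqrt(sq_dists(X, X))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton cluster: s = 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # D[i, i] = 0, so divide by n - 1
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


X = standardize(make_synthetic_customers())
Z, ratio = pca(X, n_components=2)
print(f"variance retained by 2 components: {ratio.sum():.2%}")

inertias, scores = {}, {}
for k in range(2, 7):
    labels, inertias[k] = kmeans(Z, k)   # inertia curve for the Elbow Method
    scores[k] = silhouette(Z, labels)
    print(f"k={k}: inertia={inertias[k]:.1f}  silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)  # silhouette criterion
final_labels, _ = kmeans(Z, best_k)
print("chosen k:", best_k, "| cluster sizes:", np.bincount(final_labels))
```

For the Elbow Method, read the printed inertia values and look for the k after which the decrease flattens. For the Silhouette Score, the sketch simply picks the k with the highest mean coefficient. Because the synthetic groups were generated to be well separated, both criteria are expected to point to k = 3 here. Real customer data is usually noisier, and the two criteria can disagree.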
