Analysis of Factors Influencing Review Helpfulness on Online Fashion Platforms - A Machine Learning Approach Using NMF, XGBoost, and SHAP -

doi:10.34227/tjocc.2026..36.315

Kim, Euihwan¹ · Shin, Minho² · Park, Kyoungseo³ · Moon, Jieun⁴ · Hwang, Yongsuk⁵

¹First author, Master’s, Department of Culture Contents and Communication, Konkuk University
²Co-author, Master’s, Department of Culture Contents and Communication, Konkuk University
³Co-author, Master’s, Department of Media and Communication, Konkuk University
⁴Co-author, Master’s Candidate, Department of Media and Communication, Konkuk University
⁵Corresponding author, Professor, Department of Media and Communication, Konkuk University

¹Master’s, Department of Media and Communication, Konkuk University
²Master’s, Department of Media and Communication, Konkuk University
³Master’s, Department of Media and Communication, Konkuk University
⁴Master’s Candidate, Department of Media and Communication, Konkuk University
⁵Professor, Department of Media and Communication, Konkuk University

30 April 2026. pp. 315~348

Abstract

This study investigates the determinants of perceived review helpfulness on online fashion platforms, with a focus on overcoming the structural limitations of current vote-based review ranking systems, namely the Matthew effect and vulnerability to manipulation. Using 48,878 reviews collected from Musinsa, South Korea’s largest online fashion platform, we construct a machine learning-based prediction model by combining Non-negative Matrix Factorization (NMF) topic modeling with XGBoost, and employ SHAP (SHapley Additive exPlanations) to interpret model outputs. Results indicate that: (1) a hybrid undersampling method combining k-RNN and OCSVM, paired with direct categorical variable handling, yields the best predictive performance; (2) review format (style/photo/text), text length, exposure duration, user level, and gender are significant predictors of helpfulness; (3) style reviews featuring wearer photos far outperform text-only reviews; and (4) topics related to fit/comfort and size/purchase motivation positively contribute to perceived usefulness. These findings suggest that a multi-dimensional, content-based algorithm can more robustly identify genuinely useful reviews than simple vote aggregation. This study contributes theoretically by validating the complementary roles of central and peripheral routes (ELM) in review helpfulness evaluation, and methodologically by demonstrating the efficacy of NMF with ensemble learning for short-text data analysis in a Korean-language e-commerce context.
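The SHAP interpretation step rests on Shapley values, which split a single prediction into per-feature contributions that sum to the gap between that prediction and a baseline. The sketch below computes exact Shapley values by brute force for a three-feature toy scorer. It is an illustration only: `toy_helpfulness`, its coefficients, the feature order, and the `background` rows are hypothetical and are not the study's model or data, and the code does not use the `shap` library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley attributions for one instance x.

    Features outside a coalition take their values from each background
    row in turn, and the resulting predictions are averaged.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi, value(set())


# Hypothetical toy scorer (NOT the study's model). Features are
# [has_wearer_photo, text_length, exposure_days].
def toy_helpfulness(features):
    has_photo, length, days = features
    return 2.0 * has_photo + 0.01 * length + 0.005 * days + 0.01 * has_photo * length


# Hypothetical background reviews and one review to explain.
background = [[0, 20, 10], [1, 80, 30], [0, 40, 60]]
review = [1, 120, 45]

phi, base_value = shapley_values(toy_helpfulness, review, background)
prediction = toy_helpfulness(review)

# Additivity: the base value plus the attributions reproduces the prediction.
assert abs(base_value + sum(phi) - prediction) < 1e-9
```

Brute-force enumeration is exponential in the number of features, so it only suits toy cases like this one. For tree ensembles such as XGBoost, faster tree-specific algorithms are normally used in practice.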

Keywords

Online Fashion Platform

Musinsa

Review Helpfulness

Machine Learning

Topic Modeling

XGBoost

SHAP

User-Generated Content (UGC)

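On online fashion platforms, consumer reviews serve as a key information source that mitigates information asymmetry and supports purchase decisions. However, current review ranking systems based on simple cumulative votes are structurally vulnerable to the Matthew effect and to external manipulation. To overcome these limitations, this study collected review data (N=48,878) from Musinsa, South Korea’s largest online fashion platform, built a machine learning-based model for predicting review helpfulness, and systematically identified the main influencing factors.
For the analysis, we combined Non-negative Matrix Factorization (NMF)-based topic modeling, which is well suited to short text data, with XGBoost, which excels at capturing nonlinear relationships, and interpreted the model’s predictions using SHAP (SHapley Additive exPlanations). First, predictive performance was best when a hybrid undersampling technique combining k-RNN and OCSVM was applied to address class imbalance and categorical variables were handled without one-hot encoding. Second, the SHAP analysis showed that review format (category), text length, exposure duration, reviewer level, and gender significantly affected review helpfulness. In particular, style reviews that include photos of the item being worn received far higher helpfulness ratings than text reviews. In terms of topics, content that concretely describes core product attributes, such as ‘garment length and fit’ and ‘size and purchase motivation’, had a positive effect on helpfulness ratings.
This study has practical significance in that it proposes a multi-dimensional evaluation algorithm that compensates for the structural limitations of existing review ranking, and it makes a methodological contribution by demonstrating the effectiveness of NMF and ensemble techniques for short-text analysis. It also examines, from a communication studies perspective, how the review ranking algorithms of online platforms affect users’ access to information and their purchase decisions.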
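As a rough illustration of NMF topic modeling on short reviews, the sketch below factorizes a tiny document-term count matrix V into non-negative factors W (review-by-topic weights) and H (topic-by-term weights), using the multiplicative update rules of Lee and Seung for the Frobenius objective. The vocabulary, the counts, and the choice of two topics are hypothetical. The study's actual preprocessing, weighting scheme, and NMF implementation are not specified in this abstract, so this is a minimal sketch and not the authors' pipeline.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factorize V ~ WH with multiplicative updates; all entries stay non-negative."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Hypothetical vocabulary and term counts for four short reviews.
vocabulary = ["fit", "comfortable", "length", "size", "order", "gift"]
V = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
]

W, H, errors = nmf(V, n_topics=2)

# Top terms per topic, read off the rows of H.
for k, row in enumerate(H):
    top = sorted(range(len(vocabulary)), key=lambda j: row[j], reverse=True)[:3]
    print(f"topic {k}:", [vocabulary[j] for j in top])
```

Each row of W can then be used as a set of topic features for a downstream classifier or regressor, which is the general way topic weights are combined with a boosted-tree model.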

Information

Publisher: Research Institute of Creative Contents
Publisher (Korean): 글로컬문화전략연구소
Journal Title: The Journal of Culture Contents
Journal Title (Korean): 문화콘텐츠연구
Volume: 36
Pages: 315~348
DOI: https://doi.org/10.34227/tjocc.2026..36.315

The Journal of Culture Contents (문화콘텐츠연구), ISSN: 2287-2256 (Print), 2671-7026 (Online)