Xiaofan Zhou and Jakramate Bootkrajang

Published in Data Science and Engineering (DSE) Record, 2025, Vol. 6, No. 1, pp. 168-184


Abstract

This study presents a robust framework for automatically extracting and evaluating video interaction metrics across three major Chinese social media platforms with heterogeneous interface designs: Bilibili, Douyin, and Xiaohongshu. Combining YOLOv8 object detection with Optical Character Recognition (OCR), the system locates engagement-indicator icons (likes, comments, shares, views, etc.) and extracts their associated numerical values, addressing platform-specific differences in how these indicators are displayed. A dataset of 250 annotated screenshots covering diverse interface variations was used to train and validate the detection model, which achieved a mean average precision at an IoU threshold of 0.5 (mAP@50) of 99.5% across all interaction categories. The extracted metrics were standardized and validated against third-party Key Performance Indicators (KPIs) from commercial analytics platforms (Pugongying, Huahuo, and Xingtu), showing 98% agreement in performance classification. Hyperparameter optimization and spatial pyramid pooling enhancements enabled cross-platform generalization. Error analysis identified OCR misreadings, such as omission of the unit "万" (10,000), as the primary limitation on accuracy. The framework advances social media analytics by enabling scalable, platform-agnostic performance benchmarking, with practical value for content optimization, advertising compliance verification, and engagement trend analysis in the evolving short-video ecosystem.
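The abstract names OCR misreadings of the unit "万" as the main source of error. The following is a minimal sketch, not the authors' implementation, of one way a post-processing step might turn OCR'd count strings into integers and flag likely unit omissions. The function name `parse_count`, the regular expression, the treatment of a "w"/"W" shorthand and a trailing "+", and the rule that a unitless decimal signals a dropped unit are all illustrative assumptions.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical multiplier table. The "w"/"W" shorthand for 万 is an assumption
# about how some interfaces abbreviate counts; it is not taken from the paper.
UNIT_MULTIPLIERS = {
    "万": Decimal(10_000),
    "w": Decimal(10_000),
    "W": Decimal(10_000),
    "亿": Decimal(100_000_000),
}

# A number (optional thousands separators, optional decimal part), an optional
# unit character, and an optional trailing "+" (e.g. "10万+").
COUNT_PATTERN = re.compile(r"^\s*(\d[\d,]*(?:\.\d+)?)\s*(万|亿|w|W)?\s*\+?\s*$")


def parse_count(ocr_text: str) -> Optional[int]:
    """Convert an OCR'd engagement count such as '1.2万' into an integer.

    Returns None when the text is not a count, or when a decimal value has no
    unit: assuming raw counts are always displayed as whole numbers, '1.2'
    most likely means the OCR step dropped the unit suffix.
    """
    match = COUNT_PATTERN.match(ocr_text)
    if match is None:
        return None
    number_text, unit = match.groups()
    number_text = number_text.replace(",", "")
    if unit is None and "." in number_text:
        return None  # suspected unit omission; route to manual review
    try:
        # Decimal avoids float rounding, e.g. 1.2 * 10000 stays exactly 12000.
        value = Decimal(number_text)
    except InvalidOperation:
        return None
    if unit is not None:
        value *= UNIT_MULTIPLIERS[unit]
    return int(value)
```

For example, `parse_count("1.2万")` returns `12000`, while `parse_count("1.2")` returns `None`, so a reading with a dropped unit is flagged for review rather than silently recorded as 1.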