Current issue: Vol. 7 (2026)


Putanyn Manee and Jakramate Bootkrajang

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 1-24

This research investigates the strategic trade-offs between revenue maximization and loss control in credit risk modeling by comparing three distinct optimization strategies. The study addresses the gap between technical model performance and actual business outcomes, a gap often overlooked in traditional machine learning applications. Using a synthetic dataset of 135,000 personal loans with 29 features, the study evaluates five machine learning models across multiple probability thresholds. The findings reveal that behavioral features capturing post-origination payment patterns account for 71% of the predictive improvement, compared with only 29% from hyperparameter tuning. Strategy A (Income Maximization) achieved the highest profit, $662.54M, with a 98.7% approval rate using Gradient Boosting at a 20% threshold. In contrast, Strategy B (Pure Loss Minimization) produced an impractical 0.04% approval rate, showing that unconstrained loss reduction leads to operational failure. Strategy C (Constrained Loss Minimization) imposed a 60% minimum approval constraint based on Federal Reserve standards, achieving $416.54M in profit with 40% lower losses than Strategy A. Critically, the choice of probability threshold had a financial impact five times greater than the choice of algorithm. These results provide strong empirical evidence that integrating specific business constraints is essential for effective and sustainable credit risk management.
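The three strategies differ only in how a decision threshold is picked from the same model scores. The sketch below illustrates this under stated assumptions: a loan is approved when its predicted default probability is below the threshold, each repaid loan earns a fixed margin, and each default loses a fixed fraction (LGD) of principal. The names `evaluate` and `choose`, the margin and LGD values, and the ten toy loans are hypothetical and are not the authors' profit model or data.

```python
from dataclasses import dataclass

# Hypothetical toy portfolio (not the paper's data): predicted default
# probabilities, realized outcomes (1 = default) and principal amounts.
PROBS = [0.02, 0.05, 0.08, 0.12, 0.18, 0.25, 0.35, 0.55, 0.65, 0.85]
DEFAULTS = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]
AMOUNTS = [10_000] * 10
MARGIN = 0.3  # assumed income on each repaid loan, as a share of principal
LGD = 0.4     # assumed loss given default, as a share of principal


@dataclass
class Outcome:
    threshold: float
    approval_rate: float
    profit: float  # income from repaid loans minus losses from defaults
    loss: float


def evaluate(threshold, probs, defaults, amounts, margin=MARGIN, lgd=LGD):
    """Approve every loan whose predicted default probability is below threshold."""
    approved, income, loss = 0, 0.0, 0.0
    for p, d, a in zip(probs, defaults, amounts):
        if p < threshold:
            approved += 1
            if d:
                loss += a * lgd
            else:
                income += a * margin
    return Outcome(threshold, approved / len(probs), income - loss, loss)


def choose(outcomes, strategy, min_approval=0.6):
    """A = max profit, B = min loss, C = min loss subject to an approval floor."""
    if strategy == "A":
        return max(outcomes, key=lambda o: o.profit)
    if strategy == "B":
        return min(outcomes, key=lambda o: o.loss)
    feasible = [o for o in outcomes if o.approval_rate >= min_approval]
    return min(feasible, key=lambda o: o.loss)


THRESHOLDS = [i / 10 for i in range(11)]
OUTCOMES = [evaluate(t, PROBS, DEFAULTS, AMOUNTS) for t in THRESHOLDS]

for s in "ABC":
    o = choose(OUTCOMES, s)
    print(s, o.threshold, o.approval_rate, o.profit, o.loss)
```

On this toy portfolio, Strategy A settles on the 0.7 threshold (90% approval), Strategy B collapses to approving no loans at all, and Strategy C picks the 0.3 threshold, the lowest-loss option that still approves 60% of applicants, giving up some profit in exchange for half of A's losses. The numbers are illustrative only.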

Kanyarat Srisuttha and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 25-36

This independent study aims to analyze the factors influencing online jewelry purchasing decisions within the 7Ps marketing mix framework, using Natural Language Processing (NLP) and Machine Learning techniques. The dataset comprises 5,766 user reviews scraped from Shopee and Lazada. The research followed the CRISP-DM process model, employing TF-IDF for vectorization and Proportional SMOTE for data balancing to preserve the original significance of the factors. In the performance comparison, the XGBoost algorithm achieved the highest accuracy (75.39%) and an F1-score of 75.30%. The WangchanBERTa model, fine-tuned for 20 epochs, reached an accuracy of 74.09%, held back by limited data volume and Out-of-Vocabulary (OOV) issues. However, the Random Forest Classifier yielded the highest ROC AUC, 93.73%, demonstrating a superior ability to discriminate between classes. The findings indicate that the most discussed factors are Product (33.37%) and Process (24.19%), with "aesthetic design" and "shipping speed" identified as key drivers of customer satisfaction. These insights can assist entrepreneurs with strategic marketing planning, inventory management, and packaging development to sustainably enhance their competitiveness in the online marketplace.
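TF-IDF weights a term by how often it appears in a review relative to how many reviews contain it. The following is a minimal standard-library sketch of one common variant (term count normalized by review length, idf = log(N/df)); it assumes reviews are already tokenized, which for Thai text would require a word segmenter upstream. The function name `tfidf` and the sample tokens are hypothetical, and neither the study's exact weighting scheme nor its Proportional SMOTE step is reproduced here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (sorted vocabulary, one sparse {term: weight} dict per document).

    docs: list of token lists; tokenization is assumed to happen upstream.
    Weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({t: (c / length) * idf[t] for t, c in counts.items()})
    return sorted(doc_freq), vectors


# Hypothetical, already-tokenized reviews.
reviews = [
    ["ring", "beautiful", "design"],
    ["ring", "fast", "shipping"],
    ["ring", "slow", "shipping"],
]
vocab, vectors = tfidf(reviews)
print(vocab)
print(vectors[0])
```

Because "ring" occurs in every review, its idf is log(1) = 0 and it carries no weight, while terms confined to a single review, such as "beautiful" and "design", receive the highest weights.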

Rattakarn Janya and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 37-52

This independent study aims to analyze user review data and identify key factors influencing user experience (UX) in online travel applications. It conducts a comparative analysis of three major platforms, Traveloka, Agoda, and Booking.com, using review data scraped from the Google Play Store. Using Natural Language Processing (NLP) techniques, the researcher highlights the importance of understanding user needs in the post-pandemic era to improve the efficiency and development of digital services. The conceptual framework for categorizing user feedback draws on user experience theories and comprises four primary dimensions: 1) Information Service & Quality, 2) Perceived Benefits, 3) App Performance, and 4) App Design. A dataset of 4,035 textual records was collected and subjected to feature extraction for analysis. Experimental results indicate that Logistic Regression, with a classification accuracy of 79.14%, outperformed the other evaluated models, including SVM, Neural Network, Naïve Bayes, Random Forest, and Zero-Shot Learning (ZSL). In the thematic analysis, "Information Service & Quality" emerged as the most prominent dimension (26.75%), followed by "Perceived Benefits" (25.46%). Furthermore, in-depth visual analytics using Word Clouds and Co-occurrence Networks revealed that negative reviews were significantly associated with keywords such as "Customer," "Service," and "Refund." These findings suggest that service quality and refund processes are pivotal factors in user decision-making. This research therefore offers strategic guidelines for developers to refine functionalities and better meet the evolving demands of contemporary users.
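A co-occurrence network links two keywords whenever they appear in the same review, with edge weights counting how often that happens. The sketch below builds such edges from already-tokenized, lowercased reviews and ranks terms by weighted degree; the function names, the `min_count` cutoff, and the sample negative reviews are hypothetical illustrations, not the study's pipeline or data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(reviews, min_count=1):
    """Count unordered pairs of distinct terms that appear in the same review."""
    counts = Counter()
    for tokens in reviews:
        # sorted(set(...)) makes each pair canonical and counts it once per review
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


def weighted_degree(edges):
    """Sum the weights of all edges touching each term."""
    degree = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return degree


# Hypothetical negative reviews, already tokenized and lowercased.
negative = [
    ["customer", "service", "refund", "slow"],
    ["refund", "customer", "service"],
    ["app", "crash", "refund"],
]
edges = cooccurrence_edges(negative)
hub, weight = weighted_degree(edges).most_common(1)[0]
print(hub, weight)
print(cooccurrence_edges(negative, min_count=2))
```

In this toy input "refund" becomes the hub of the negative-review graph, and raising `min_count` to 2 keeps only the customer/service/refund triangle; a real analysis would layer a graph visualization on top of counts like these.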

Sittakon Phommee and Juggapong Natwichai

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 53-62

This research compares the performance of machine learning algorithms for classifying match outcomes in Teamfight Tactics (Set 13), using a dataset of 78,412 samples collected from the Riot API. Four additional engineered features are introduced, and three encoding techniques (Label Encoding, One-Hot Encoding, and Bag-of-Words) are compared in combination with four classification algorithms: k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Random Forest, and XGBClassifier. The experimental results indicate that the Bag-of-Words technique achieved the highest performance across all algorithms and effectively reduced the impact of variation in the ordering of the input data. Among the algorithms, XGBClassifier delivered the most accurate predictions, with an accuracy of 85.25% and an F1-score of 0.85. Furthermore, feature importance analysis revealed that the newly engineered feature, Total Cost, is the most significant factor influencing match outcomes.
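The order sensitivity that Bag-of-Words removes can be seen on a single board. In the sketch below, slot-wise label encoding yields two different vectors for the same set of units listed in a different order, while a Bag-of-Words count vector is identical. The unit pool, unit costs, `max_slots`, and the reading of Total Cost as the plain sum of unit costs are all hypothetical assumptions; the paper's actual feature definitions may differ.

```python
from collections import Counter

# Hypothetical unit pool and gold costs (not real Set 13 data).
UNIT_POOL = ["unit_a", "unit_b", "unit_c", "unit_d"]
UNIT_COST = {"unit_a": 1, "unit_b": 2, "unit_c": 4, "unit_d": 5}


def label_encode(board, max_slots=4):
    """Slot-wise integer codes; 0 marks an empty slot, so order matters."""
    codes = [UNIT_POOL.index(u) + 1 for u in board]
    return codes + [0] * (max_slots - len(codes))


def bag_of_words(board):
    """Count of each unit in the pool; independent of listing order."""
    counts = Counter(board)
    return [counts[u] for u in UNIT_POOL]


def total_cost(board):
    """Assumed Total Cost feature: sum of the listed units' costs."""
    return sum(UNIT_COST[u] for u in board)


board_1 = ["unit_a", "unit_c", "unit_b"]
board_2 = ["unit_b", "unit_a", "unit_c"]  # same units, different API order

print(label_encode(board_1), label_encode(board_2))
print(bag_of_words(board_1), bag_of_words(board_2))
print(total_cost(board_1))
```

If one-hot encoding is applied per slot, it inherits the same order sensitivity, since it only expands each slot code into indicator columns; an order-invariant aggregate such as the summed cost behaves like Bag-of-Words in this respect.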