Volume: Vol. 7 2026


Issue: No. 1 March

Putanyn Manee and Jakramate Bootkrajang

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 1-24

This research investigates the trade-off between revenue maximization and loss control in credit risk modeling by comparing three optimization strategies. It addresses the gap between technical model performance and actual business outcomes, which traditional machine learning applications often overlook. Using a synthetic dataset of 135,000 personal loans with 29 features, the study evaluates five machine learning models across multiple probability thresholds. The findings show that behavioral features capturing post-origination payment patterns account for 71% of the predictive improvement, compared with only 29% from hyperparameter tuning. Strategy A (Income Maximization) achieved the highest profit, $662.54M, with a 98.7% approval rate using Gradient Boosting at a 20% threshold. In contrast, Strategy B (Pure Loss Minimization) produced an impractical 0.04% approval rate, showing that unconstrained loss reduction leads to operational failure. Strategy C (Constrained Loss Minimization) imposed a 60% minimum approval rate based on Federal Reserve standards and achieved $416.54M in profit with 40% lower losses than Strategy A. Critically, the choice of probability threshold had five times the financial impact of the choice of algorithm. These results provide strong empirical evidence that integrating specific business constraints is essential for effective and sustainable credit risk modeling.
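The three strategies in the abstract above can be read as one threshold sweep with different objectives and constraints. The following is a minimal sketch of that idea, not the authors' implementation: the `Loan` fields, the approval rule (approve when the predicted default probability is below the threshold), the toy loans, and the candidate thresholds are all illustrative assumptions.

```python
from typing import List, NamedTuple, Optional


class Loan(NamedTuple):
    p_default: float  # model's predicted probability of default (assumed input)
    defaulted: bool   # observed outcome
    gain: float       # income earned if the loan is repaid (hypothetical units)
    loss: float       # amount lost if the loan defaults (hypothetical units)


class Outcome(NamedTuple):
    threshold: float
    approval_rate: float
    profit: float
    loss: float


def evaluate(loans: List[Loan], threshold: float) -> Outcome:
    """Approve every loan whose predicted default probability is below the threshold."""
    approved = [loan for loan in loans if loan.p_default < threshold]
    loss = sum(loan.loss for loan in approved if loan.defaulted)
    gain = sum(loan.gain for loan in approved if not loan.defaulted)
    return Outcome(threshold, len(approved) / len(loans), gain - loss, loss)


def best_threshold(
    loans: List[Loan],
    thresholds: List[float],
    objective: str,
    min_approval: float = 0.0,
) -> Optional[Outcome]:
    """Pick the threshold that optimizes the objective among feasible thresholds."""
    outcomes = [evaluate(loans, t) for t in thresholds]
    feasible = [o for o in outcomes if o.approval_rate >= min_approval]
    if not feasible:
        return None  # no threshold satisfies the approval constraint
    if objective == "profit":
        return max(feasible, key=lambda o: o.profit)
    if objective == "loss":
        return min(feasible, key=lambda o: o.loss)
    raise ValueError(f"unknown objective: {objective}")


# Toy data, invented purely for illustration.
LOANS = [
    Loan(0.05, False, 30.0, 40.0),
    Loan(0.10, False, 30.0, 40.0),
    Loan(0.15, True, 30.0, 40.0),
    Loan(0.30, False, 30.0, 40.0),
    Loan(0.40, False, 30.0, 40.0),
    Loan(0.60, True, 30.0, 40.0),
]
THRESHOLDS = [0.01, 0.12, 0.20, 0.35, 0.50, 0.90]

strategy_a = best_threshold(LOANS, THRESHOLDS, "profit")                  # income maximization
strategy_b = best_threshold(LOANS, THRESHOLDS, "loss")                    # pure loss minimization
strategy_c = best_threshold(LOANS, THRESHOLDS, "loss", min_approval=0.6)  # constrained loss minimization

for name, result in [("A", strategy_a), ("B", strategy_b), ("C", strategy_c)]:
    print(name, result)
```

On this toy data the unconstrained loss objective selects a threshold that approves no loans at all, which mirrors the near-zero approval rate reported for Strategy B, while the approval floor pushes the loss-minimizing choice to a threshold that still lends.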

Kanyarat Srisuttha and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 25-36

This independent study analyzes the factors influencing online jewelry purchasing decisions through the 7Ps marketing mix framework, using Natural Language Processing (NLP) and Machine Learning techniques. The dataset comprises 5,766 user reviews scraped from Shopee and Lazada. The research followed the CRISP-DM standard, employing TF-IDF for vectorization and Proportional SMOTE for data balancing to preserve the original significance of the factors. In the comparative evaluation, the XGBoost algorithm achieved the highest accuracy, 75.39%, with an F1-score of 75.30%. The WangchanBERTa model, fine-tuned for 20 epochs, reached an accuracy of 74.09% and was limited by data volume constraints and Out-of-Vocabulary (OOV) issues. The Random Forest Classifier, however, yielded the highest ROC AUC, 93.73%, demonstrating superior class differentiation. The findings indicate that the most discussed factors are Product (33.37%) and Process (24.19%), with "aesthetic design" and "shipping speed" identified as the main drivers of customer satisfaction. These insights can assist entrepreneurs in strategic marketing planning, inventory management, and packaging development to sustainably enhance competitiveness in the online marketplace.
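As a rough illustration of the TF-IDF vectorization step named in the abstract above, the sketch below computes one common textbook variant (term frequency multiplied by log(N / document frequency)). It is not the study's pipeline: the function name `tfidf`, the toy reviews, and the assumption that reviews arrive already tokenized are all illustrative, and library implementations typically add smoothing and normalization that are omitted here.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Hypothetical, pre-tokenized reviews used only to exercise the function.
REVIEWS = [
    ["beautiful", "design", "fast"],
    ["fast", "shipping"],
    ["fast", "shipping", "slow"],
]

VECTORS = tfidf(REVIEWS)
for review, vector in zip(REVIEWS, VECTORS):
    print(review, vector)
```

A term that occurs in every review, such as "fast" here, receives a weight of zero, while a term confined to one review is weighted up. That property is what lets a downstream classifier focus on the words that distinguish one marketing-mix factor from another.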