Data Science and Engineering (DSE) Record Vol. 7 2026

Biannual Journal

Volume: Vol. 7 2026

Issue: No. 1 March

Evaluation Constrained and Unconstrained Machine Learning Strategies for Credit Risk Optimization in Financial Institutions

Putanyn Manee and Jakramate Bootkrajang

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 1-24

This research investigates the strategic trade-offs between revenue maximization and loss control in credit risk modeling by comparing three optimization strategies. The study addresses the gap between technical model performance and actual business outcomes, a gap that traditional machine learning applications often overlook. Using a synthetic dataset of 135,000 personal loans with 29 features, the study evaluates five machine learning models across multiple probability thresholds. The findings show that behavioral features capturing post-origination payment patterns account for 71% of the predictive improvement, compared with 29% from hyperparameter tuning. Strategy A (Income Maximization) achieved the highest profit, $662.54M with a 98.7% approval rate, using Gradient Boosting at a 20% threshold. In contrast, Strategy B (Pure Loss Minimization) produced an impractical 0.04% approval rate, showing that unconstrained loss reduction leads to operational failure. Strategy C (Constrained Loss Minimization) imposed a 60% minimum approval constraint based on Federal Reserve standards, achieving $416.54M in profit with 40% lower losses than Strategy A. Critically, the choice of probability threshold had a financial impact five times greater than the choice of algorithm. These results provide strong empirical evidence that integrating specific business constraints is essential for effective and sustainable credit risk management.
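The three strategies differ mainly in how a probability threshold is picked from the same model scores. A minimal sketch of that selection step, under stated assumptions: a loan is approved when its predicted default probability is below the threshold, a repaid loan earns a fixed hypothetical `interest` amount, a defaulted loan costs its hypothetical `principal`, and Strategy A is read here as maximizing net profit. The `Loan` record, this profit accounting, and the threshold grid are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Loan:
    p_default: float  # model-predicted probability of default
    defaulted: bool   # observed outcome on the evaluation set
    interest: float   # hypothetical income earned if the loan is repaid
    principal: float  # hypothetical amount lost if the loan defaults


def evaluate(loans: Sequence[Loan], threshold: float) -> Dict[str, float]:
    """Approve every loan whose predicted default probability is below `threshold`."""
    approved = [ln for ln in loans if ln.p_default < threshold]
    income = sum(ln.interest for ln in approved if not ln.defaulted)
    loss = sum(ln.principal for ln in approved if ln.defaulted)
    return {
        "threshold": threshold,
        "approval_rate": len(approved) / len(loans),
        "income": income,
        "loss": loss,
        "profit": income - loss,
    }


def choose_threshold(loans: Sequence[Loan], thresholds: Sequence[float],
                     strategy: str, min_approval: float = 0.60) -> Dict[str, float]:
    results: List[Dict[str, float]] = [evaluate(loans, t) for t in thresholds]
    if strategy == "A":  # income maximization, read here as maximizing net profit
        return max(results, key=lambda r: r["profit"])
    if strategy == "B":  # pure loss minimization with no approval floor
        return min(results, key=lambda r: r["loss"])
    if strategy == "C":  # loss minimization subject to a minimum approval rate
        feasible = [r for r in results if r["approval_rate"] >= min_approval]
        if not feasible:
            raise ValueError("no threshold satisfies the approval constraint")
        return min(feasible, key=lambda r: r["loss"])
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    # Toy evaluation set: (predicted default probability, observed default).
    toy = [(0.05, False), (0.15, False), (0.25, False), (0.35, False), (0.45, False),
           (0.55, True), (0.65, False), (0.75, False), (0.85, True), (0.95, True)]
    loans = [Loan(p, d, interest=60.0, principal=100.0) for p, d in toy]
    grid = [0.0, 0.3, 0.6, 0.9, 1.0]
    for s in "ABC":
        r = choose_threshold(loans, grid, s)
        print(s, r["threshold"], r["approval_rate"], r["profit"], r["loss"])
```

On this toy set, Strategy A settles on 0.9 (profit 220, 90% approval, loss 200), Strategy B settles on 0.0 and approves nobody, and the 60% floor pushes Strategy C to 0.6 (profit 200, loss 100). The qualitative pattern, not the numbers, is what the sketch is meant to illustrate.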

Analyzing Influence Factors of Jewelry Products Purchasing in Online Marketplaces Using Natural Language Processing

Kanyarat Srisuttha and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 25-36

This independent study analyzes the factors influencing online jewelry purchasing decisions based on the 7Ps marketing mix framework, using Natural Language Processing (NLP) and machine learning techniques. The dataset comprises 5,766 user reviews scraped from Shopee and Lazada. The research followed the CRISP-DM methodology, employing TF-IDF for vectorization and Proportional SMOTE for data balancing to preserve the original significance of each factor. Comparative results showed that the XGBoost algorithm achieved the highest accuracy, 75.39%, with an F1-score of 75.30%. The WangchanBERTa model, fine-tuned for 20 epochs, reached an accuracy of 74.09%, held back by the limited data volume and Out-of-Vocabulary (OOV) issues. However, the Random Forest classifier yielded the highest ROC AUC, 93.73%, demonstrating a superior ability to distinguish between classes. The findings indicate that the most discussed factors are Product (33.37%) and Process (24.19%), with "aesthetic design" and "shipping speed" identified as key drivers of customer satisfaction. These insights can help entrepreneurs with strategic marketing planning, inventory management, and packaging development to sustainably enhance their competitiveness in online marketplaces.

Analysis of Travel Application Review Data with Natural Language Processing and Visualization

Rattakarn Janya and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 37-52

This independent study aims to analyze user review data and identify key factors influencing user experience (UX) in online travel applications. The study conducts a comparative analysis of three major platforms: Traveloka, Agoda, and Booking.com, using data scraped from the Google Play Store. By leveraging Natural Language Processing (NLP) techniques, the researcher highlights the significance of understanding user needs in the post-pandemic era to enhance digital service efficiency and development. The conceptual framework for categorizing user feedback is based on user experience theories, divided into four primary dimensions: 1) Information Service & Quality, 2) Perceived Benefits, 3) App Performance, and 4) App Design. A dataset comprising 4,035 textual records was collected and subjected to feature extraction for analysis. Experimental results indicate that Logistic Regression outperformed the other evaluated models, including SVM, Neural Network, Naïve Bayes, Random Forest, and Zero-Shot Learning (ZSL), achieving a classification accuracy of 79.14%. Regarding the thematic analysis, "Information Service & Quality" emerged as the most prominent dimension (26.75%), followed by "Perceived Benefits" (25.46%). Furthermore, in-depth visual analytics using Word Clouds and Co-occurrence Networks revealed that negative reviews were significantly associated with keywords such as "Customer," "Service," and "Refund." These findings suggest that service quality and refund processes are pivotal factors in user decision-making. Consequently, this research serves as a strategic guideline for developers to refine functionalities and better meet the evolving demands of contemporary users.

Auto Battle Game Outcome Prediction with Machine Learning

Sittakon Phommee and Juggapong Natwichai

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 53-62

This research compares the performance of machine learning algorithms for classifying match outcomes in Teamfight Tactics (Set 13) using a dataset of 78,412 samples collected from the Riot API. Four additional engineered features are introduced, and three encoding techniques (Label Encoding, One-Hot Encoding, and Bag-of-Words) are compared in combination with four classification algorithms: k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Random Forest, and XGBClassifier. The experimental results indicate that the Bag-of-Words technique achieved the highest performance across all algorithms and effectively reduced the impact of variation in data sequence order. Among the algorithms, XGBClassifier delivered the most accurate predictions, with an accuracy of 85.25% and an F1-score of 0.85. Furthermore, feature importance analysis revealed that the newly engineered feature, Total Cost, is the most significant factor influencing match outcomes.
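Why Bag-of-Words can reduce sensitivity to data ordering is easiest to see on a single board. The sketch below contrasts slot-by-slot label encoding with unit counts over a fixed vocabulary. The unit names, their costs, and the definition of `total_cost` as a plain sum of unit costs are hypothetical assumptions for illustration; the abstract does not specify how the authors computed Total Cost.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical vocabulary and unit costs; not the real Set 13 roster.
VOCAB: List[str] = ["unit_a", "unit_b", "unit_c", "unit_d"]
UNIT_COST: Dict[str, int] = {"unit_a": 1, "unit_b": 2, "unit_c": 4, "unit_d": 5}


def label_encode(board: Sequence[str]) -> List[int]:
    """One integer code per board slot; reordering the same units changes the vector."""
    index = {unit: i for i, unit in enumerate(VOCAB)}
    return [index[unit] for unit in board]


def bag_of_words(board: Sequence[str]) -> List[int]:
    """Count of each vocabulary unit on the board; identical for any ordering."""
    counts = Counter(board)
    return [counts[unit] for unit in VOCAB]  # Counter returns 0 for absent units


def total_cost(board: Sequence[str]) -> int:
    """Hypothetical 'Total Cost' feature: sum of the cost of every unit on the board."""
    return sum(UNIT_COST[unit] for unit in board)


if __name__ == "__main__":
    board_1 = ["unit_a", "unit_b", "unit_b", "unit_c"]
    board_2 = ["unit_b", "unit_c", "unit_a", "unit_b"]  # same units, different order
    print(label_encode(board_1), label_encode(board_2))  # these two vectors differ
    print(bag_of_words(board_1), bag_of_words(board_2))  # these two vectors match
    print(total_cost(board_1))
```

Both boards hold the same units, so a classifier fed the Bag-of-Words vectors sees identical inputs, while the label-encoded vectors differ only because of slot order.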

Developing Prediction-Based Portfolio Optimization Framework Using LSTM-Autoencoder and Worst-Case Omega Model

Kittanai Yamkleeb, Sumalee Sangamuang, and Prompong Sungunnasil

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 63-81

Translating financial return predictions into effective portfolio decisions remains challenging due to the gap between predictive accuracy and investment performance. This study presents a prediction-to-decision framework that integrates deep learning-based return forecasting with Omega-based portfolio optimization. Using daily OHLCV data from 2018-2024, an autoencoder-long short-term memory (AE+LSTM) model generates return forecasts, which are incorporated into a worst-case Omega allocation scheme to account for asymmetric return preferences. Forecasting performance is evaluated against autoregressive and neural baselines using both numerical error metrics (MAE, MSE) and directional measures (Hit Rate), while portfolio performance is assessed under consistent rebalancing rules with transaction costs and compared with equally weighted and mean-variance benchmarks. Out-of-sample backtesting across different market regimes examines annualized return, volatility, Sharpe ratio, and maximum drawdown. The results suggest that differences in directional prediction behavior are associated with variations in portfolio-level outcomes under Omega-based allocation. In particular, models with more consistent directional patterns tend to provide a more balanced trade-off between return, risk, and turnover. Overall, the framework offers a systematic approach for examining how predictive signals translate into portfolio decisions across varying market conditions.
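The Omega ratio of a return series R at a threshold tau is commonly defined as E[(R - tau)+] / E[(tau - R)+], the expected gain above tau divided by the expected shortfall below it. A worst-case Omega allocation picks weights that maximize the smallest Omega over a set of plausible return distributions. The sketch below is a simplified stand-in for that max-min idea, not the authors' model: the return scenarios (which might, for example, come from different forecasts or market regimes) are hypothetical, and the optimization is a brute-force search over a few candidate weight vectors rather than an exact solution of the optimization problem.

```python
from typing import List, Sequence, Tuple

Weights = Tuple[float, ...]
Scenario = Sequence[Sequence[float]]  # periods x assets matrix of returns


def portfolio_returns(weights: Weights, asset_returns: Scenario) -> List[float]:
    """Per-period portfolio return: weighted sum of the assets' returns in that period."""
    return [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]


def omega_ratio(returns: Sequence[float], tau: float = 0.0) -> float:
    """Empirical Omega: total gain above tau divided by total shortfall below tau."""
    gains = sum(max(r - tau, 0.0) for r in returns)
    shortfalls = sum(max(tau - r, 0.0) for r in returns)
    return float("inf") if shortfalls == 0.0 else gains / shortfalls


def worst_case_omega(weights: Weights, scenarios: Sequence[Scenario], tau: float = 0.0) -> float:
    """Smallest Omega the portfolio attains across all candidate return scenarios."""
    return min(omega_ratio(portfolio_returns(weights, s), tau) for s in scenarios)


def best_weights(candidates: Sequence[Weights], scenarios: Sequence[Scenario],
                 tau: float = 0.0) -> Weights:
    """Brute-force max-min: the candidate whose worst-case Omega is largest."""
    return max(candidates, key=lambda w: worst_case_omega(w, scenarios, tau))


if __name__ == "__main__":
    # Two hypothetical scenarios for two assets, three periods each.
    s1 = [(0.02, 0.01), (-0.01, 0.00), (0.03, -0.01)]
    s2 = [(-0.02, 0.01), (0.01, 0.005), (0.00, 0.00)]
    candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
    for w in candidates:
        print(w, worst_case_omega(w, [s1, s2]))
    print("chosen:", best_weights(candidates, [s1, s2]))
```

In this toy case each concentrated portfolio has one scenario with a low Omega (0.5 and 1.0), while the 50/50 mix has a worst case of about 1.5, so the max-min rule selects the mix even though it is not the best performer in every scenario.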

All volumes

Vol. 7 2026

Vol. 6 2025

Vol. 5 2024

Vol. 4 2023

Vol. 3 2022

Vol. 2 2021

Vol. 1 2020