Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 March

Biannual Journal

Issue: March 2026 Vol. 7 No. 1

Evaluation Constrained and Unconstrained Machine Learning Strategies for Credit Risk Optimization in Financial Institutions

Putanyn Manee and Jakramate Bootkrajang

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 1-24

This research investigates the critical strategic trade-offs between revenue maximization and loss control in credit risk modeling by comparing three distinct optimization strategies. The study addresses the gap between technical model performance and actual business outcomes, which is often overlooked in traditional machine learning applications. Using a synthetic dataset of 135,000 personal loans with 29 features, the study evaluates five machine learning models across multiple probability thresholds. The findings reveal that behavioral features capturing post-origination payment patterns drive 71% of predictive improvement, compared to only 29% from hyperparameter tuning. Strategy A (Income Maximization) achieved the highest profit of $662.54M with a 98.7% approval rate using Gradient Boosting at a 20% threshold. In contrast, Strategy B (Pure Loss Minimization) produced an impractical 0.04% approval rate, showing that unconstrained loss reduction leads to operational failure. Strategy C (Constrained Loss Minimization) imposed a 60% minimum approval constraint based on Federal Reserve standards, achieving $416.54M in profit with 40% lower losses than Strategy A. Critically, the choice of probability threshold had a financial impact five times greater than the choice of algorithm. These results provide strong empirical evidence that integrating specific business constraints is essential for effective and sustainable credit risk management.
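The three strategies differ mainly in the objective used to pick a probability threshold. The Python sketch below shows only that selection step, under stated assumptions. The `Loan` record, the per-loan `gain`/`loss` amounts, and the rule "approve when predicted default probability is below the threshold" are hypothetical, because the abstract does not describe the paper's profit model. The toy portfolio is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    p_default: float  # model-predicted probability of default
    defaulted: bool   # realized outcome, known when backtesting
    gain: float       # hypothetical profit if approved and repaid
    loss: float       # hypothetical loss if approved and defaulted


def evaluate(loans, threshold):
    """Approve loans whose predicted default probability is below `threshold`."""
    approved = [loan for loan in loans if loan.p_default < threshold]
    profit = sum(-loan.loss if loan.defaulted else loan.gain for loan in approved)
    losses = sum(loan.loss for loan in approved if loan.defaulted)
    approval_rate = len(approved) / len(loans)
    return profit, losses, approval_rate


def choose_threshold(loans, thresholds, strategy, min_approval=0.60):
    """Pick a threshold for Strategy A, B, or C; ties keep the earliest threshold."""
    best = None
    for t in thresholds:
        profit, losses, rate = evaluate(loans, t)
        if strategy == "A":      # income maximization
            score = profit
        elif strategy == "B":    # pure loss minimization, no constraint
            score = -losses
        elif strategy == "C":    # loss minimization subject to an approval floor
            if rate < min_approval:
                continue
            score = -losses
        else:
            raise ValueError(f"unknown strategy {strategy!r}")
        if best is None or score > best[0]:
            best = (score, t)
    return None if best is None else best[1]


# Invented toy portfolio: (p_default, defaulted), each with gain 100 and loss 1000
loans = [Loan(p, d, 100, 1000) for p, d in [
    (0.05, False), (0.10, False), (0.15, False), (0.25, True), (0.30, False),
    (0.50, True), (0.70, True), (0.90, True), (0.12, False), (0.35, False),
]]
thresholds = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0]

for s in "ABC":
    print(s, choose_threshold(loans, thresholds, s))
```

On this toy portfolio the loop selects 0.2, 0.1, and 0.4 for A, B, and C. Strategy B drifts to the lowest threshold, where almost nothing is approved. The approval floor in Strategy C rules out such thresholds.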

Analyzing Influence Factors of Jewelry Products Purchasing in Online Marketplaces Using Natural Language Processing

Kanyarat Srisuttha and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 25-36

This independent study aims to analyze the factors influencing online jewelry purchasing decisions based on the 7Ps marketing mix framework, utilizing Natural Language Processing (NLP) and Machine Learning techniques. The dataset comprises 5,766 user reviews scraped from Shopee and Lazada. The research followed the CRISP-DM standard, employing TF-IDF for vectorization and Proportional SMOTE for data balancing to preserve the original significance of the factors. Comparative performance results revealed that the XGBoost algorithm achieved the highest accuracy at 75.39% and an F1-score of 75.30%. Meanwhile, the WangchanBERTa model, fine-tuned for 20 epochs, reached an accuracy of 74.09%, hindered by data volume constraints and Out-of-Vocabulary (OOV) issues. However, the Random Forest Classifier yielded the highest ROC AUC at 93.73%, demonstrating superior class differentiation capabilities. The findings indicate that the most discussed factors are Product (33.37%) and Process (24.19%), with "aesthetic design" and "shipping speed" identified as critical drivers of maximum customer satisfaction. These insights assist entrepreneurs in strategic marketing planning, inventory management, and packaging development to sustainably enhance competitiveness in the online marketplace.
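TF-IDF, the vectorization step named above, weights a term by how often it occurs in a review and by how rare it is across reviews. The following is a minimal stdlib sketch of the plain `tf * log(N / df)` variant. It assumes reviews are already tokenized; Thai text would first need a word segmenter. The function name `tfidf` and the sample tokens are hypothetical. Library implementations such as scikit-learn's `TfidfVectorizer` apply idf smoothing and L2 normalization by default, so their weights differ from this sketch.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (tf * log(N / df))."""
    n_docs = len(docs)
    # Document frequency: in how many reviews each term appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # guard against empty reviews
        vectors.append({term: (c / length) * idf[term] for term, c in tf.items()})
    return vectors


reviews = [
    ["beautiful", "design", "fast", "shipping"],
    ["slow", "shipping"],
    ["beautiful", "design"],
]
for vec in tfidf(reviews):
    print({t: round(w, 3) for t, w in vec.items()})
```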

Analysis of Travel Application Review Data with Natural Language Processing and Visualization

Rattakarn Janya and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 37-52

This independent study aims to analyze user review data and identify key factors influencing user experience (UX) in online travel applications. The study conducts a comparative analysis of three major platforms: Traveloka, Agoda, and Booking.com, using data scraped from the Google Play Store. By leveraging Natural Language Processing (NLP) techniques, the researcher highlights the significance of understanding user needs in the post-pandemic era to enhance digital service efficiency and development. The conceptual framework for categorizing user feedback is based on user experience theories, divided into four primary dimensions: 1) Information Service & Quality, 2) Perceived Benefits, 3) App Performance, and 4) App Design. A dataset comprising 4,035 textual records was collected and subjected to feature extraction for analysis. Experimental results indicate that Logistic Regression outperformed the other evaluated models, including SVM, Neural Network, Naïve Bayes, Random Forest, and Zero-Shot Learning (ZSL), achieving a classification accuracy of 79.14%. Regarding the thematic analysis, "Information Service & Quality" emerged as the most prominent dimension (26.75%), followed by "Perceived Benefits" (25.46%). Furthermore, in-depth visual analytics using Word Clouds and Co-occurrence Networks revealed that negative reviews were significantly associated with keywords such as "Customer," "Service," and "Refund." These findings suggest that service quality and refund processes are pivotal factors in user decision-making. Consequently, this research serves as a strategic guideline for developers to refine functionalities and better meet the evolving demands of contemporary users.
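A co-occurrence network links keywords that appear in the same review. Edge weights count how many reviews contain both keywords. The sketch below computes those weights under two assumptions: the co-occurrence window is a whole review, and an optional keyword filter is applied. The abstract does not state either choice. The function name and sample reviews are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(reviews, keywords=None):
    """Count keyword pairs that appear together in the same review.

    Each review counts a pair at most once; the counts can serve as edge
    weights in a co-occurrence network.
    """
    edges = Counter()
    for tokens in reviews:
        terms = set(tokens)
        if keywords is not None:
            terms &= set(keywords)
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges


negative_reviews = [
    ["customer", "service", "refund", "bad"],
    ["refund", "slow", "customer"],
    ["app", "crash"],
]
print(cooccurrence(negative_reviews).most_common(3))
```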

Auto Battle Game Outcome Prediction with Machine Learning

Sittakon Phommee and Juggapong Natwichai

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 53-62

This research aims to compare the performance of machine learning algorithms for classifying match outcomes in Teamfight Tactics (Set 13) using a dataset of 78,412 samples collected from the Riot API. Four additional engineered features are introduced, and three encoding techniques (Label Encoding, One-Hot Encoding, and Bag-of-Words) are compared in combination with four classification algorithms: k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Random Forest, and XGBClassifier. The experimental results indicate that the Bag-of-Words technique achieved the highest performance across all algorithms and effectively reduced the impact of data sequence variance. Among the algorithms, XGBClassifier delivered the most accurate predictions, with an accuracy of 85.25% and an F1-score of 0.85. Furthermore, feature importance analysis revealed that the newly engineered feature, Total Cost, is the most significant factor influencing match outcomes.
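The sketch below suggests why a Bag-of-Words encoding can be robust to sequence variance. The same board listed in a different order changes an order-sensitive label encoding but leaves unit counts unchanged. The unit names, their costs, and the definition of `total_cost` as the summed unit cost are hypothetical. The abstract does not define the Total Cost feature.

```python
from collections import Counter

# Hypothetical unit names (not real Set 13 units), for illustration only
VOCAB = ["UnitA", "UnitB", "UnitC", "UnitD"]


def label_encode(board):
    """Order-sensitive: unit indices listed in the order the units were recorded."""
    index = {name: i for i, name in enumerate(VOCAB)}
    return [index[name] for name, _ in board]


def bag_of_units(board):
    """Order-insensitive: how many copies of each vocabulary unit are on the board."""
    counts = Counter(name for name, _ in board)
    return [counts[name] for name in VOCAB]


def total_cost(board):
    """Assumed definition of a 'Total Cost' feature: summed unit cost."""
    return sum(cost for _, cost in board)


board = [("UnitC", 4), ("UnitA", 1), ("UnitA", 1), ("UnitD", 5)]
shuffled = list(reversed(board))

print(label_encode(board), label_encode(shuffled))  # differ
print(bag_of_units(board), bag_of_units(shuffled))  # identical
print(total_cost(board))
```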

Developing Prediction-Based Portfolio Optimization Framework Using LSTM-Autoencoder and Worst-Case Omega Model

Kittanai Yamkleeb, Sumalee Sangamuang, and Prompong Sungunnasil

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 63-81

Translating financial return predictions into effective portfolio decisions remains challenging due to the gap between predictive accuracy and investment performance. This study presents a prediction-to-decision framework that integrates deep learning-based return forecasting with Omega-based portfolio optimization. Using daily OHLCV data from 2018-2024, an autoencoder-long short-term memory (AE+LSTM) model is used to generate return forecasts, which are incorporated into a worst-case Omega allocation scheme to account for asymmetric return preferences. Forecasting performance is evaluated against autoregressive and neural baselines using both numerical error metrics (MAE, MSE) and directional measures (Hit Rate), while portfolio performance is assessed under consistent rebalancing rules with transaction costs and compared with equally weighted and mean-variance benchmarks. Out-of-sample backtesting across different market regimes examines annualized return, volatility, Sharpe ratio, and maximum drawdown. The results suggest that differences in directional prediction behavior are associated with variations in portfolio-level outcomes under Omega-based allocation. In particular, models with more consistent directional patterns tend to provide a more balanced trade-off between return, risk, and turnover. Overall, the framework offers a systematic approach for examining how predictive signals translate into portfolio decisions across varying market conditions.
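The Omega ratio at threshold τ divides expected gains above τ by expected shortfall below τ. A worst-case variant judges a portfolio by its lowest Omega across several return scenarios. The brute-force sketch below evaluates a few candidate weight vectors and keeps the one with the best worst case. It is an illustration only; the paper's optimizer is presumably a proper mathematical program and is not shown here. The scenario data, τ = 0, and the candidate grid are assumptions. In the paper's framework, scenarios would come from the forecasts.

```python
def portfolio_returns(weights, scenario):
    """Per-period portfolio returns; each scenario row holds one return per asset."""
    return [sum(w * r for w, r in zip(weights, row)) for row in scenario]


def omega(returns, tau=0.0):
    """Omega ratio: expected gains above tau divided by expected shortfall below tau."""
    gains = sum(max(r - tau, 0.0) for r in returns)
    shortfall = sum(max(tau - r, 0.0) for r in returns)
    return float("inf") if shortfall == 0 else gains / shortfall


def worst_case_omega(weights, scenarios, tau=0.0):
    """Lowest Omega ratio the weights achieve across all return scenarios."""
    return min(omega(portfolio_returns(weights, s), tau) for s in scenarios)


def best_weights(candidates, scenarios, tau=0.0):
    """Max-min choice: the candidate whose worst-case Omega is highest."""
    return max(candidates, key=lambda w: worst_case_omega(w, scenarios, tau))


# Two invented scenarios (e.g. optimistic / pessimistic forecasts), 3 periods x 2 assets
scenarios = [
    [[0.02, 0.01], [-0.01, 0.00], [0.03, 0.01]],
    [[-0.02, 0.01], [0.01, -0.005], [-0.01, 0.01]],
]
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
print(best_weights(candidates, scenarios))
```

The ratio of sums equals the ratio of means because both are averaged over the same number of periods.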

Regularized Matrix Factorization for Trust-Aware Recommender System

Paramate Phuengtrakul, Jakramate Bootkrajang, and Dussadee Praserttitipong

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 82-98

Trust-aware recommender systems leverage social trust to mitigate rating sparsity and the cold-start problem, yet most public trust datasets represent trust as sparse binary links, which can underutilize structural information in the trust network. This paper proposes a Katz-based trust enrichment method that transforms binary directed trust into continuous multi-hop trust signals, capturing friend-of-friend propagation via truncated path counting, and integrates these signals into TrustSVD to improve recommendation accuracy while maintaining a practical accuracy–complexity trade-off. The proposed method is instantiated as four variants: Katz-2, Katz-3, and their corresponding boosted variants (Boosted Katz-2 and Boosted Katz-3), which differ in propagation depth and in whether direct trust edges are re-emphasized after propagation. To characterize the value of multi-hop propagation, the proposed method is evaluated against three reference representations: the binary baseline used by the original TrustSVD and two local-overlap benchmarks (cosine and Jaccard similarity) that capture only direct neighborhood agreement without path propagation. Using FilmTrust and CiaoDVD, the study evaluates all seven trust representations under a unified training and hyperparameter tuning protocol, with performance reported via 5-fold cross-validation on RMSE and MAE for both all-user and cold-start user groups (users with fewer than 10 ratings and users with fewer than 5 ratings). Results show that the proposed Katz-based method yields modest but consistent accuracy improvements over the binary baseline, with the clearest benefits in cold-start settings and in the sparser CiaoDVD dataset. Across settings, Katz-2 emerges as the most reliable variant of the proposed method, whereas the most extreme cold-start group in CiaoDVD (fewer than 5 ratings) slightly favors the Jaccard benchmark. Given that training cost is dominated by repeated SGD updates and increases as effective trust neighborhoods grow, Katz-2 offers a strong default balance between accuracy gains and computational overhead.
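Truncated Katz propagation can be written as T = Σ_{k=1..K} β^k A^k, where A is the binary directed trust matrix. The entry (A^k)_{ij} counts directed walks of length k from user i to user j, so K = 2 adds friend-of-friend trust. The stdlib sketch below is written under assumptions not given in the abstract. The decay β = 0.5 is hypothetical. The "boost" is modeled as adding `boost * A` back onto direct edges. No normalization is applied before the values would replace binary trust in TrustSVD.

```python
def matmul(x, y):
    """Plain dense matrix product for small lists-of-lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def katz_trust(adj, max_hops=2, beta=0.5, boost=0.0):
    """Truncated Katz scores: sum_{k=1..max_hops} beta**k * A**k (+ boost * A).

    A**k counts directed walks of length k, so k=2 adds friend-of-friend trust.
    `boost` re-adds weight to direct edges (a hypothetical form of 'Boosted Katz').
    """
    n = len(adj)
    power = [row[:] for row in adj]                    # A**1
    trust = [[beta * v for v in row] for row in adj]   # beta * A
    for k in range(2, max_hops + 1):
        power = matmul(power, adj)                     # A**k
        for i in range(n):
            for j in range(n):
                trust[i][j] += beta ** k * power[i][j]
    for i in range(n):
        for j in range(n):
            trust[i][j] += boost * adj[i][j]
    return trust


# Binary directed trust: user 0 trusts user 1, user 1 trusts user 2
adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(katz_trust(adj, max_hops=2))             # user 0 gains indirect trust in user 2
print(katz_trust(adj, max_hops=2, boost=1.0))  # direct edge 0 -> 1 re-emphasized
```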

Ensemble Collaborative Filtering Technique for Elective Courses Recommender System

Weerinphas Chimnam, Dussadee Praserttitipong, and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2026 Vol. 7 No. 1 pp. 99-118

This research investigates grade prediction and recommendation for selectable course options under sparse educational data conditions. In this study, elective courses are interpreted as courses for which students have enrollment choices, including Major Elective, Free Elective, and General Education courses where applicable. To address these challenges, the study proposes an ensemble collaborative filtering technique for an elective courses recommender system. The proposed technique integrates collaborative filtering with feature-based regression models and combines historical academic performance, course metadata, and semantic similarity from course descriptions to improve prediction accuracy and coverage. A time-aware evaluation protocol is applied to simulate realistic academic progression and prevent temporal data leakage. Experimental results show that the proposed ensemble models outperform single-model approaches, especially for near-cold-start users, while also maintaining prediction capability for new-item cases. The findings demonstrate that the proposed technique balances accuracy, robustness, and coverage. The system can serve as an additional tool to help students consider selectable course options rather than as a definitive course-selection mechanism.
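A time-aware protocol trains only on terms that precede the term being predicted. This mirrors real academic progression and prevents grades from the test term or later from entering training. The sketch below yields one expanding-window split per term. The `Record` fields, integer term indices, and placeholder course codes are hypothetical, and the abstract does not state the exact windowing the study uses.

```python
from typing import NamedTuple


class Record(NamedTuple):
    student: str
    course: str
    term: int      # chronological term index (1 = earliest)
    grade: float


def time_aware_splits(records):
    """Yield (term, train, test): train on all earlier terms, test on term t.

    No record from term t or later enters the training set, so the model
    never sees grades it is later asked to predict.
    """
    terms = sorted({r.term for r in records})
    for t in terms[1:]:
        train = [r for r in records if r.term < t]
        test = [r for r in records if r.term == t]
        yield t, train, test


# Invented example; course codes are placeholders
records = [
    Record("s1", "CS101", 1, 3.5), Record("s2", "CS101", 1, 2.5),
    Record("s1", "GE110", 2, 3.0), Record("s2", "GE110", 2, 4.0),
    Record("s1", "CS301", 3, 3.5),
]
for term, train, test in time_aware_splits(records):
    print(term, len(train), len(test))
```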
