Data Science and Engineering (DSE) Record

Biannual Journal

The Data Science and Engineering (DSE) Record will be published in Lecture Notes in Computer Science (LNCS) format.

Current issue: Vol. 6 2025

COMPARATIVE STUDY OF LLM MODELS FOR SENTIMENT CLASSIFICATION IN THAI FINANCIAL NEWS HEADLINES

Nuttawut Thuayhanruksa and Pree Thiengburanathun

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 1-30

PDF

This paper explores the application of various natural language processing (NLP) models for sentiment analysis on financial news articles sourced from Thai financial news websites, focusing on Thai-language data. The study evaluates machine learning and deep learning models, including Logistic Regression, Bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Networks (CNN), WangChanBERTa, OpenAI’s GPT-3.5, and OpenThaiGPT. The models' performance is assessed using accuracy, precision, recall, and F1-score. The findings reveal that the fine-tuned WangChanBERTa model achieved the highest accuracy of 0.84 on the testing set, demonstrating its superior ability in classifying sentiment in Thai financial news. The Bi-LSTM and CNN models also performed well, with testing accuracies of 0.781 and 0.791, respectively. In contrast, OpenAI’s GPT-3.5 and OpenThaiGPT, which lacked fine-tuning and optimized prompts due to computational constraints, exhibited practical limitations in resource-constrained settings.
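As an illustration of the four evaluation metrics named in this abstract, the following minimal sketch computes accuracy and macro-averaged precision, recall, and F1-score for a multi-class sentiment task. Macro averaging, the function name, and the toy labels are assumptions made for the example; the abstract does not state which averaging scheme the authors used.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        prec = tp[label] / p_den if p_den else 0.0
        rec = tp[label] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for four headlines (not data from the paper).
y_true = ["pos", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "pos", "pos"]
report = classification_report(y_true, y_pred)
print(report)
```

Accuracy alone can hide poor performance on a minority class, which is why it is usually reported together with the per-class metrics.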

Tri-Training Based Model Semi-Supervised Aspect-Based Sentiment Analysis: MOOCs Case Study

Kitichart Nukaew and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 31-55

PDF

Massive Open Online Courses (MOOCs) have seen continuous growth in popularity and rapid expansion. In the instructional design process, receiving feedback from learners is crucial, as it helps tailor the content to better meet learners' needs. The application of NLP models in analyzing learners' feedback is an effective approach for extracting insights from a large volume of comments related to the courses. These models can categorize feedback into three distinct categories: course, instructors, and assessments. Additionally, the models can predict the sentiment of the feedback, determining whether it is positive or negative. In developing these models, semi-supervised learning techniques have been employed to address the challenge of limited data availability. Experimental results indicate that, for feedback categorization, a GRU model combined with tri-training with disagreement yields the highest prediction accuracy. Conversely, for sentiment analysis, a GRU model combined with tri-training produces the best outcomes.

Online Review-Based Positioning Analysis Using Natural Language Processing Techniques

Kamonwit Makkaphan, Prompong Sungunnasil, Waranya Mahanan, and Sumalee Sangamuang

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 56-90

PDF

Online customer reviews represent a valuable source of information for businesses seeking to understand consumer perceptions and preferences. This paper introduces a framework for competitive positioning analysis by leveraging these online reviews and sentiment analysis. The framework employs Natural Language Processing (NLP) techniques in three phases: 1) identifying key themes and topics from reviews using Latent Dirichlet Allocation (LDA); 2) extracting product features through zero-shot text classification; and 3) visualizing competitive positioning via Net Promoter Score (NPS) and sentiment analysis plots. A case study on Amazon’s laptop market revealed a moderate correlation (58.8%) between NPS and sentiment analysis, suggesting potential limitations in feature classification accuracy. While the study demonstrates the value of NLP for analyzing online reviews, it also emphasizes the need for improved feature recognition methods and more robust datasets to enhance the precision of competitive positioning analysis.
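To make the two quantities compared in this abstract concrete, the sketch below computes a Net Promoter Score using the conventional 0-10 scale definition (promoters score 9-10, detractors 0-6) and a Pearson correlation coefficient. How the authors mapped review data onto NPS categories and which correlation measure they used are not stated in the abstract, so both choices here are assumptions, and the ratings are invented.

```python
from math import sqrt


def net_promoter_score(ratings):
    """NPS = percentage of promoters minus percentage of detractors."""
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100.0 * (promoters - detractors) / len(ratings)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical ratings for one product: 3 promoters, 1 passive, 1 detractor.
score = net_promoter_score([10, 10, 9, 7, 2])
print(score)

# Hypothetical per-brand NPS values and mean sentiment scores.
r = pearson([40.0, 10.0, -20.0], [0.7, 0.5, 0.1])
print(round(r, 3))
```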

Analyzing Customer Behavior in Walking Street Markets Using Deep Learning Techniques

Manaschai Aonon and Phasit Charoenkwan

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 91-130

PDF

This research presents a comprehensive framework for analyzing customer behavior in walking street markets using advanced person re-identification techniques. We deployed dual CCTV cameras at strategic points along a 200-meter section of a walking street market in Chiang Mai, Thailand, to track customer movements and analyze behavioral patterns. Our methodology comprises three main components: (1) a novel segmentation-enhanced multi-region feature extraction framework combining YOLOv11 segmentation with Swin Transformer, (2) a robust person re-identification approach with PCA-enhanced feature matching, and (3) detailed customer behavior analysis based on movement patterns, speeds, and interactions. Our feature extraction method achieves 92.31% Rank-1 accuracy and 59.62% mAP, significantly outperforming traditional approaches. Using the re-identification results, we identify five distinct customer behavior types (Goal-Oriented, Browsing, Lingering, Focused, and Brief Visitors) with actionable insights for market management. This research contributes both methodological advances in person re-identification and practical applications for retail analytics in dynamic public spaces.

THE USE OF LARGE LANGUAGE MODELS IN GROUP CHAT PROGRAM FOR COUNSELING BETWEEN DOCTORS AND HEART DISEASE PATIENTS

Noratap Muangudom and Karn Patanukhom

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 131-167

PDF

In recent years, Large Language Models (LLMs) have demonstrated significant potential in various applications, including healthcare, education, and customer support. This study investigates the integration of LLMs into group chat environments to facilitate medical counseling between doctors and heart disease patients. Traditional chatbot systems primarily operate in one-on-one interactions, which can lead to redundant queries and inefficiencies in medical consultations. This research introduces a novel chatbot system designed for group chat settings, allowing multiple users and medical professionals to interact seamlessly within the same conversation. The chatbot system retrieves medical knowledge from a predefined document database using an information retrieval model to ensure responses are relevant and accurate. A verification mechanism is integrated, enabling doctors to review and validate chatbot-generated responses before they are presented to patients. The study employs hypothesis testing and real-world evaluations to measure chatbot performance across three key dimensions: response accuracy, response speed, and user satisfaction. Experimental results indicate that group chat environments improve communication efficiency, reduce repetitive queries, and enhance patient engagement compared to traditional one-on-one chatbot interactions. Furthermore, user feedback highlights the strengths and limitations of the proposed system. While the chatbot successfully provides relevant medical information, challenges remain in ensuring response accuracy, reducing response time, and improving contextual understanding in group conversations. Future work will focus on refining chatbot algorithms, enhancing natural language processing capabilities, and expanding the medical knowledge base to support a wider range of healthcare scenarios. This research underscores the potential of LLMs in transforming digital healthcare support, making medical consultations more efficient, accessible, and collaborative.

Video Sharing Platform Data Extraction: Transforming Images into Structured Data

Xiaofan Zhou and Jakramate Bootkrajang

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 168-184

PDF

This study presents a robust framework for automated extraction and performance evaluation of video interaction metrics across major Chinese social media platforms (Bilibili, Douyin, Xiaohongshu) characterized by heterogeneous interface designs. Leveraging a synergistic combination of YOLOv8 object detection and Optical Character Recognition (OCR), the proposed system addresses platform-specific challenges in identifying engagement indicators (likes, comments, shares, views, etc.) through icon localization and numerical extraction. A dataset of 250 annotated screenshots encompassing diverse interface variations was utilized to train and validate the deep learning model, achieving a mean average precision (mAP@50) of 99.5% across all interaction categories. The extracted metrics were standardized and validated against third-party Key Performance Indicators (KPIs) from commercial analytics platforms (Pugongying, Huahuo, and Xingtu), demonstrating 98% alignment in performance classification. Hyperparameter optimization and spatial pyramid pooling enhancements enabled cross-platform generalization, with error analysis revealing OCR misinterpretations (e.g., omission of the unit "万" (10k)) as the primary accuracy limitation. The framework advances social media analytics by enabling scalable, platform-agnostic performance benchmarking, offering practical value for content optimization, advertising compliance verification, and engagement trend analysis in the evolving short video ecosystem.

Sentiment Analysis and Data Visualization for Customer Satisfaction Survey in Healthcare Services

Natchayar Saosuwan and Karn Patanukhom

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 184-193

PDF

The healthcare sector is becoming more competitive, requiring businesses to understand consumer needs through sentiment analysis of feedback. This study analyzed feedback from Sriphat Medical Center to assess satisfaction (satisfied/dissatisfied) across eight aspects, including service process, staff behavior, and medical expertise. Using Natural Language Processing (NLP) and machine learning with Bag-of-Words and the Term Frequency-Inverse Document Frequency (TF-IDF) techniques, the best-performing model was a linear SVM with 95.8% accuracy in satisfaction classification and 77.4% in aspect classification.

Hallucination Detection for Large Language Model in Medical Context

Pusit Seephueng and Prompong Sugunnasil

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 194-210

PDF

Hallucination in large language models (LLMs) presents a significant challenge in medical applications, where accuracy and reliability are paramount. This study investigates reasoning hallucinations in LLMs and proposes ensemble methods to mitigate their occurrence. Using the False Confidence Test (FCT) dataset from Med-HALT, we evaluate six individual medical LLMs and introduce two ensemble techniques: Weighted Voting and Cascade Ensemble. Our findings indicate that individual models exhibit varied accuracy, with some prone to generating hallucinations. The ensemble methods significantly improve performance, with Cascade Ensemble achieving the highest accuracy (30.23%) and pointwise score (24.12), effectively reducing hallucination-induced errors. While Weighted Voting provides a balance between efficiency and accuracy, it initially suffers from unreliable model contributions. These results highlight the potential of structured ensemble techniques to enhance the robustness of medical LLMs, offering a viable approach for mitigating reasoning hallucinations in clinical decision support systems.
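The two ensemble ideas named in this abstract can be sketched in a few lines. The abstract does not describe how the weights are derived or how the cascade decides to stop, so the version below is only one plausible reading: weighted voting sums per-model weights for each candidate answer, and the cascade accepts the first answer whose self-reported confidence meets a threshold. Model names, weights, confidences, and the threshold are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(answers, weights):
    """Return the answer whose supporting models carry the most total weight."""
    totals = defaultdict(float)
    for model, answer in answers.items():
        totals[answer] += weights.get(model, 0.0)
    return max(totals, key=totals.get)


def cascade(predictions, threshold=0.8):
    """Walk models in priority order and accept the first confident answer."""
    for answer, confidence in predictions:
        if confidence >= threshold:
            return answer
    return predictions[-1][0]  # no model was confident: fall back to the last one


# Hypothetical multiple-choice answers from three models.
answers = {"model_a": "B", "model_b": "C", "model_c": "B"}
weights = {"model_a": 0.5, "model_b": 0.9, "model_c": 0.3}
voted = weighted_vote(answers, weights)
print(voted)

# Hypothetical (answer, confidence) pairs in cascade order.
chained = cascade([("A", 0.55), ("C", 0.91), ("B", 0.99)])
print(chained)
```

In this toy case a single high-weight model outvotes two low-weight models, which shows how strongly the result depends on the reliability of the weights.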

Estimation of the Credit Rating of the Listed Companies in The Stock Exchange of Thailand Based on Financial Statement

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 211-224

PDF

Currently, there are more than 921 listed companies on the Stock Exchange of Thailand, with a total market capitalization of 17,430,644.71 billion THB as of the end of 2023. These listed companies can issue bonds (debt securities) for public sale, providing Thai investors with diverse financial investment options. In 2023, more than 4,753,851 billion THB was raised through initial bond offerings. Despite stringent oversight by the Securities and Exchange Commission of Thailand (SEC), some companies have faced financial failures, leading to delisting and defaults on bond payments, which have significantly harmed numerous investors. Most companies that defaulted on bond payments lacked credit ratings from credit rating agencies, which are crucial for investors to assess the risk of financial failure. As of August 2024, only 175 listed companies on the Stock Exchange of Thailand had received credit ratings from Tris Rating Co., Ltd. This highlights the importance of analyzing and estimating credit ratings for listed companies based on their financial statements to support Thai investors in evaluating financial investments. The findings of this research aim to provide a valuable tool for investors in analyzing investments in financial instruments issued by listed companies. The results of the study are presented in tabular format, including machine learning model performance and training parameters.

Development of a Retrieval-Augmented Generation System for Legal Data in Thai Language

Pimchanok Promwang and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 225-243

PDF

This study presents a Retrieval-Augmented Generation (RAG) framework tailored for Thai legal question answering. The system integrates sparse retrieval (BM25), dense retrieval (SentenceTransformer), and a hybrid approach combining both methods with dynamic weighting. To enhance contextual relevance, a BGE-based re-ranking model was employed. Experiments were conducted on a Thai legal dataset (WangchanX-Legal-ThaiCCL-RAG), and performance was evaluated using Recall@K, Precision@K, MAP, and ROUGE-L. Results showed that while dense retrieval outperformed sparse retrieval in most metrics, the hybrid method, augmented by re-ranking, yielded the highest retrieval accuracy at low K values, with Recall@1 reaching 73.3%. Although this approach introduced additional processing time, the system remained near real-time in response. In the answer generation phase, the model achieved an average ROUGE-L score of 0.4742 (0.6067 when excluding zero-score cases), indicating moderate alignment between generated and reference answers. The findings suggest that hybrid retrieval with re-ranking improves legal information access in Thai, providing a reproducible baseline for future research in legal question answering for low-resource languages.
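A hybrid of sparse and dense retrieval is commonly implemented as a weighted sum of normalized scores, and Recall@K measures how many relevant documents appear in the top K results. The sketch below illustrates both under stated assumptions: min-max normalization and a fixed weight `alpha` stand in for the "dynamic weighting" mentioned in the abstract, whose actual rule is not given, and the document IDs and scores are invented.

```python
def min_max(scores):
    """Scale a {doc_id: score} mapping into the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(sparse, dense, alpha=0.5):
    """Rank documents by alpha * sparse + (1 - alpha) * dense (normalized)."""
    s, d = min_max(sparse), min_max(dense)
    fused = {
        doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    # Sort by descending fused score; break ties by document ID.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant documents found in the top-k ranked results."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


# Hypothetical BM25 scores and cosine similarities for three passages.
sparse_scores = {"d1": 12.0, "d2": 3.0, "d3": 7.5}
dense_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.6}
ranking = hybrid_rank(sparse_scores, dense_scores, alpha=0.5)
print(ranking)
print(recall_at_k(ranking, {"d1"}, k=2))
```

Normalizing before fusion matters because BM25 scores are unbounded while cosine similarities are not, so a raw sum would let the sparse scores dominate.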

EFFICIENT SEGMENTATION OF CUSTOMERS BASED ON RFM ANALYSIS

Chattrapat Poonsin and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 246-272

PDF

Customer segmentation is a vital component of data-driven marketing, enabling businesses to understand customer behavior and enhance strategic decision-making. This study explores an efficient segmentation approach using Recency, Frequency, and Monetary (RFM) analysis, combined with multiple clustering techniques, to identify optimal customer groups. Four clustering approaches were implemented and compared: centroid-based, density-based, distribution-based, and hierarchical (agglomerative) clustering. Each algorithm was evaluated on its ability to form well-separated and meaningful clusters, with silhouette score as the primary performance metric. The dataset was standardized before applying the clustering models to ensure comparability. The results reveal that different algorithms exhibit varying strengths depending on the underlying data structure. K-Means demonstrated efficiency in partitioning customers into distinct groups but struggled with non-spherical clusters. DBSCAN effectively identified outliers but was sensitive to parameter tuning. GMM provided flexibility by modeling cluster probability distributions, making it suitable for overlapping customer behaviors. Hierarchical clustering offered an interpretable structure but required significant computational resources for large datasets. Overall, the findings highlight the importance of selecting an appropriate clustering technique for customer segmentation based on data characteristics. This study provides valuable insights for businesses aiming to develop marketing strategies through data-driven segmentation.
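The RFM features and the standardization step described in this abstract can be sketched as follows. The transaction layout (customer ID, date, amount), the reference date, and the use of z-scores with the population standard deviation are assumptions for illustration; the clustering step itself is omitted.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def rfm_table(transactions, today):
    """Map each customer to (recency in days, frequency, monetary total)."""
    last, freq, money = {}, defaultdict(int), defaultdict(float)
    for customer, day, amount in transactions:
        last[customer] = max(last.get(customer, day), day)
        freq[customer] += 1
        money[customer] += amount
    return {c: ((today - last[c]).days, freq[c], money[c]) for c in last}


def standardize(table):
    """Z-score each RFM column so that no feature dominates by scale."""
    customers = list(table)
    columns = list(zip(*(table[c] for c in customers)))
    scaled = []
    for column in columns:
        mu, sigma = mean(column), pstdev(column)
        scaled.append([(v - mu) / sigma if sigma else 0.0 for v in column])
    return {c: tuple(col[i] for col in scaled) for i, c in enumerate(customers)}


# Hypothetical purchase log.
transactions = [
    ("A", date(2025, 1, 1), 100.0),
    ("A", date(2025, 1, 20), 50.0),
    ("B", date(2025, 1, 10), 30.0),
]
rfm = rfm_table(transactions, today=date(2025, 1, 31))
print(rfm)
print(standardize(rfm))
```

Standardizing first is important for distance-based methods such as K-Means, since monetary totals would otherwise outweigh recency and frequency.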

REFRIGERANT LEAK DETECTION BY MACHINE LEARNING

Poompatai Muennamnor and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 273-295

PDF

Refrigerant leaks from cooling systems can harm the environment and cost businesses money. Current ways to find leaks can be slow, expensive, and not always accurate. This project uses machine learning to create a better way to detect refrigerant leaks by listening to the sounds they make. The goal is to develop a system that can automatically and cheaply detect leaks early on, reducing environmental damage and saving businesses money. The system uses a microphone to record sounds, then a computer program analyzes the sounds to identify leaks. By using sound analysis, the system can tell the difference between normal sounds and the sounds of a refrigerant leak. This helps catch leaks early, lowers maintenance costs, and reduces greenhouse gas emissions.

Development of Model to Predict Next Day's Asset Price Movements Using Ensemble Classification Techniques

Natchar Pongsri and Nasi Tantitharanukul

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 295-315

PDF

This research presents a predictive model for determining the next-day price direction of EUR/USD in the Binary Options market. The study utilizes technical indicators and price data over a 10,000-day span, collected from TradingView, and applies machine learning techniques, particularly an ensemble classification framework combining CNN, LSTM, SVM, and XGBoost models. A total of 23 features were engineered from candlestick data and popular indicators such as RSI, MACD, ATR, and EMA. Statistical analysis ensured data quality and distribution symmetry. Model performance was evaluated using accuracy, F1 score, and ROC-AUC metrics. The resulting ensemble model outperformed individual models in predictive accuracy and stability. This research contributes to the development of automated trading systems and serves as a foundation for further work in financial time series forecasting using machine learning.

DEVELOPMENT OF HUMAN DETECTION AND LOCALIZATION FROM IMAGES BY DRONE IN SEARCH OPERATION

Kridsanaphon Suksan and Paskorn Champrasert

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 316-338

PDF

This independent study presents a system for object detection and localization using aerial imagery captured by drones in search and rescue operations. Generally, a higher drone altitude gives greater area coverage but reduces detection accuracy, while a lower altitude improves accuracy but requires more search time. Because guidance on the optimal altitude is lacking, this study examines detection performance at different flight altitudes to enhance operational efficiency. Since altitude impacts both image quality and detection accuracy, image resolution is also examined as a key factor in system performance. The study evaluates the YOLOv11 algorithm for detection in aerial images, using clothing as a human proxy to address ethical and data collection constraints. Performance was assessed using Mean Average Precision, Precision, Recall, and Time, along with derived metrics such as Efficiency Score and Missing Rate. The geolocation deviation is also measured. Findings indicated that increasing altitude reduces model performance, but this can be compensated for by using a higher-resolution image. For missions requiring high detection accuracy, the lowest-altitude flights yield the best results. In contrast, more time-constrained operations can benefit from higher altitudes but need more computational resources. In general, the study suggests a flight altitude of 40 meters with 1080×720 resolution as the most efficient configuration. At 40 meters, detection accuracy decreases slightly, but area coverage and computation speed improve by roughly a factor of three, with the top Efficiency Score and the lowest Missing Rate.

Food Consumption Measurement Using Computer Vision: A Case Study on Thai Cuisine

Pattadon Thepkan and Jakarin Chawachat

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 339-356

PDF

This study aims to develop a system for estimating the portion size and energy of Thai food from images using deep learning techniques. The proposed system supports dietitians and health-conscious individuals by enabling automated and accurate food intake assessment. The system consists of two main components: (1) object detection using YOLOv11 to simultaneously identify food items and reference coins in an image, and (2) food weight estimation using ResNet101, with coin objects serving as physical references for real-world scaling. The estimated food weight is then used to calculate nutritional values based on a Thai food database. Experimental results demonstrate that annotating object boundaries with Smart Polygon significantly improves model accuracy and stability compared to the traditional Bounding Box method, yielding higher Precision, Recall, F1-score, and mAP. Among the tested models, ResNet101 with coin references achieved the best weight estimation performance, with a Mean Absolute Error (MAE) of 71.12 grams and Root Mean Squared Error (RMSE) of 91.56 grams. This system is suitable for real-world applications in hospitals, restaurants, and personal nutrition tracking.

Online Platform for English Language Practice and Proficiency

Poosana Thassanavisut and Sakgasit Ramingwong

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 357-364

PDF

This research develops an English learning platform that utilizes a combination of three machine learning models for grammatical topic classification in language assessment. English serves as one of the most crucial languages in today's world, particularly in education, work, and communication. However, learning and developing English language skills remains a significant challenge for many learners, especially in countries where English is not the primary language. Firstly, this independent study aims to address these challenges by developing a comprehensive English learning platform that incorporates advanced machine learning techniques. Secondly, the platform employs three distinct machine learning approaches: facebook/bart-large-mnli, Logistic Regression, and DeBERTa for automated grammatical topic assignment to examination questions. Finally, the empirical results demonstrate that the developed platform effectively enables users to assess their English proficiency according to the CEFR (Common European Framework of Reference for Languages) standards, while providing appropriate skill evaluation across various grammatical topics.

Benchmarking of Thai-Spelling Correction Algorithms

Patiphon Ongartittichai and Phasit Charoenkwan

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 365-380

PDF

This research aims to compare the efficiency of algorithms for detecting and correcting typos in Thai, considering accuracy and processing time, especially the combination of word segmentation methods and typo detection algorithms, to find the most suitable approach for developing Thai natural language processing tools (Thai NLP). The data used in the experiment consisted of three Thai datasets: Thai Toxicity Tweet, Wisesight Sentiment, and ThaiSum, which are human-generated texts from both social media and news articles. The data was then prepared and word segmentation was performed using the newmm, deepcut, and attacut tokenizers. Then, typos were checked using the Levenshtein Distance, Hunspell, Peter Norvig, and Word2Vec algorithms. The experimental results showed that the combination of attacut and Peter Norvig gave the best results in terms of accuracy, while newmm and Hunspell gave the best results in terms of speed. Each method has its own advantages and disadvantages. Therefore, the choice of method should depend on the objective, such as accuracy or speed. In addition, the research presents a reusable experimental framework, which is useful for developers and researchers who want to evaluate or develop Thai typo detection systems in the future.
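Of the typo detection algorithms listed in this abstract, Levenshtein Distance is the simplest to show in full. The sketch below is a standard dynamic-programming implementation plus a nearest-dictionary-word lookup; the `suggest` helper and the small English vocabulary are hypothetical and only illustrate the idea, not the benchmark's actual pipeline.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def suggest(word, vocabulary):
    """Return the vocabulary entry closest to `word` by edit distance."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


print(levenshtein("kitten", "sitting"))
print(suggest("helo", ["hello", "world"]))
```

The function compares Unicode code points, so for Thai text each vowel or tone mark counts as its own character, and a full-vocabulary scan like `suggest` becomes slow for large dictionaries.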

Food Consumption Measurement Using Computer Vision: A Case Study on Thai Cuisine

Pattadon Thepkan and Jakarin Chawachat

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 381-398

PDF

This study aims to develop a system for estimating the portion size and energy of Thai food from images using deep learning techniques. The proposed system supports dietitians and health-conscious individuals by enabling automated and accurate food intake assessment. The system consists of two main components: (1) object detection using YOLOv11 to simultaneously identify food items and reference coins in an image, and (2) food weight estimation using ResNet101, with the reference coin serving as a physical reference for real-world size scaling. The estimated food weight is then used to calculate nutritional values based on a Thai food database. Experimental results demonstrate that annotating object boundaries with Smart Polygon significantly improves model accuracy and stability compared to the traditional Bounding Box method, yielding higher Precision, Recall, F1-score, and mAP. Among the tested models, ResNet101 with coin references achieved the best weight estimation performance, with a Mean Absolute Error (MAE) of 71.12 grams and Root Mean Squared Error (RMSE) of 91.56 grams. This system is suitable for real-world applications in hospitals, restaurants, and personal nutrition tracking.

Clothes Styling Assistant Based on Celebrities’ Styles Using Computer Vision and Deep Learning

Tinnapat Jaimunt and Varin Chouvatut

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 399-412

PDF

This study presents a virtual fitting room application that recommends outfits based on celebrity fashion trends using computer vision and deep learning. The system applies hybrid segmentation (semantic and instance) for accurate clothing detection and uses CLAHE preprocessing to enhance image quality. An ensemble model combining CNN, plain NN, SVM, and Random Forest is used for style classification. The application provides users with a style similarity score compared to celebrity outfits. Evaluation through cross-validation and accuracy metrics shows improved performance, highlighting the potential of this approach for intelligent fashion recommendation systems and future Metaverse applications.

White Blood Cells Classification Using Machine Learning and Deep Learning

Tidarat Katsanook, Chalermrat Nontapa, and Kornprom Pikulkaew

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 413-432

PDF

White blood cell (WBC) classification plays a pivotal role in diagnosing and monitoring various medical conditions, particularly hematological and immune-related disorders. This study explores the application of machine learning (ML) and deep learning (DL) techniques to classify WBCs, leveraging their potential to enhance diagnostic precision and efficiency. Using a dataset of 50,000 2D images from the University of North British Columbia, we develop and evaluate models for categorizing WBCs into four key types: eosinophils, lymphocytes, monocytes, and neutrophils. The proposed methodology integrates data augmentation, feature extraction, and advanced classification algorithms, including Convolutional Neural Networks (CNNs) and other statistical approaches. Performance metrics such as accuracy, precision, recall, and F1-score guide the optimization of model architecture and training processes. Experimental results demonstrate the effectiveness of the developed models in achieving high classification accuracy, offering a reliable and automated tool for WBC identification. This research underscores the potential of AI-driven solutions to improve clinical workflows, particularly in resource-limited settings, by providing accessible and cost-effective diagnostic support.

STEEL SALES PREDICTION USING DEEP LEARNING AND TRADITIONAL FORECASTING TECHNIQUES

Patcharaporn Saguanchokvanich, Chompoonoot Kasemset, and Trasapong Thaiupathump

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 433-456

PDF

Steel sales forecasting is a crucial element in the strategic planning of Chiang Mai Center Steel Co., Ltd. This study focuses on forecasting sales of product WR-44202050, the company’s top-selling item, by comparing various forecasting models, including ARIMAX, LSTM, VARX, and Hybrid ARIMAX–MLP. Model performance was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The ARIMAX model achieved the highest accuracy, with MAE of 4.28, MSE of 18.28, RMSE of 4.28, and MAPE of only 0.01%. These results indicate that ARIMAX is the most suitable model for steel sales forecasting in this context, offering strong potential as a decision-support tool for production planning, inventory management, and strategic business operations.
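The four error metrics used in this abstract follow directly from their definitions. The sketch below uses invented sales figures and a hypothetical function name; note that RMSE is by definition the square root of MSE, which gives a quick consistency check on any reported pair of values.

```python
from math import sqrt


def forecast_errors(actual, predicted):
    """Return MAE, MSE, RMSE, and MAPE (in percent) for paired observations."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE is undefined when an actual value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"mae": mae, "mse": mse, "rmse": sqrt(mse), "mape": mape}


# Hypothetical monthly sales versus forecasts.
metrics = forecast_errors([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(metrics)
```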

Analysis of the Relationship between Product Popularity and Promotion Effectiveness in Beauty Clinics Using Decision Trees and Association Rules

Tanawat Piriyapattana and Chumpol Bunkhumpormpat

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 456-471

PDF

With the growing need for personalized marketing in aesthetic clinics, understanding customer behavior is essential. This study combines Decision Tree analysis and Association Rule Mining to uncover patterns in service selection. Association Rule Mining identifies frequently co-occurring services, such as customers undergoing "Acne Clear 6 times" often opting for "VPL red marks 3 times" (Confidence = 58%) and those selecting "General Aesthetic Treatments" commonly using "Skin Supplement" (Lift = 17.46). Meanwhile, Decision Tree analysis segments customers based on service preferences, visit days, and demographics, revealing that Acne Treatments and Skin & Meso Rejuvenation are the most common, with Botox & Injectables peaking on Saturdays. The integration of both models confirms that Acne Treatment customers frequently choose Skin Repair or Laser Treatment, aligning with the Decision Tree’s segmentation. This study demonstrates that combining Decision Tree and Association Rule Mining enhances service recommendations, allowing clinics to implement targeted promotions, such as Laser Treatment discounts for Acne Treatment customers. The findings highlight the value of Machine Learning techniques in refining customer segmentation, improving recommendations, and optimizing marketing strategies in aesthetic clinics.
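The Confidence and Lift values quoted in this abstract are standard association-rule measures. The sketch below computes them from a list of baskets using the usual definitions: confidence(A → B) = support(A ∪ B) / support(A), and lift = confidence / support(B). The service names and baskets are invented for the example.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


def lift(baskets, antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)


# Hypothetical service selections, one set per customer visit.
baskets = [
    {"acne_clear", "vpl"},
    {"acne_clear", "vpl"},
    {"acne_clear"},
    {"botox"},
]
conf_value = confidence(baskets, {"acne_clear"}, {"vpl"})
lift_value = lift(baskets, {"acne_clear"}, {"vpl"})
print(round(conf_value, 3), round(lift_value, 3))
```

A lift above 1 means the two services co-occur more often than they would if chosen independently, which is why a high-lift rule can matter even when its confidence is modest.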

Comparison of Minimum Waiting Time and Priority Satisfaction Method with Traditional Method in Electric Vehicle Battery Swapping Scheduling

Nisara Wongutai and Natthanan Promsuk

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 472-504

PDF

Environmental pollution, particularly from greenhouse gas emissions, has emerged as one of the most pressing global challenges, with the transportation sector being a major contributor. Electric Vehicles (EVs) are increasingly promoted as a sustainable solution to mitigate these emissions; however, their adoption is hindered by one critical limitation—the long charging time, which ranges from several hours for conventional chargers to around 30 minutes for fast chargers. To address this limitation, the concept of an EV Battery Swapping Service (BSS) has been introduced, enabling rapid battery replacement within minutes. The mobility of Battery Swapping Vans (BSVs) further enhances flexibility by overcoming the geographic constraints of fixed charging stations. This study proposes a Battery Swapping Service Request Scheduling (BSSRS) model utilizing the Minimum Waiting Time and Priority Satisfaction (MWT-PS) strategy. Using a simulation dataset of 20 service points with one BSV traveling at a constant speed of 40 km/h to service 19 EVs, the results demonstrate that the MWT-PS algorithm significantly improves service efficiency. Compared to traditional scheduling methods, the MWT-PS reduced the total Euclidean distance to 544.79 kilometers and shortened the overall service duration to 13.62 hours, outperforming both First-Come First-Serve (FCFS) and Highest Credit First (HCF) algorithms. These findings highlight the potential of the proposed scheduling approach to enhance EV adoption by making energy replenishment faster, more efficient, and more sustainable.
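The evaluation quantities in this abstract, total Euclidean distance and service duration at a constant speed, can be sketched for any visiting order. The MWT-PS rule itself is not specified in the abstract and is not implemented here; the coordinates and the alternative ordering below are hypothetical. The reported figures are consistent with duration = distance / speed, since 544.79 km at 40 km/h is about 13.62 hours.

```python
from math import dist


def route_distance(start, stops):
    """Total straight-line distance when visiting `stops` in the given order."""
    total, here = 0.0, start
    for stop in stops:
        total += dist(here, stop)
        here = stop
    return total


def travel_hours(distance_km, speed_kmh=40.0):
    """Travel time at constant speed, ignoring the time spent swapping batteries."""
    return distance_km / speed_kmh


depot = (0.0, 0.0)
# Hypothetical request locations in km, listed in arrival (FCFS) order.
fcfs_order = [(6.0, 8.0), (3.0, 4.0)]
# The same requests visited in a different, hand-picked order.
alternative_order = [(3.0, 4.0), (6.0, 8.0)]

fcfs_km = route_distance(depot, fcfs_order)
alt_km = route_distance(depot, alternative_order)
print(fcfs_km, travel_hours(fcfs_km))
print(alt_km, travel_hours(alt_km))
```

Even with two requests the visiting order changes the distance by half, which is the effect a scheduling strategy tries to exploit while also respecting waiting time and priority.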

Forecasting Method Evaluation Framework for Production Planning under Uncertain Customer Demand

Anongphorn Janboonpeng and Chompoonoot Kasemset

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 505-516

PDF

Customer demand volatility in rapidly changing market environments poses significant challenges to production planning, particularly for short life-cycle products that exhibit pronounced seasonal patterns. Exploratory Data Analysis (EDA) conducted in this study revealed clear annual cycles and distinct seasonal peak periods in customer demand, highlighting the need for forecasting methods capable of effectively and reliably capturing such seasonality. Therefore, this research aims to develop a Forecasting Method Evaluation Framework to guide the selection of appropriate forecasting models for production planning under uncertain demand conditions. The proposed framework consists of four key components: (1) data preparation and seasonal feature engineering informed by EDA findings, (2) development of statistical and deep learning forecasting models, including SARIMA, Holt-Winters, LSTM, GRU, and a hybrid SARIMA–LSTM model, (3) performance evaluation using MAE, RMSE, MAPE, and R², and (4) integration of forecast outputs into production planning processes to effectively accommodate demand variability. The framework supports improved production stability by reducing the frequency of production plan adjustments and enhances operational readiness across manpower, machinery, materials, and production methods (Man–Machine–Material–Method), ensuring alignment with future demand conditions.
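The four evaluation metrics named above can be computed directly from actual and forecast values. The sketch below uses their standard definitions, not the authors' code. The function name and demand figures are hypothetical, and MAPE is assumed to be reported in percent.

```python
import math


def forecast_metrics(actual, forecast):
    """Return MAE, RMSE, MAPE (percent) and R² for two equal-length sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when any actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot  # undefined when all actual values are identical
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Hypothetical demand figures for illustration only.
metrics = forecast_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

Because MAPE divides by the actual value, it can be distorted in low-demand periods of a strongly seasonal series, which is one reason to report it alongside MAE and RMSE.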

Defect Detection for Electronics Enclosure Using Convolution Neural Network

Atit Luksida and Prompong Sugunnasil

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 517-530

PDF

The current challenge facing factories in Thailand is the transition to Industry 4.0. The process of appearance inspection has been transformed from human inspection to a computer-assisted process. The objective of this change is to improve the accuracy of the inspection by removing human judgment. In this study, we propose a convolutional neural network (CNN) to detect defects in electronics enclosures. We then compare the proposed method with several other techniques, including SVM and KNN. The testing dataset comprises 1,190 images captured from a camera oriented in a consistent direction. These images were divided into four balanced classes to mitigate any issues related to class imbalance during model training. Although SVM demonstrated superior accuracy, the substantial time required for training makes it impractical for real-world applications where time efficiency is crucial. In contrast, despite having slightly lower accuracy, CNN showed a beneficial balance between performance and computational efficiency, making it a more pragmatic choice in many real-world scenarios. KNN, although faster than SVM, had the lowest performance in our tests.

ANALYSIS OF HERDING BEHAVIOR IN STOCK MARKETS USING CROSS-SECTIONAL ABSOLUTE DEVIATION AND GATED RECURRENT UNIT

Thanaphat Sriboonma and Sumalee Sangamuang

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 531-565

PDF

Herding behavior, where investors mimic the trading actions of others, is a critical phenomenon in financial markets, often exacerbating volatility and leading to mispricing and systemic risk. Traditional econometric models like the Cross-Sectional Absolute Deviation (CSAD) have been widely used to detect such behavior, yet they fall short in capturing the nonlinear, dynamic nature of market sentiment. This study integrates behavioral finance with deep learning to enhance the prediction of herding behavior by estimating the CSAD-based herding coefficient (γ₂) using advanced time-series models. Daily stock data from the S&P 500, spanning January 2000 to December 2024, are analyzed using four deep learning architectures: Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Time-Series BERT (TST-BERT). The models are evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²). Among them, the GRU model outperformed the others, achieving the lowest prediction error and the highest R² value of 0.8785, indicating its superior capability in modeling temporal dependencies in financial data. The results affirm that deep learning, particularly GRU, provides a more accurate and robust framework for detecting herding behavior, offering valuable insights for investors, regulators, and policymakers aiming to enhance market stability and risk assessment.
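For readers unfamiliar with the measure, CSAD and the herding coefficient γ₂ are commonly defined as below. This is the standard formulation from the herding literature and is assumed here, since the abstract does not give the paper's exact specification. N is the number of stocks, R_{i,t} is the return of stock i on day t, and R_{m,t} is the market return on day t.

```latex
\mathrm{CSAD}_t = \frac{1}{N} \sum_{i=1}^{N} \left| R_{i,t} - R_{m,t} \right|

\mathrm{CSAD}_t = \alpha + \gamma_1 \left| R_{m,t} \right| + \gamma_2 R_{m,t}^{2} + \varepsilon_t
```

Under this formulation, a significantly negative γ₂ is usually read as evidence of herding, because return dispersion then grows less than proportionally during large market moves.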

ESTIMATION OF PM2.5 CONCENTRATIONS USING SATELLITE IMAGERY WITH MACHINE LEARNING TECHNIQUES

Piriya Boonchot

Published in Data Science and Engineering (DSE) Record 2025 Vol. 6 No. 1 pp. 566-581

PDF

Air pollution caused by fine particulate matter (PM2.5) in Northeastern Thailand is a significant environmental concern. This study aims to identify the relationship between satellite-derived variables and PM2.5 concentrations and to establish an effective machine learning model for PM2.5 estimation. Sentinel-5P satellite data, comprising atmospheric variables including Carbon Monoxide (CO), Formaldehyde (HCHO), Nitrogen Dioxide (NO₂), Ozone (O₃), Sulfur Dioxide (SO₂), Methane (CH₄), and Aerosol Index (AI), were analyzed alongside ground-based PM2.5 measurements from 2018 to 2023. Pearson correlation analysis of the atmospheric variables showed that Carbon Monoxide (r = 0.72) and Nitrogen Dioxide (r = 0.51) had the strongest linear relationships with PM2.5 levels. Based on statistical significance and regional source characteristics, five key variables (CO, NO₂, HCHO, AI, and O₃) were selected as input features for the model. Several predictive algorithms were developed and evaluated, including Decision Tree Regression (DTR), Support Vector Regression (SVR), Polynomial Regression (PR), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN). The results demonstrated that the CNN model achieved the best performance, with the lowest Mean Absolute Error (MAE) of 7.87 μg/m³ and the highest Coefficient of Determination (R²) of 0.63. Although the model exhibited limitations in estimating peak concentrations during extreme haze episodes due to signal saturation, it demonstrated capability in monitoring seasonal trends and regional distribution. These findings highlight the efficiency of Deep Learning models and remote sensing data as valuable supporting tools for air quality monitoring in regions with limited ground-based observations.
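The correlation coefficients quoted above are Pearson's r between one satellite variable and the ground PM2.5 series. The sketch below computes r from its standard definition. The function name and sample readings are hypothetical and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Division fails if either series is constant, where r is undefined.
    return covariance / (spread_x * spread_y)


# Hypothetical paired readings: a satellite variable and ground PM2.5.
satellite_values = [1.0, 2.0, 3.0, 4.0]
pm25_values = [2.0, 4.0, 6.0, 8.0]
r_example = pearson_r(satellite_values, pm25_values)
```

Pearson's r captures only linear association, so a variable with a modest r can still be a useful input to a nonlinear model such as a CNN.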

All volumes

Vol. 6 2025

Vol. 5 2024

Vol. 4 2023

Vol. 3 2022

Vol. 2 2021

Vol. 1 2020