Current issue: Vol. 4 2023


Yang Zhang and Thaned Rojsiraphisal

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 1-11

In financial market analysis, machine learning techniques for stock price prediction have garnered considerable interest. This study investigates the effectiveness of Long Short-Term Memory (LSTM) models in predicting stock prices for growth stocks and the CSI 300 index in the Chinese A-share market. The study also explores the influence of various technical indicators and models on forecasting accuracy. The experimental results demonstrate that the LSTM model is the most effective in predicting stock prices in the A-share market, while other algorithms such as WMA and ARIMA are less successful in forecasting long-term stock market data. The study further proposes some modifications to enhance the accuracy and dependability of the prediction model.

Chanchanok Aramrat and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 12-35

Bacterial data are under-utilized in Maharaj Nakorn Chiang Mai hospital. Bacterial data contain information about the bacteria isolated from various biological samples collected in routine clinical care. The data can be used to create bacterial profiles and antibiotic susceptibility profiles, which help doctors decide on the most appropriate antibiotic agent to give to patients with infection. The aim of this study was to develop an application that creates bacterial profiles and antibiotic susceptibility profiles from the hospital's bacterial data. To do this, the study was divided into four parts: (1) development of an ETL process to prepare the data for use, (2) data quality assessment, (3) development of a pilot application that uses the prepared data to create bacterial profiles and antibiotic susceptibility profiles, and (4) feasibility assessment of the pilot application. All data were extracted from the Maharaj Nakorn Chiang Mai hospital database for 2017 to 2018 with assistance from hospital information technology (IT) personnel. The extracted data were explored and compiled into one table to be used by the pilot application. The pilot application was written in Google Colaboratory. Overall, the data quality was good. Some data were missing, but this should barely affect the reliability and performance of the application. For the feasibility assessment, the pilot application was given for trial use to 6 doctors selected by convenience sampling from all doctors working in the hospital. The doctors were later interviewed and asked to provide feedback on the pilot application. The application received positive reviews overall. Points for improvement were identified, focusing on data cleaning and preprocessing to minimize potential bias. This study provides insight into the development process of a pilot application that provides bacterial profiles and antibiotic susceptibility profiles to doctors. Modifications are required before such an application can be used in clinical practice.

Supaporn Thankham and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 36-51

This independent study is a comparative study of privacy impact assessment metrics for multi-domain transactional processing, with the Registration Office of Chiang Mai University as a case study. It examines which personal data a privacy impact assessment should be conducted on and which data are high-risk, in order to guide other entities with multi-domain linkage in conducting a DPIA (Data Protection Impact Assessment) on high-risk data and to ensure the security of personal information, including its appropriate storage and management. The researcher used three tools, namely the GS1 tool, the iPIA tool, and the SPIA tool, and conducted a DPIA using the ISO/IEC 27001:2013 standard framework and the NIST Cybersecurity Framework as guidelines for designing the specified DPIA.

Preeyanoot Moontee and Trasapong Thaiupathump

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 52-55

Complaints are expressions of dissatisfaction or grievances about a service, and they can be used to improve and develop the quality of that service. Complaints differ in severity, however. For example, severity can be divided into two levels: complaints that do not require a warning, and complaints that must be notified to the corresponding personnel and require immediate action to solve the problem. The purpose of this independent study was to study the analysis and notification of complaints in health care services, using complaints from health care recipients and the severity of those complaints as assessed by experts. The researcher used the information obtained to develop a system that analyzes complaints and sends a notification through the LINE application when a complaint is categorized as requiring notification. The study found that Multinomial Naïve Bayes had the highest performance in complaint classification among the models compared, with an accuracy of 71%.

Mallika Chali and Arinya Pongwat

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 56-68

This independent study aims to understand the behaviors of tourists affected by hostel accommodation services in Mueang Chiang Mai district, Chiang Mai province. The data were collected from TripAdvisor.com: 5,108 messages were crawled and separated into 17,092 sentences. In-depth interviews with hostel entrepreneurs provide insights into the essential aspects they consider when managing their businesses. These aspects, together with aspects from related studies, serve as classification criteria for sentiment analysis using Support Vector Machine (SVM) and Multinomial Naïve Bayes (MNB) algorithms. The SVM model achieves 93% accuracy, outperforming MNB's 82%. Text-mining analysis explores hostel business development. The findings reveal that SVM is suitable for classifying customer review messages and exhibits satisfactory performance and accuracy. The aspects discovered in this study include cleanliness, facility, location, quality of staff, security, social atmosphere, and value for money. The results contribute to the theoretical context for academics as well as practical guidelines for hostel managers in general.

Sukrit Akarametagul and Paskorn Champrasert

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 69-81

This research develops an artificial intelligence and machine learning system that can detect riders who do not wear helmets, and analyzes and compares the capabilities of the YOLO and RetinaNet algorithms in detecting these riders. Data from the LPR (License Plate Recognition) camera of the CMU Smart Gate system, which records vehicles entering and exiting the gates of Chiang Mai University, were used for training and for measuring system performance. The results showed that both YOLO and RetinaNet can be used to develop a system that detects motorcyclists who do not wear helmets. However, the mean precision of the trained RetinaNet model, 0.999, was higher than that of the YOLO model, 0.983. For detecting motorcyclists without helmets specifically, both algorithms achieved the same precision of 1.000. When the models were tested for processing time per image, YOLO took less time to execute than RetinaNet: on average, YOLO took 0.152 seconds and RetinaNet took 1.659 seconds.

Nopphorn Somrit and Chumpol Bunkhumpornpat

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 82-95

This independent study develops a system to analyze the behavior of mobile phone customers, using a clustering model as a tool to group customers according to their behavior. The researcher conducted a comparative study to find an appropriate behavioral grouping. The study was divided into two sub-studies, each constructing one basic clustering model, K-means or DBSCAN, in order to find the most suitable clustering method for the characteristics of the data set. The two clusterings were compared by the average accuracy with which five classification models (Random Forest, Decision Tree, SVM, Naïve Bayes, and KNN) classified customers into each group. This accuracy comparison also serves as the performance measurement of the developed system. The experimental results show that K-means clustering gives better customer classification performance than DBSCAN: the K-means model has a higher mean accuracy across the five classification models.

Witidtayapond Promana and Sakgasit Ramingwong

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 96-107

This independent study attempts to find the most appropriate model for forecasting the price of Ribbed Smoked Sheet No. 3 in Thailand. Daily prices from the Rubber Authority of Thailand from 2011 to 2021 are used as raw data. A total of 2,618 daily rubber prices are divided into training and test sets. The training set comprises 2,376 values from 2011 to 2020. It is used to construct four forecasting models: moving average, Holt’s method, the Box-Jenkins method, and a neural network. The test set, comprising 242 values from 2021, is used to compare forecast accuracy, with the lowest error as the criterion. The findings indicate that the neural network, a non-linear autoregressive neural network (NNAR), is the most suitable for forecasting the price of Ribbed Smoked Sheet No. 3. This method has the lowest mean absolute error (MAE) of 4.5352, root mean square error (RMSE) of 5.7807, and mean absolute percentage error (MAPE) of 7.3309. The Box-Jenkins method, moving average, and Holt’s method, in that order, are found to provide less accurate results.
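As an illustration of the three error measures named in this abstract, the following is a minimal sketch using their standard definitions. The function name `forecast_errors` and the sample prices are hypothetical and are not taken from the study; the study's own tooling is not stated in the abstract.

```python
import math

def forecast_errors(actual, predicted):
    """Return (MAE, RMSE, MAPE in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when an actual value is zero; prices are assumed positive.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return mae, rmse, mape

# Hypothetical prices for illustration only, not data from the study.
actual = [50.0, 60.0, 80.0]
predicted = [45.0, 66.0, 80.0]
mae, rmse, mape = forecast_errors(actual, predicted)
print(f"MAE={mae:.4f} RMSE={rmse:.4f} MAPE={mape:.4f}%")
```

A model comparison of the kind described would evaluate each candidate on the same held-out test set and prefer the one with the lowest values.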

Suchada Manowon and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 108-124

Flight delays persist as a challenge, affecting airline and airport productivity, passenger experience, and financial resources. Nowadays, air transportation data predominantly rely on administrative records from various institutions. This study aims to design and implement an effective data pipeline system with the capacity to capture high-frequency data from diverse sources through batch processing. The pipeline covers all stages of an end-to-end data pipeline, including data sourcing, ingestion, processing, storage, and analysis. The proposed pipeline system extracts data from various datasets, including flight data, airport information, airline details, airplane specifications, and routes. It employs a variety of methods for data ingestion, such as web scraping, APIs, and database loading. It consolidates flight information, transforming and cleaning the data and then loading it into a designated destination database. Additionally, this study establishes an automated batch processing platform using Apache Airflow. The platform is evaluated across three essential aspects: (1) system metrics, including memory and disk usage, (2) job metrics extracted from Airflow metrics, which are used to monitor processes and ensure smooth execution, and (3) data quality metrics that assess six dimensions (accuracy, validity, completeness, consistency, uniqueness, and timeliness) to ensure the usability of the defined data. Using the flight dataset for data analysis and data visualization, the study compares various base regression models for flight delay prediction. Additionally, flight data dashboards offer data insights. The implications of this multifaceted approach extend to enhancing air transportation statistics and predictive modeling capabilities and to facilitating data-driven decision-making.

Pittayathon Rinkaewngam, Varin Chouvatut, and Jiraporn Khorana

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 125-131

Class imbalance in a dataset biases classification results toward the class with the most data (the majority class). A sampling method is needed to boost the minority class (here, the negative class) so that the class distribution becomes balanced, leading to better classification results. This study was conducted to overcome the class imbalance problem in the nonoperative reduction of intussusception dataset using ADASYN, SMOTE-NC, and k-means-SMOTE. The dataset has 173 instances of the positive class (majority class) and 79 instances of the negative class (minority class). Three classifiers (Logistic Regression, SVM, and Decision Tree) were compared. Decision Tree with SMOTE-NC oversampling and Decision Tree with k-means-SMOTE oversampling achieved the highest accuracy of 94%, while Support Vector Machine without oversampling produced the highest sensitivity of 100%.
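To make the oversampling idea concrete, here is a minimal sketch of plain SMOTE-style interpolation on numeric features. It is not the ADASYN, SMOTE-NC, or k-means-SMOTE variant used in the study, and the function `smote_like`, its parameters, and the toy points are all hypothetical. With 173 majority and 79 minority instances, fully balancing the classes would take 173 − 79 = 94 synthetic minority samples, which is the count generated below.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (plain SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Toy two-feature minority points, for illustration only.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
n_needed = 173 - 79  # synthetic samples required to match the majority count
new_points = smote_like(minority, n_needed)
print(len(new_points), new_points[:2])
```

Every synthetic point lies on a segment between two existing minority points, so the method adds variety without leaving the region the minority class already occupies.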

Mongkhol Rattanakhum, Pruet Boonma, and Sumalee Sangamuang

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 132-143

This independent research aims to rank electronic commerce (E-commerce) products by utilizing data from E-commerce platforms. It combines a tool known as PageRank with consumer satisfaction to reorganize the product ranking. The goal is to promote new or less-recognized products and improve their visibility. Businesses that want to increase sales and maximize product distribution commonly struggle to showcase products that have not yet garnered significant sales, which causes a vicious cycle of low visibility. In this research, products from the same store are used as variables in calculating PageRank scores. The test results show that, by combining PageRank scores with consumer satisfaction ratings, products previously displayed in the middle or lower rankings on the E-commerce platform can be repositioned to appear at the top or on the first page more frequently, and the results of the user acceptance test show no variation.
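The abstract does not say how the link graph is built or how the two scores are combined, so the following is only a sketch under stated assumptions: products in the same store link to one another, PageRank is computed by standard power iteration, and the final score is a weighted sum of normalized PageRank and normalized rating. The product names, ratings, the `weight` value, and both function names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank. links maps each node to the nodes it points to;
    every target must also appear as a key."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

def blended_scores(rank, ratings, weight=0.3, max_rating=5.0):
    """Weighted sum of PageRank (scaled to its maximum) and rating (scaled to 0..1).
    The blending rule and the weight are assumptions, not the study's formula."""
    top = max(rank.values())
    return {
        p: weight * rank[p] / top + (1.0 - weight) * ratings[p] / max_rating
        for p in rank
    }

# Hypothetical store: A, B, C are established products; D is new and only links out.
links = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"], "D": ["C"]}
ratings = {"A": 3.0, "B": 3.5, "C": 4.0, "D": 4.9}

rank = pagerank(links)
scores = blended_scores(rank, ratings)
by_rank = sorted(rank, key=rank.get, reverse=True)
by_score = sorted(scores, key=scores.get, reverse=True)
print(by_rank, by_score)
```

In this toy graph the new product D is last by PageRank alone but moves above A once its high rating is blended in, which mirrors the repositioning effect the abstract describes.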

Bao Xiaohui and Prompong Sugunnasil

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 153-163

This study proposes a comprehensive machine learning model for predicting the employment region choices of normal higher vocational college students. Through meticulous data pre-processing and a comparison of diverse algorithms and resampling methods, we developed a predictive model whose F1-score reaches 0.991 ± 0.015. We also describe the application methods and scenarios of our model. As nations strive for educational equity, our findings offer a predictive framework to inform strategies for attracting capable graduates to rural teaching roles, thereby advancing educational parity and societal development.

Warut Sanwibhuk and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2023 Vol. 4 No. 1 pp. 144-152

This independent research aimed to develop a model for predicting the ripeness level of durians based on knocking sounds. Ripeness was categorized into three levels: raw, unripe, and ripe. Each level has a unique sound response when knocked. To achieve the highest accuracy in model development, the researcher compared three feature extraction methods: Mel-Spectrogram, Short-time Fourier Transform (STFT), and Mel-Frequency Cepstral Coefficients (MFCC). In the model development phase, the Convolutional Neural Network (CNN) algorithm was selected for building the prediction model. To evaluate the performance of the developed model, accuracy and F1 score were measured and compared across the three feature extraction methods. The Short-time Fourier Transform (STFT) method yielded both the highest accuracy and the highest F1 score. On the training dataset, it gave 99% accuracy and a 94% F1 score; on the blind test data, it gave 99% accuracy and a 92% F1 score.