Volume: Vol. 3 2022

Issue: No. 1 October

Itsarawadee Hema and Narissara Eiamkanitchart

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 1-11

This independent study had two objectives: first, to select appropriate knowledge-based economy indicators as alternative variables for predicting Gross Domestic Product (GDP) growth; second, to develop models for forecasting the GDP growth rate using a neuro-fuzzy technique and to compare their performance. The data were collected from the World Bank through an Application Programming Interface and cover five regions: East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, and South Asia. The study identified the independent variables of the knowledge-based economy that could be used in a GDP growth rate prediction model and developed an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict the GDP growth rate. The ANFIS predictions were compared with those of Linear Regression (LR) and Artificial Neural Network (ANN) models using the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). ANFIS provided the highest accuracy in predicting the GDP growth rate in 14 of 15 experiments across three types of data (training, testing, and unseen datasets), followed by the ANN and LR models, in that order. The East Asia & Pacific region had the lowest error of all regions, with average MAE and RMSE on the testing and unseen datasets of 0.265% and 0.345%, respectively.
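
The two error measures named in this abstract can be illustrated with a minimal sketch. The function names and the sample growth rates below are hypothetical and are not taken from the study; the sketch only shows the standard definitions of MAE and RMSE.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: the average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: the square root of the mean squared error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical GDP growth rates in percent (illustrative values only).
actual_growth = [2.0, 3.0, 4.0, 5.0]
predicted_growth = [2.5, 2.5, 4.0, 6.0]

print(f"MAE  = {mae(actual_growth, predicted_growth):.3f}")
print(f"RMSE = {rmse(actual_growth, predicted_growth):.3f}")
```

Because RMSE squares each error before averaging, it penalizes large individual misses more heavily than MAE, which is one reason the two are often reported together.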

Yali Ye and Sakgasit Ramingwong

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 12-24

This research combines principal component analysis and multiple linear regression to analyze the factors that influence the foreign trade of Sichuan Province, China, and the degree of their influence. First, a visual analysis of the current state of foreign trade in Sichuan Province is conducted, covering the structure of foreign trade modes, foreign trade commodities, and foreign trade partners. Second, indicators that may affect the foreign trade of Sichuan Province are selected and used to construct the empirical model. Principal component analysis is used to extract principal components, and regression analysis is then conducted on the extracted components. Finally, the empirical results show that gross regional product, the consumer price index, foreign direct investment, average disposable income of residents, and investment in research and experimentation have a positive impact on Sichuan's foreign trade, whereas the RMB has a negative impact on it.

Han Yang and Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 25-34

This paper takes reviews of five different refrigerators in Jindong Mall as the research object. Text review data are collected from the e-commerce platform with a Python web crawler. After basic pre-processing such as deduplication and cleaning, user sentiment is classified, and the review text is analyzed through word segmentation, word frequency statistics, and display of the results. An LDA topic model is then used to analyze the review data thematically, so that valuable content can be extracted from the reviews through multifaceted analysis and suggestions can be made for product improvement.

Kanokrot Phuruan and Chompoonoot Kasemset

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 35-48

Shallot is one of the important agricultural products exported in high volume to many countries. Shallots are mainly cultivated in the northern region of Thailand. The price of shallots changes over the course of a year because of many related parameters. This research aimed to develop a forecasting model for the price of shallots using a combination of ARIMA and LSTM (ARIMA-LSTM). Among the independent parameters, ARIMA was applied to predict the effects of parameters with time-series and linear relationships, whereas LSTM was applied to predict the effects of parameters with non-linear relationships. Data covering 84 months, from January 2014 to December 2020, were used in this research. The accuracy of the proposed model was evaluated using three indicators: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The results showed that the ARIMA-LSTM model gave the minimum values of RMSE, MAE, and MAPE, at 10.275 Baht, 8.512 Baht, and 13.618%, respectively. Moreover, the MAPE value fell within the range regarded as good forecasting, so the model can be implemented in practice.
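
As an illustration of the percentage-based indicator reported above, the following minimal sketch computes MAPE from its standard definition. The function name and the sample prices are hypothetical and are not data from the study.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed in percent.

    MAPE is undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical monthly prices in Baht (illustrative values only).
actual_prices = [50.0, 80.0, 100.0]
forecast_prices = [45.0, 88.0, 100.0]

print(f"MAPE = {mape(actual_prices, forecast_prices):.3f}%")
```

Unlike RMSE and MAE, which are expressed in Baht here, MAPE is unit-free, which makes it convenient for judging forecast quality against general thresholds.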

Krittai Tanasombatkul and Juggapong Nartwichai

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 49-53

Antibiotic use in today's world is exceedingly high, and this has led to the global development of antibiotic resistance, affecting patient treatment and outcomes. If antibiotic therapy fails, critically ill patients admitted to intensive care units (ICUs) are at a higher risk of significant morbidity and mortality. Although the hospital established the "Antibiotic smart use team" to manage antibiotic consumption and make it safer and more cost-effective, the number of antibiotics used in intensive or critical care unit wards keeps rising. This independent study focuses on developing a reproducible, automated process to extract antibiotic prescription data from critical care units and transform it into appropriate transactional data. The process allows for analysis and visualization, reveals significant insights into antibiotic usage in critical care units, and can supply supporting data for the quarterly meeting of the Antibiotic Controller's Board of Directors to improve antibiotic safety.

Chawalit Chanintonsongkhla and Varin Chaovatut

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 54-65

This independent study aims to develop a model for segmenting proximal dental caries in bitewing radiographs using a fully convolutional neural network. The segmentation models were created with the explicit goal of helping dentists segment dental caries in radiographs as a second opinion. To determine the most appropriate model architecture, we compared the performance of three fundamental segmentation models, U-Net, FPN (Feature Pyramid Network), and DeepLabV3+, together with XsembleNet, a combination of the three preceding models. The system is evaluated in two ways. The first is to assess segmentation quality using the Dice coefficient; empirical experiments indicate that XsembleNet has the highest Dice coefficient, followed by FPN. The second is to rate the models' segmentations of 12 test bitewing radiographs. While all four models are comparable in accuracy and specificity, XsembleNet and FPN jointly achieve the highest classification metric scores. It can therefore be concluded that a fully convolutional neural network could be used to detect proximal dental caries in radiographs as part of computer-assisted diagnosis.
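
The Dice coefficient used for the first evaluation can be sketched as follows for binary masks. This is a minimal illustration under assumed inputs: the function name, the nested-list mask representation, and the tiny example masks are hypothetical and do not reproduce the study's evaluation pipeline.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two binary masks of equal shape.

    Masks are nested lists of 0/1 values, where 1 marks a pixel labeled as caries.
    """
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks (illustrative values only).
predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]

print(f"Dice = {dice_coefficient(predicted, ground_truth):.3f}")
```

A value of 1 means the predicted and reference regions coincide exactly, and 0 means they do not overlap at all.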

Jie Yang and Sakgasit Ramingwong

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 66-77

There is a relatively well-established process for using machine learning to predict influencing factors and sales. Small and medium-sized enterprises, however, often face problems such as low data volume and unrepresentative data types, and large data requirements become a barrier to using machine learning methods to support business activities. The original data for this study were sourced from publicly available data on Alibaba's Tianchi platform and contain sales data from three different branches of a small shop. This paper studies the influencing factors through the correlation of the data and uses the random forest regression method to rank the importance of features. To predict sales, this paper uses a pre-training model to compare and analyze multiple machine learning models. The results show that the pre-training method improves some models and degrades others, to varying degrees.

Patcharapol Yasamut and Pree Thiengburanathum

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 78-92

The demand for electricity in buildings on a national and international scale is currently rising rapidly. Building electricity usage can be decreased by using a forecasting model, which can reduce utility costs not just for one building but also throughout a whole region. According to the literature review, machine-learning and deep-learning techniques have been used in previous studies on forecasting electricity consumption. However, there is a dearth of research into the use of clustering to predict electricity consumption in tropical regions such as Thailand or other countries in Southeast Asia. In this project, we present new research on hourly forecasting of building energy usage. Electricity consumption data at 1-hour intervals were collected by smart meters from nineteen buildings over a year and five months. Weather data at 1-hour intervals, including PM 10, PM 2.5, temperature, and humidity, were also collected from one building. Cross-correlation analysis indicated only a weak correlation between the weather data and the electricity consumption data. Vector Auto Regression (VAR), Vector Auto-Regressive Moving Average (VARMA), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) models were used to develop the baseline forecasting models. The SVR model outperformed the other models, with the lowest RMSE validation scores on the training dataset. The hyperparameters of the SVR models were optimized to maximize forecasting accuracy on the training dataset. To reduce the time needed for training and optimizing the models, the k-Shape clustering approach was used to group electricity consumption into patterns. We used the centroid of each cluster as a representation of the cluster's electricity consumption data in order to forecast the electricity consumption of the buildings within the cluster. A t-test comparing the forecasting performance of SVR with and without the clustering technique found no statistically significant difference between the two (p-value = 0.7258).

Shuming Wang and Phisanu Chiawkhun

Published in Data Science and Engineering (DSE) Record 2022 Vol. 3 No. 1 pp. 93-111

The key words of this study are data visualization, the RFM model, and customer relationship management. Taking Wal-Mart as a case study, this work analyzes the number of goods and stores in Wal-Mart supermarkets over four years (2011-2014) and visualizes the results. A visualization tool is used to analyze sales and commodities and to present the results in charts. Retaining customers and improving customer retention rates are particularly important in an environment of increasingly fierce competition among supermarket retailers. First, the supermarket retail data are visualized to make the data set more intuitive and easier to understand. The supermarket can then use the RFM (Recency, Frequency, Monetary) analysis method to distinguish customer value and provide marketing services tailored to different customers. The application of RFM begins with a basic evaluation through data visualization to gain a general grasp of the retail data, then builds the RFM model to obtain each customer's RFM score, and finally differentiates customers by value.
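
The RFM steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the function names, the rank-based three-level scoring, the snapshot date, and the sample transactions are all hypothetical assumptions.

```python
from datetime import date


def rfm_table(transactions, snapshot):
    """Aggregate (customer_id, purchase_date, amount) rows into R, F, M values."""
    raw = {}
    for customer, day, amount in transactions:
        rec = raw.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        customer: {
            "recency": (snapshot - rec["last"]).days,  # days since last purchase
            "frequency": rec["frequency"],             # number of purchases
            "monetary": rec["monetary"],               # total amount spent
        }
        for customer, rec in raw.items()
    }


def rank_score(values, higher_is_better=True, bins=3):
    """Assumed scoring rule: rank-based score from 1 (worst) to `bins` (best).

    Ties are broken by input order, which is a simplification.
    """
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {c: 1 + (rank * bins) // len(ordered) for rank, c in enumerate(ordered)}


# Hypothetical transactions (illustrative values only).
sample = [
    ("A", date(2014, 12, 20), 100.0), ("A", date(2014, 12, 1), 50.0),
    ("A", date(2014, 11, 15), 30.0), ("B", date(2014, 6, 1), 500.0),
    ("C", date(2014, 10, 10), 20.0), ("C", date(2014, 9, 1), 25.0),
]
rfm = rfm_table(sample, snapshot=date(2014, 12, 31))
r = rank_score({c: v["recency"] for c, v in rfm.items()}, higher_is_better=False)
f = rank_score({c: v["frequency"] for c, v in rfm.items()})
m = rank_score({c: v["monetary"] for c, v in rfm.items()})
rfm_codes = {c: f"{r[c]}{f[c]}{m[c]}" for c in rfm}
print(rfm_codes)
```

In this sketch a lower recency (a more recent purchase) earns a higher R score, so a code such as "332" marks a recent, frequent customer with mid-range spending.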