Nat Weerawan, Pruet Boonma

Published in Data Science and Engineering (DSE) Record 2020 Vol. 1 No. 1 pp. 9-15


This independent study develops a data pipeline system that transforms an image of a standard printed business cheque into digital numeric data using optical character recognition (OCR). The system was developed specifically to improve the efficiency of data entry in the insurance claim payment process. The evaluation is twofold. The first part compares the algorithms used to build the OCR system on accuracy and runtime. The selected algorithms are k-Nearest Neighbors (kNN), Support Vector Machine (SVM), and Gradient Boosting Machine (GBM). GBM was found to be the most accurate and required the least runtime of the three. The second part is an evaluation survey of 10 experts who are either developers or persons in charge of the claims process in the insurance industry. The survey results show that the experts rated both the accuracy and the speed of the developed system as outstanding and satisfactory. It can therefore be concluded that the proposed system can increase the capability of the data input process in insurance claim payment.
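The accuracy-and-runtime comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `knn_predict`, `majority_predict`, and `evaluate`, the toy two-feature data, and the two-class labels are all assumptions made for illustration. A toy kNN and a majority-class baseline stand in for the real kNN, SVM, and GBM models, which would normally come from a machine learning library and be trained on digit images.

```python
import time
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sample type: (feature vector, class label).
Sample = Tuple[Sequence[float], int]
Predictor = Callable[[List[Sample], Sequence[float]], int]


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> int:
    """Toy k-nearest-neighbours vote using squared Euclidean distance."""
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def majority_predict(train: List[Sample], x: Sequence[float]) -> int:
    """Baseline that ignores x and returns the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def evaluate(
    models: Dict[str, Predictor],
    train: List[Sample],
    test: List[Sample],
) -> Dict[str, Dict[str, float]]:
    """Record accuracy and wall-clock prediction time for each model."""
    results: Dict[str, Dict[str, float]] = {}
    for name, predict in models.items():
        start = time.perf_counter()
        correct = sum(predict(train, x) == y for x, y in test)
        elapsed = time.perf_counter() - start
        results[name] = {"accuracy": correct / len(test), "runtime_s": elapsed}
    return results


# Illustrative data only: two well-separated clusters standing in for classes.
train_set: List[Sample] = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
    ((1.0, 1.0), 1), ((0.9, 1.1), 1),
]
test_set: List[Sample] = [((0.05, 0.1), 0), ((0.95, 0.9), 1)]

results = evaluate(
    {"kNN": knn_predict, "baseline": majority_predict},
    train_set,
    test_set,
)
for model_name, metrics in results.items():
    print(model_name, metrics)
```

Timing every model with the same harness on the same test set keeps the runtime figures comparable. In a fuller evaluation, training time would likely be measured separately from prediction time.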