Thanawat Kaewwiroon and Juggapong Natwichai

Published in Data Science and Engineering (DSE) Record 2021 Vol. 2 No. 1 pp. 29-35


Abstract

This independent study develops a data pipeline system that preserves privacy during data ingression. The system applies the k-anonymity method with generalization and suppression. Information loss is measured as data precision and is minimized by following the Preferred Minimal Generalization Algorithm (MinGen). The system first calculates the data precision for every possible combination of levels in the domain generalization hierarchies. It then determines which patterns of generalization levels satisfy the required k value for the data set. Finally, the data in the database system are transformed into their privacy-preserving form. To evaluate the system, a synthesized demographic dataset was generated, together with the domain categories and generalization levels of its quasi-identifier attributes. The success indicator in this study is the processing time needed to satisfy the k value for different numbers of data records. The results show that larger data sets need less processing time to satisfy the k value, because more records increase the chance that at least k records share the same values. Thus, the Privacy Preservation for Data Ingression in Data Pipeline system developed in this study can process data to satisfy k-anonymity while minimizing the loss of data precision for demographic data in a data pipeline.
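
The three steps described in the abstract (score every combination of generalization levels by precision, find the combinations that satisfy k, then transform the data) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The hierarchies `AGE_LEVELS` and `ZIP_LEVELS`, the sample records, and all function names are hypothetical. The sketch assumes full-domain generalization, where precision reduces to one minus the mean of each attribute's level divided by its hierarchy height, and it treats the top level `*` as suppression of the value. Ties in precision are broken by enumeration order, which is a simplification of a real preference policy.

```python
from collections import Counter
from itertools import product

# Hypothetical domain generalization hierarchies: index = generalization level.
AGE_LEVELS = [
    lambda a: str(a),                                   # level 0: exact age
    lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}",     # level 1: 10-year band
    lambda a: "*",                                      # level 2: suppressed
]
ZIP_LEVELS = [
    lambda z: z,                                        # level 0: full code
    lambda z: z[:3] + "**",                             # level 1: 3-digit prefix
    lambda z: "*****",                                  # level 2: suppressed
]
HIERARCHIES = [AGE_LEVELS, ZIP_LEVELS]


def precision(levels):
    """Precision under full-domain generalization: 1 - mean(level / height)."""
    ratios = [h / (len(hier) - 1) for h, hier in zip(levels, HIERARCHIES)]
    return 1 - sum(ratios) / len(ratios)


def generalize(records, levels):
    """Apply the chosen level of each hierarchy to every record."""
    return [
        tuple(hier[h](v) for v, h, hier in zip(rec, levels, HIERARCHIES))
        for rec in records
    ]


def is_k_anonymous(rows, k):
    """Every combination of quasi-identifier values must occur at least k times."""
    return all(count >= k for count in Counter(rows).values())


def best_generalization(records, k):
    """Try level combinations from highest to lowest precision; return the first that satisfies k."""
    combos = product(*(range(len(hier)) for hier in HIERARCHIES))
    for levels in sorted(combos, key=precision, reverse=True):
        rows = generalize(records, levels)
        if is_k_anonymous(rows, k):
            return levels, precision(levels), rows
    return None  # fewer than k records: no combination can satisfy k


# Hypothetical sample of (age, postal code) quasi-identifiers.
records = [
    (34, "50200"), (36, "50210"), (38, "50200"),
    (52, "50300"), (57, "50310"), (55, "50300"),
]
result = best_generalization(records, k=2)
if result is not None:
    levels, prec, rows = result
    print(levels, prec)
```

With this sample and k = 2, the search settles on level 1 for both attributes. The exhaustive search is practical only for small hierarchies, since the number of combinations is the product of the hierarchy sizes.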