Research on a Data Preprocessing Framework Based on Machine Learning

Posted on: 2022-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Zhong
Full Text: PDF
GTID: 2518306605487774
Subject: Engineering
Abstract/Summary:
As the digitization of society accelerates, industries of all kinds are accumulating large volumes of application-level data, and this growth makes data mining more difficult. How to extract valuable knowledge and information from such data and put it to practical use has therefore become a research hotspot in the field of data mining. Effective data mining depends first on the quality of the data itself: high-quality data supports efficient mining and better results, and accurate data preprocessing lays the foundation for the mining that follows.

Taking campus all-in-one card consumption transaction data as its sample and approaching the problem from the perspective of practical application, this thesis proposes a framework that combines data preprocessing with the K-means clustering algorithm from machine learning. Data processed by the preprocessing framework meets the input requirements of machine learning algorithms. The clustering algorithm then further integrates and optimizes the preprocessed data to yield data that can actually be analyzed and applied. This demonstrates the practical significance and application value of the all-in-one card data preprocessing model proposed in this thesis.

The main work of this thesis is as follows:
1) Data preprocessing methods are selected according to the characteristics of the sample data.
2) Several traditional machine learning clustering algorithms are presented and discussed. Based on an analysis of the characteristics of Campus All-in-one Card consumption data, the clustering algorithm best suited to the proposed data preprocessing framework is selected.
3) A data preprocessing framework based on machine learning is designed. The framework consists of four main modules: data desensitization, data cleaning, data reduction, and data collection.
4) Using the experimental data samples, the Campus All-in-one Card data is first selected and given a systematic descriptive analysis. Machine learning techniques are then applied to it, and finally the clustering algorithm is used to further process the data set in order to verify the data preprocessing framework.
5) By clustering the all-in-one card data of the student population, the thesis analyzes the five resulting groups and describes the personality characteristics and behavioral habits of each.

The cluster analysis of all-in-one card data clearly reveals category differences among students. These differentiated results can enable the school to put forward corrective and management measures in certain areas, so as to manage more effectively. The framework can likewise be applied to other, similar all-in-one card consumption scenarios, giving their data practical value.
Keywords/Search Tags: data mining, data quality, campus card data, data preprocessing, machine learning, clustering algorithm
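
The abstract does not give implementation details, so the following is only a minimal sketch of how a desensitize, clean, reduce, and cluster pipeline of this kind might look. It is not the thesis's own code. The record layout (card ID, timestamp, amount), the salted-hash pseudonym scheme, the two per-student features (transaction count and mean amount), and all function names are assumptions made for illustration. The toy data uses k=2, whereas the thesis reports five groups.

```python
import hashlib
import random
from collections import defaultdict
from statistics import mean, pstdev


def desensitize(card_id, salt="demo-salt"):
    """Replace a card ID with a salted SHA-256 pseudonym (hypothetical scheme)."""
    return hashlib.sha256((salt + card_id).encode("utf-8")).hexdigest()[:12]


def clean(records):
    """Drop exact duplicates and rows with a missing or non-positive amount."""
    seen, kept = set(), []
    for record in records:
        amount = record[2]
        if amount is None or amount <= 0 or record in seen:
            continue
        seen.add(record)
        kept.append(record)
    return kept


def reduce_per_student(records):
    """Collapse transactions into one (count, mean amount) vector per pseudonym."""
    amounts = defaultdict(list)
    for card_id, _, amount in records:
        amounts[desensitize(card_id)].append(amount)
    return {pid: (float(len(a)), mean(a)) for pid, a in amounts.items()}


def standardize(vectors):
    """Z-score each feature so that no single feature dominates the distance."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*vectors)]
    return [tuple((x - m) / s for x, (m, s) in zip(v, stats)) for v in vectors]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels


# Made-up transactions: (card ID, timestamp, amount).
records = [
    ("S001", "2021-03-01 07:30", 3.5),
    ("S001", "2021-03-01 07:30", 3.5),   # exact duplicate, removed by clean()
    ("S001", "2021-03-01 12:05", 4.0),
    ("S001", "2021-03-02 12:10", 4.5),
    ("S002", "2021-03-01 12:00", 25.0),
    ("S002", "2021-03-02 18:20", None),  # missing amount, removed by clean()
    ("S003", "2021-03-01 12:15", 24.0),
    ("S004", "2021-03-01 07:45", 3.0),
    ("S004", "2021-03-01 12:20", 5.0),
    ("S004", "2021-03-02 18:00", 4.0),
]

features = reduce_per_student(clean(records))
pseudonyms = sorted(features)
labels = kmeans(standardize([features[p] for p in pseudonyms]), k=2)
clusters = dict(zip(pseudonyms, labels))
```

In this toy run the frequent, low-spending cards end up in one cluster and the infrequent, high-spending cards in the other. A real study would choose features and the number of clusters from the descriptive analysis rather than fixing them in advance.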