Research On Incomplete Data Processing Method Based On Rough Set And LDA

Posted on: 2021-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: P W Wang
Full Text: PDF
GTID: 2428330611471131
Subject: Software engineering
Abstract/Summary:
Since the start of the 21st century, Internet technology has developed rapidly, and data can be acquired and stored online quickly. This creates opportunities for data mining, but the data obtained is often incomplete for a variety of reasons. How to handle incomplete data accurately and effectively is a current research hotspot. This thesis therefore first studies incomplete data, using an optimized filling algorithm to turn incomplete data into complete data, and then studies the classification of the completed data. The main research work is as follows:

(1) When missing values are filled from similar samples, the similar samples cannot be located accurately and the filling is susceptible to interference from unrelated data, which degrades the filling result. To address this, a rough-set-based algorithm for filling the missing dimensions of samples is designed. The algorithm first uses rough sets for attribute reduction, then performs k-means clustering on the reduced decision table, and uses a similarity measure to compare the sample to be filled with the clustering results, so that more similar samples are located accurately. Finally, with the least squares method as the core idea, it fits the data in the corresponding missing dimensions, which reduces the interference of unrelated data. Experimental results show that the algorithm is effective.

(2) When the class of a sample is predicted from its neighboring samples, a large training set and differences in the number of sample features degrade the classification result. To address this, an improved sample-mean KNN algorithm based on linear discriminant analysis is designed. The algorithm first uses linear discriminant analysis to reduce the adverse effects caused by the number of sample features and their differences. It then compares the similarity between the test sample and the samples of each class and uses this comparison to select the training set. Finally, it determines the K nearest neighbors using an improved distance formula. Experimental results show that the proposed algorithm is effective.

(3) For thyroid samples from the medical field, the data processing flow of the two algorithms is analyzed: the rough-set-based missing-dimension filling algorithm and the improved sample-mean KNN algorithm based on linear discriminant analysis. Both algorithms are verified experimentally on the thyroid samples, and the results show that they are effective.
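The filling pipeline in item (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the rough-set attribute reduction step is skipped (the attributes are assumed to be already reduced), the similarity measure is assumed to be Euclidean distance on the observed dimensions, and the function names (`kmeans`, `solve`, `fill_missing`) and the toy data are hypothetical.

```python
import math

def dist(a, b, dims):
    """Euclidean distance restricted to the given dimensions."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

def kmeans(points, k, dims, iters=20):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c], dims))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fill_missing(complete, sample, k=2):
    """Fill the None entries of `sample` using its most similar cluster."""
    n_dims = len(sample)
    observed = [d for d in range(n_dims) if sample[d] is not None]
    missing = [d for d in range(n_dims) if sample[d] is None]
    centroids, clusters = kmeans(complete, k, range(n_dims))
    probe = [0.0 if v is None else v for v in sample]
    # Locate the cluster closest to the sample on its observed dimensions.
    best = min(range(k), key=lambda c: dist(probe, centroids[c], observed))
    members = clusters[best]
    filled = list(sample)
    for m in missing:
        # Least squares via the normal equations (X^T X) beta = X^T y,
        # where X has an intercept column plus the observed dimensions.
        X = [[1.0] + [p[d] for d in observed] for p in members]
        y = [p[m] for p in members]
        size = len(X[0])
        XtX = [[sum(r[i] * r[j] for r in X) for j in range(size)] for i in range(size)]
        Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(size)]
        beta = solve(XtX, Xty)
        filled[m] = beta[0] + sum(b * sample[d] for b, d in zip(beta[1:], observed))
    return filled

# Toy data: one group follows y = 2x + 1, the other y = 20 - x.
complete = [(1, 3), (10, 10), (2, 5), (11, 9), (3, 7), (12, 8), (4, 9), (13, 7)]
filled = fill_missing(complete, [2.5, None], k=2)
```

On this toy data the incomplete sample falls into the first group, so the fitted value for the missing dimension should come out at about 6.0. Restricting the fit to the nearest cluster is what keeps the unrelated second group from influencing the result.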
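The classification idea in item (2) can be sketched in the same spirit. The abstract does not give the improved distance formula or the training-set selection rule, so this sketch substitutes standard components and should not be read as the thesis algorithm: a two-class Fisher linear discriminant projects the samples onto one axis, and a plain majority-vote KNN with absolute distance on that axis makes the prediction. The function names and the toy data are hypothetical.

```python
from collections import Counter

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fisher_direction(X, y):
    """Two-class Fisher discriminant direction w = Sw^{-1} (m1 - m0),
    where Sw is the within-class scatter matrix. Labels must be 0 or 1."""
    d = len(X[0])
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    Sw = [[0.0] * d for _ in range(d)]
    for x, t in zip(X, y):
        diff = [x[i] - means[t][i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                Sw[i][j] += diff[i] * diff[j]
    delta = [means[1][i] - means[0][i] for i in range(d)]
    return solve(Sw, delta)

def project(x, w):
    """Project a sample onto the discriminant axis."""
    return sum(xi * wi for xi, wi in zip(x, w))

def knn_predict(X, y, query, w, k=3):
    """Majority vote among the k training samples nearest on the LDA axis."""
    q = project(query, w)
    ranked = sorted(zip(X, y), key=lambda pair: abs(project(pair[0], w) - q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy two-class training set.
X_train = [(1, 2), (2, 1), (2, 3), (3, 2), (6, 5), (7, 7), (8, 6), (7, 4)]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
w = fisher_direction(X_train, y_train)
pred_a = knn_predict(X_train, y_train, (2.5, 2), w, k=3)
pred_b = knn_predict(X_train, y_train, (6.5, 6), w, k=3)
```

Here `pred_a` is expected to be class 0 and `pred_b` class 1. Projecting first means the neighbor search runs in one dimension regardless of how many features the samples have, which is the motivation the abstract gives for applying linear discriminant analysis before KNN.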
Keywords/Search Tags: Missing Value, Rough Set, Fill Algorithm, Linear Discriminant Analysis, Least Squares Method