
Research On Parallel Attribute Reduction Algorithm Based On MapReduce

Posted on: 2024-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Peng
GTID: 2568307124484734
Subject: Electronic information
Abstract/Summary:
Attribute reduction is an effective way to reduce data dimensionality and an important step of data preprocessing in data mining. As a measure of uncertainty, information entropy is a useful tool for characterizing the discriminating power of attribute subsets, and it is widely used in attribute reduction algorithms. However, serial attribute reduction algorithms based on information entropy face two problems when reducing data with limited labels. First, they cope poorly with the scarcity of labels and have low efficiency. Second, they assume that all data is loaded into memory at once, which makes them unsuitable for large-scale data sets. To solve these problems, this thesis designs three attribute reduction algorithms. The specific work is as follows:

(1) Some serial attribute reduction algorithms based on information entropy suffer from limited data labels and low computational efficiency. This thesis combines the inclusion degree function with conditional entropy to construct local conditional entropy, and then designs an algorithm for computing local conditional entropy and a serial attribute reduction algorithm based on it. The algorithm has low time complexity and can effectively reduce data with limited labels. Experiments on 19 UCI public data sets show that the designed algorithm is effective and feasible, with good classification accuracy and computational efficiency.

(2) To perform attribute reduction on large-scale data sets with limited labels and to improve computational efficiency, this thesis designs a parallel local conditional entropy attribute reduction algorithm based on local conditional entropy and the MapReduce programming model. Based on the local rough set, it also designs a parallel local rough set attribute reduction algorithm under the MapReduce model. The designed parallel algorithm is compared with the serial algorithm on UCI public data sets: on the 60,000-scale data set it obtains the reduction result in only one-fifth of the time of the serial algorithm, a clear improvement. Experiments on data sets at the two-million and four-million scale demonstrate the computational efficiency of the designed parallel algorithm on large data sets. Finally, the parallel performance of the algorithm is verified experimentally.
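
As a rough illustration of how entropy-based attribute reduction fits the map and reduce pattern, the following Python sketch may help. It is not the thesis's algorithm: the abstract does not give the definition of local conditional entropy, so the sketch substitutes the standard Shannon conditional entropy H(D | B), and it simulates the map and reduce phases in plain Python rather than on a real MapReduce cluster. All function names, the row format, and the toy data are assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import log2


def map_phase(split, attrs, decision):
    """Mapper: for each row emit ((values on attrs, decision value), 1)."""
    for row in split:
        yield (tuple(row[a] for a in attrs), row[decision]), 1


def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def conditional_entropy(splits, attrs, decision):
    """Standard H(D | attrs), used here as a stand-in for the thesis's measure."""
    pairs = (kv for split in splits for kv in map_phase(split, attrs, decision))
    joint = reduce_phase(pairs)
    total = sum(joint.values())
    # Size of each equivalence class induced by attrs.
    class_size = defaultdict(int)
    for (block, _), n in joint.items():
        class_size[block] += n
    return -sum((n / total) * log2(n / class_size[block])
                for (block, _), n in joint.items())


def greedy_reduct(splits, cond_attrs, decision, tol=1e-12):
    """Forward selection: keep adding the attribute that lowers H(D | B) most
    until B matches the entropy of the full condition attribute set."""
    target = conditional_entropy(splits, cond_attrs, decision)
    reduct, remaining = [], list(cond_attrs)
    while remaining and conditional_entropy(splits, reduct, decision) > target + tol:
        best = min(remaining,
                   key=lambda a: conditional_entropy(splits, reduct + [a], decision))
        reduct.append(best)
        remaining.remove(best)
    return reduct


# Hypothetical toy data: two input splits; the decision "d" depends only on "b".
splits = [
    [{"a": 0, "b": 0, "c": 1, "d": "no"}, {"a": 0, "b": 1, "c": 1, "d": "yes"}],
    [{"a": 1, "b": 0, "c": 0, "d": "no"}, {"a": 1, "b": 1, "c": 0, "d": "yes"}],
]
print(greedy_reduct(splits, ["a", "b", "c"], "d"))  # prints ['b']
```

Because the mapper only emits counts keyed by equivalence class and decision value, each split can be processed independently and the reducer needs only the aggregated counts, not the raw rows. This is the property that lets an entropy computation avoid loading the whole data set into memory at once.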
Keywords/Search Tags: local conditional entropy, MapReduce, local rough set, parallel, attribute reduction