
Research On Parallelization Of Fuzzy Integral Algorithm

Posted on: 2019-06-19 | Degree: Master | Type: Thesis
Country: China | Candidate: R J Chen
GTID: 2428330563485144 | Subject: Computer system architecture
Abstract/Summary:
With the development of the Internet, massive amounts of data have accumulated into a big data environment, and data mining has become a research hotspot. In data mining, the fuzzy integral is an excellent tool for information fusion and has been applied successfully to many classification problems. However, the fuzzy integral has exponential time and space complexity (a fuzzy measure on n features is defined on all 2^n subsets of those features), which makes it difficult to apply to big data mining. Many mature parallel computing frameworks now exist, and some traditional data mining algorithms have improved their efficiency and usability by being combined with parallel computing. Spark is a memory-based distributed parallel computing framework with good robustness and scalability, and for iterative data mining algorithms it is more efficient than Hadoop MapReduce. In view of this, this thesis presents a fuzzy integral algorithm based on Spark parallel computing and sparse storage, which extends the application of the fuzzy integral to big data mining. It also proposes combination models and fusion models based on the fuzzy integral. The main contents of this thesis are as follows:

(1) To address the space and time complexity of the fuzzy integral algorithm, a binary accumulator is used to optimize the step of the fuzzy integral that computes combined feature values. Sparse storage and Spark parallel computing are then introduced to form a parallel sparse fuzzy integral algorithm, called the PSFI algorithm. Experimental results show that sparse storage compresses the storage space of the PSFI algorithm and improves both storage efficiency and model training efficiency, while parallel computing greatly shortens computation time.
(2) To address the limited execution efficiency of Python, Cython is used to optimize the algorithm and package it as a library, yielding an efficient PSFI implementation. Experimental results show that the efficiency of the Cython-based PSFI algorithm is improved more than 40-fold.
(3) To address the optimization of data mining models, the PSFI algorithm is combined with existing data mining models. PSFI is used to extend the feature-combination ability of existing models in search of combination models with higher prediction accuracy. Experimental results show that the combination models of PSFI with Fisher discriminant analysis and with logistic regression achieve higher prediction accuracy.
(4) The application of the PSFI algorithm to model fusion is studied. Exploiting the strength of the fuzzy integral in information fusion, the fuzzy integral is used to fuse current high-performing data mining models in search of a better fusion model. Experimental results verify that the fusion model of PSFI and XGBoost performs better on imbalanced classification problems.

This work extends the fuzzy integral algorithm to the distributed setting for the first time, resolves the high complexity of the fuzzy integral on a single machine, and demonstrates through simulation experiments the potential and advantages of the fuzzy integral in data mining. It is of considerable significance and heuristic value for research on data mining based on the fuzzy integral.
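Why sparse storage helps in contribution (1) can be illustrated with the Choquet integral, the most widely used fuzzy integral. The sketch below is a minimal stdlib Python illustration under stated assumptions that are not taken from the thesis: the fuzzy integral is the Choquet integral, feature values f_i are nonnegative (e.g. scaled to [0, 1]), each nonempty feature subset A is indexed by a bitmask, and the combined value of A is z_A = max(0, min over i in A of f_i - max over i not in A of f_i), with the max over an empty set taken as 0. With these definitions the integral is C(f) = sum over A of z_A * mu(A). The counter j is our reading of the "binary accumulator", and the function names are hypothetical.

```python
def dense_combined_values(f):
    """Enumerate every nonempty subset with a binary counter j = 1 .. 2**n - 1.

    Bit i of j set means feature i is in the subset. Returns a list of
    length 2**n - 1 whose entry j - 1 is the (assumed) combined value z_j.
    """
    n = len(f)
    z = []
    for j in range(1, 2 ** n):  # assumed reading of the "binary accumulator"
        inside = [f[i] for i in range(n) if (j >> i) & 1]
        outside = [f[i] for i in range(n) if not (j >> i) & 1]
        # Positive gap between the subset and its complement, else 0.
        z.append(max(0.0, min(inside) - max(outside, default=0.0)))
    return z


def sparse_combined_values(f):
    """Same values, storing only the nonzero ones as {mask: z}.

    Only upper level sets {i : f_i > c} can have a positive gap, so at most
    n entries survive; sorting finds them in O(n log n) instead of O(n * 2**n).
    """
    mask = (1 << len(f)) - 1  # start from the full feature set
    prev = 0.0  # largest value dropped so far (0 for the empty complement)
    z = {}
    for i in sorted(range(len(f)), key=lambda k: f[k]):  # ascending values
        gap = f[i] - prev
        if gap > 0:
            z[mask] = gap
        mask &= ~(1 << i)  # drop the smallest remaining feature
        prev = f[i]
    return z


def choquet(f, mu):
    """Choquet integral of f w.r.t. a fuzzy measure mu given as {mask: value}."""
    return sum(z * mu[mask] for mask, z in sparse_combined_values(f).items())


if __name__ == "__main__":
    f = [0.2, 0.5, 0.3]
    print(sparse_combined_values(f))  # 3 nonzero entries out of 2**3 - 1 = 7
```

The measure mu still needs 2^n - 1 values; what the sparse form compresses is the per-sample vector z, which has at most n nonzero entries. Because C(f) is linear in mu once z is fixed, one common way to learn mu is linear regression over the sparse z rows, and the same routine can fuse model outputs by letting f hold the scores of several base classifiers. Whether PSFI trains or fuses exactly this way is an assumption here.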
Keywords: Fuzzy Integral, Fuzzy Measure, Parallel Computation, Sparse Storage, Classification Application