Research And Optimization Of Parallel Extreme Learning Machine Algorithm For Big Data

Posted on: 2019-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Long
GTID: 2428330596463278
Subject: Software engineering
Abstract/Summary:
Owing to their good generalization performance and fast learning speed, the Extreme Learning Machine (ELM) and the Online Sequential Extreme Learning Machine (OS-ELM) have been widely used in fields such as text classification, image recognition, and bioinformatics. However, as the datasets in real-world applications grow ever larger, the traditional ELM cannot learn from such massive data efficiently. Apache Flink is an efficient, extensible, and fault-tolerant distributed in-memory computing platform for big data, based on Java. In this thesis, we use the distributed framework provided by Flink to implement and optimize parallel versions of the two algorithms, PELM and POS-ELM. PELM and POS-ELM use the machines in a cluster to process large-scale datasets in a distributed, parallel fashion, which makes up for the shortcomings of traditional centralized ELM algorithms on large datasets. The algorithms are parallelized, optimized, and implemented in the following ways.
(1) The ELM and OS-ELM computations are analyzed and decomposed into sub-steps. The data dependencies and processing bottlenecks between sub-steps are then analyzed, data processing and matrix operations are divided into parallelizable and non-parallelizable parts, and a suitable parallel design is derived for ELM and OS-ELM.
(2) Since the programming model on Flink is MapReduce, the parallel design is based on the MapReduce model.
(3) The bottlenecks in parallel processing are analyzed in depth. Appropriate data partitioning is used to reduce the time spent on data synchronization and communication in the cluster, thereby improving the parallel performance of the algorithms.
(4) To further improve performance, matrix multiplication is optimized by calling the Java linear algebra library Matrix.
Experimental results show that PELM and POS-ELM not only retain the training accuracy and generalization ability of the traditional ELM algorithm, but also achieve good scalability and a high speedup ratio.
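
The abstract does not spell out how the matrix operations are split, so the following is only a minimal sketch of one common MapReduce-style decomposition of ELM training, not necessarily the design used in the thesis. It assumes a sigmoid hidden layer and the regularized solution beta = (H^T H + I/C)^-1 H^T T. Each partition computes its partial sums of H^T H and H^T T (the map step), the partial sums are added (the reduce step), and the small linear system is solved centrally. Plain Python lists stand in for a Flink cluster, and all function names and the parameter C are hypothetical.

```python
import math
import random
from functools import reduce

def random_hidden_layer(n_in, n_hidden, seed=0):
    """Draw the fixed random input weights W and biases b of the hidden layer."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return W, b

def hidden(x, W, b):
    """Sigmoid hidden-layer output (one row of H) for a single sample x."""
    return [1.0 / (1.0 + math.exp(-(sum(xi * wi for xi, wi in zip(x, w)) + bj)))
            for w, bj in zip(W, b)]

def map_partition(rows, W, b):
    """Map step: partial sums U = H^T H and V = H^T T over one partition."""
    n_hidden, n_out = len(W), len(rows[0][1])
    U = [[0.0] * n_hidden for _ in range(n_hidden)]
    V = [[0.0] * n_out for _ in range(n_hidden)]
    for x, t in rows:
        h = hidden(x, W, b)
        for i in range(n_hidden):
            for j in range(n_hidden):
                U[i][j] += h[i] * h[j]
            for k in range(n_out):
                V[i][k] += h[i] * t[k]
    return U, V

def add_partials(left, right):
    """Reduce step: element-wise sum of two (U, V) pairs."""
    (U1, V1), (U2, V2) = left, right
    U = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(U1, U2)]
    V = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(V1, V2)]
    return U, V

def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for k in range(m):
            s = M[i][n + k] - sum(M[i][j] * X[j][k] for j in range(i + 1, n))
            X[i][k] = s / M[i][i]
    return X

def train_elm(partitions, W, b, C=1.0):
    """Combine per-partition sums, add the ridge term I/C, and solve for beta."""
    U, V = reduce(add_partials, (map_partition(p, W, b) for p in partitions))
    for i in range(len(U)):
        U[i][i] += 1.0 / C
    return solve(U, V)
```

Because the sums of H^T H and H^T T are additive over samples, training on one partition or on several should give the same output weights up to floating-point rounding. Only the small n_hidden-by-n_hidden matrices have to be exchanged between workers, which is why this kind of split keeps communication low.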
Keywords/Search Tags: ELM, OS-ELM, Flink, big data processing, parallelization