Analysis And Research Of Machine Learning Model Based On Spark

Posted on: 2018-07-21  Degree: Master  Type: Thesis
Country: China  Candidate: J R Hou
GTID: 2358330515455927  Subject: Computer application technology
Abstract/Summary:
In the era of big data, distributed computing has become mainstream. Distributed applications built on the MapReduce framework perform frequent I/O operations, so their efficiency and performance cannot be fully realized. Spark, a distributed computing framework based on RDDs, can load data into memory, which fits the specific requirements of iterative machine learning models well. In view of the problems in machine learning models designed on MapReduce (problems that stem mainly from the nature of MapReduce itself), this thesis studies machine learning models based on Spark, mainly KMeans clustering and ALS collaborative filtering, and also studies an online machine learning model based on Spark Streaming. The main content of the thesis is as follows:

(1) ALS (alternating least squares) is a collaborative filtering recommendation algorithm based on matrix factorization. It computes over a large volume of user rating data and stores a large number of feature matrices during the computation. Hadoop HA (High Availability) is used to solve the single point of failure of the NameNode, and this study uses QJM (Quorum Journal Manager) to build a highly available Hadoop big data platform. Spark is a newer distributed in-memory computing framework for big data with excellent computing performance. This study implements the ALS collaborative filtering algorithm with the Spark programming framework and runs it in parallel on Spark. Comparison experiments against an ALS collaborative filtering implementation based on Hadoop MapReduce, using the Netflix data set, show that the parallel computation on the Spark platform is more efficient and better suited to processing massive amounts of data.

(2) A parallel KMeans clustering model was designed and implemented on the Spark distributed computing framework. The model was trained on MovieLens data sets of different sizes in comparison experiments. The results show that the parallel KMeans clustering model is suitable for large distributed data environments and that its parallel efficiency is good. In addition, loading the data with the repartition operator optimizes the parallel scheme and effectively reduces the training time of the model.

(3) To address the poor real-time response of the MapReduce framework when handling massive amounts of data, an online computation model for large-scale KMeans clustering analysis was designed and implemented on Spark Streaming. The whole process is divided into a data access module and an online training module. Each module forms task entities from the data flow, and these are submitted to a Spark distributed cluster. Comparative experiments and performance tests confirm that the online KMeans clustering model offers high throughput and low latency, and that the cluster runs in good condition.
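
The parallel KMeans scheme in point (2) can be pictured with a small sketch. It is not the thesis's code and does not use Spark: plain Python lists stand in for data partitions, and the function names (`partial_sums`, `merge`, `kmeans_step`, `kmeans`) are hypothetical. It only illustrates the general pattern under those assumptions: each partition computes per-cluster sums and counts locally, the partial results are merged, and the centroids are recomputed once per iteration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def partial_sums(partition, centroids):
    """Map side: per-partition (vector sum, count) for each cluster index."""
    sums = {}
    for p in partition:
        k = nearest(p, centroids)
        total, count = sums.get(k, ([0.0] * len(p), 0))
        sums[k] = ([a + b for a, b in zip(total, p)], count + 1)
    return sums


def merge(a, b):
    """Reduce side: combine the partial results of two partitions."""
    out = dict(a)
    for k, (total, count) in b.items():
        if k in out:
            t0, c0 = out[k]
            out[k] = ([x + y for x, y in zip(t0, total)], c0 + count)
        else:
            out[k] = (total, count)
    return out


def kmeans_step(partitions, centroids):
    """One iteration: local sums per partition, merge, then new centroids."""
    merged = {}
    for part in partitions:  # in a cluster these would run in parallel
        merged = merge(merged, partial_sums(part, centroids))
    new_centroids = []
    for k, c in enumerate(centroids):
        if k in merged:
            total, count = merged[k]
            new_centroids.append(tuple(x / count for x in total))
        else:
            new_centroids.append(tuple(c))  # empty cluster keeps its centroid
    return new_centroids


def kmeans(partitions, centroids, iterations=10):
    """Run a fixed number of iterations (no convergence check in this sketch)."""
    for _ in range(iterations):
        centroids = kmeans_step(partitions, centroids)
    return centroids
```

For example, `kmeans([[(0, 0), (10, 10)], [(0, 2), (10, 12)]], [(0, 0), (10, 10)])` treats the two inner lists as two partitions. Because only small per-cluster sums travel between partitions, how the data is split across partitions affects the balance of work rather than the result, which is consistent with the abstract's remark that repartitioning the loaded data reduced training time.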
Keywords/Search Tags: Spark Streaming, Machine Learning, Online Learning, KMeans, ALS