The SMRI Classification Diagnosis And Movie Recommendation Based On Spark

Posted on: 2017-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2348330488952022
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of information and communication technology and the explosive growth of data, big data technology is attracting increasing attention. What makes the difference is how the data is used, not the data itself: effectively exploiting the potential value of big data can lead to better services. Choosing a suitable processing platform and suitable machine learning algorithms for each application scenario, so that large-scale data can be processed quickly and effectively, is therefore an important research topic.

Spark, which originated in the AMPLab at UC Berkeley, aims to provide a processing and computing framework for big data. It has become a major processing tool in the big data field by virtue of its advantages in iterative machine learning and in-memory computing. The Spark-centered Berkeley Data Analytics Stack, which includes Spark Streaming, Spark SQL, MLlib, GraphX, SparkR and other modules, can be applied to a variety of big data scenarios.

This thesis analyzes the ecosystem, core concepts and operating architecture of Spark, and then builds Spark clusters and an application development environment. The clusters use the Hadoop Distributed File System (HDFS) to store data. Taking two applications as examples, this thesis proposes and implements an sMRI (structural Magnetic Resonance Imaging) classification diagnosis system and a movie recommendation system based on Spark, providing feasible schemes for specific big data applications.

The sMRI classification diagnosis system, the first system proposed and implemented in this thesis, is mainly used to diagnose whether examined patients have AD (Alzheimer's Disease). The system combines the data processing capability of Spark with sMRI-based AD diagnosis technology. After building a classification model of brain sMRI images from healthy people and AD patients using the Principal Component Analysis algorithm and the Support Vector Machine algorithm on the Spark platform, the system produces a classification diagnosis result that can support doctors in their diagnosis.

The movie recommendation system, the second system proposed and implemented in this thesis, is mainly used to recommend movies that users are likely to be interested in. The system builds a movie recommendation engine using the collaborative filtering algorithm based on Alternating Least Squares on the Spark platform, obtains the optimal model by training on users' movie rating data, and then uses that model to recommend movies to users. The system uses the stream processing capability of Spark Streaming to provide real-time recommendations, providing a feasible scheme for applying recommendation engines on the Spark platform.

This thesis tests both systems on real data sets and provides an analysis of the experimental results. It studies both the theory and the implementation of the algorithms used in the systems. The classification and recommendation algorithms are implemented on top of MLlib; unlike traditional stand-alone machine learning algorithms, they are suitable for distributed large-scale data processing.
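The Alternating Least Squares idea behind the recommendation engine can be illustrated with a small single-machine sketch. This is not the thesis's code and not the MLlib API: the names `RATINGS`, `solve`, `update`, `als`, `predict` and `objective` are hypothetical, the toy ratings are invented for illustration, and plain L2 regularization is assumed, which may differ from what a distributed implementation does. The sketch only shows the alternation itself: hold the movie factors fixed and solve a small regularized least-squares problem per user, then hold the user factors fixed and do the same per movie.

```python
import random

# Hypothetical toy data: (user_id, movie_id, rating). Not from the thesis.
# Ratings for (user 0, movie 1) and (user 3, movie 2) are deliberately withheld.
RATINGS = [
    (0, 0, 5.0), (0, 2, 1.0), (0, 3, 1.0),
    (1, 0, 5.0), (1, 1, 5.0), (1, 2, 1.0), (1, 3, 1.0),
    (2, 0, 1.0), (2, 1, 1.0), (2, 2, 5.0), (2, 3, 5.0),
    (3, 0, 1.0), (3, 1, 1.0), (3, 3, 5.0),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def update(fixed, ratings_by_row, rank, reg):
    """Re-solve one side's factors while the other side (`fixed`) is held constant."""
    out = {}
    for row, observed in ratings_by_row.items():
        # Normal equations: (sum of v v^T + reg * I) x = sum of rating * v
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in observed:
            v = fixed[other]
            for i in range(rank):
                b[i] += rating * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        out[row] = solve(a, b)
    return out


def als(ratings, rank=2, reg=0.1, iterations=20, seed=0):
    """Factorize the ratings into user and movie factor vectors of length `rank`."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, reg)  # movies fixed, solve users
        items = update(users, by_item, rank, reg)  # users fixed, solve movies
    return users, items


def predict(users, items, u, i):
    """Predicted rating: dot product of the user and movie factor vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))


def objective(ratings, users, items, reg):
    """Regularized squared error that each alternating step minimizes."""
    squared_error = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    vectors = list(users.values()) + list(items.values())
    penalty = sum(x * x for vec in vectors for x in vec)
    return squared_error + reg * penalty


if __name__ == "__main__":
    users, items = als(RATINGS)
    # Score the two withheld (user, movie) pairs.
    print(predict(users, items, 0, 1), predict(users, items, 3, 2))
```

Because each half-step solves its subproblem exactly, the regularized objective cannot increase from one iteration to the next, which is what makes the alternation a reasonable fit for iterative, in-memory computation of the kind the abstract attributes to Spark.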
Keywords/Search Tags: Spark, Machine Learning, sMRI, Classification, Movie Recommendation