A Text Clustering Algorithm Based on Big Data

Posted on: 2017-04-03   Degree: Master   Type: Thesis
Country: China   Candidate: H L Cui   Full Text: PDF
GTID: 2308330485962395   Subject: Information Computing and Intelligent Systems
Abstract/Summary:
In recent years, the rapid development of the Internet, the Internet of Things, and cloud computing has caused data in every field of industry and commerce to grow explosively. As a result, big data has become a hot topic worldwide and has drawn wide attention from academia, industry, and government. Big data holds great value but differs from traditional structured data, so extracting useful information from it requires new algorithmic frameworks and processing systems. This thesis studies clustering algorithms for massive text data in the context of data mining and analysis for social networking sites. Spectral clustering is one of the most popular clustering algorithms and has a wide range of applications. However, its running time grows cubically with the size of the input data, which makes it impractical for large-scale text data sets. Several attempts have been made in recent years to overcome this problem, but no satisfactory solution has emerged. This thesis examines the relevant theory and implementations in depth: the concept, characteristics, value, and challenges of big data; big data processing platforms; Hadoop, including HDFS and the MapReduce framework; the concept, characteristics, and development of Mahout; and the spectral clustering, K-means, and parallel spectral clustering algorithms. A Hadoop cluster is built to verify the algorithm. The thesis also proposes a new spectral clustering algorithm that improves on the parallel implementation of spectral clustering in Mahout and can be parallelized effectively with MapReduce. Experiments on clustering large amounts of text data show that the algorithm is effective.
Keywords/Search Tags: Big data, Spectral clustering, K-means, MapReduce, Mahout, Hadoop
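
The abstract names K-means and MapReduce without showing how the two fit together. The sketch below is one illustrative way to phrase a K-means iteration as a map step and a reduce step, in plain single-process Python. It is not the algorithm proposed in the thesis and not Mahout's implementation; the function names (map_point, reduce_cluster, kmeans) and the toy data are assumptions made for this example. In spectral clustering, the same K-means step would typically be run on the rows of the eigenvector matrix rather than on the raw points.

```python
from collections import defaultdict
from math import dist


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points that were assigned to one centroid."""
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + x for t, x in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One K-means pass: map every point, group by key, reduce every group."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [
        reduce_cluster(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20):
    """Repeat iterations until the centroids stop moving or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups in the plane.
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

On a real cluster the map calls would run in parallel over partitions of the data and the grouping would be done by the framework's shuffle phase; here a dictionary stands in for that step.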