The Optimization Design Research Of Mass Data Processing Architecture Oriented To The Commercial Public Opinion Analysis

Posted on: 2017-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
Full Text: PDF
GTID: 2348330485987960
Subject: Computer technology
Abstract/Summary:
With the arrival of the big data era, human society is moving from the IT (information technology) era to the DT (data technology) era. The value of accumulated data has gone far beyond what we imagined. However, it also brings negative effects such as loss of privacy and online violence. How to make better use of this data and do more meaningful things with it is a test for everyone who works in big data analysis.

In the business information sector, specialists have already done much work on commercial public opinion analysis that builds on the Hadoop computing framework and uses text analysis to mine commercial public opinion and to detect and assess public opinion topics. However, in real-time processing of public opinion, dynamic tracking, and trend forecasting, a large gap remains compared with developed countries. These areas have also become a key research direction in the development of public opinion information systems.

First, based on an analysis of mainstream public opinion analysis systems, this thesis sorts out the key requirements of the platform, including the overall requirements, the functional requirements, and the non-functional requirements, and refines them for the commercial public opinion field.

Second, in line with the key requirements of each module, we propose a main technical framework suited to public opinion analysis in China. From the data acquisition and data pre-processing modules through business analysis to the upper-level applications, we analyze in detail the specific technical framework of each module and the underlying storage framework. In addition, we propose improved algorithms for text classification and for cache optimization in Spark.

Finally, we build a cluster in a local environment according to the architecture design above, crawl website content using open microblogging APIs together with the WebMagic web crawler, pre-process the text data with the NLPIR segmentation tool, and apply K-means topic clustering so that sentiment analysis can be carried out separately for different types of text. We then analyze the simulation and test results of the optimization algorithms mentioned above.

In this thesis, data processing depends mainly on the Hadoop computing framework. For real-time processing, we use Spark, an open-source framework based on in-memory computing, to handle stream data, which is currently a mainstream model for large-scale data processing. This model and MapReduce complement each other's strengths, and it has good fault tolerance. Combining Hadoop and Spark is therefore a sound choice for designing and developing a commercial public opinion analysis platform.
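The K-means topic clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the whitespace `tokenize` function is only a stand-in for a real segmenter such as NLPIR, the sample documents are invented for illustration, and the initial centroids are chosen by hand (`init_indices`) so the result is deterministic. All function and variable names here are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real segmenter (e.g. NLPIR).
    return text.lower().split()


def vectorize(docs):
    # Build term-frequency vectors over a shared, sorted vocabulary.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([float(counts[term]) for term in vocab])
    return vocab, vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, max_iter=20):
    # Plain K-means: assign each vector to its nearest centroid, then
    # recompute each centroid as the mean of its members, until the
    # assignment stops changing or max_iter is reached.
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample posts: two about markets, two about sports.
docs = [
    "price rise stock market",
    "team win match goal",
    "stock market price fall",
    "goal match team lose",
]
vocab, vectors = vectorize(docs)
labels = kmeans(vectors, init_indices=[0, 1])
print(labels)
```

In a pipeline like the one described, each resulting cluster would then be passed to its own sentiment analysis step; at scale, a distributed K-means implementation would replace this single-machine loop.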
Keywords/Search Tags:Commercial public opinion, Framework, Hadoop, Stream Data Processing, Text Clustering