Research And Implementation Of Chinese Text Analysis System Based On Big Data Platform

Posted on:2018-11-15

Degree:Master

Type:Thesis

Country:China

Candidate:J L Yuan

Full Text:PDF

GTID:2348330518496551

Subject:Computer Science and Technology

Abstract/Summary:

In the 21st century, the rapid development of Internet technology has tied the Internet closely to all walks of life, forming the "Internet Plus" pattern. At the same time, the many devices connected to the Internet generate data continuously, and the resulting explosion of data has laid the foundation for the era of big data. Much of this data is text, which appears on the Internet as logs, comments, articles and other forms. As the Internet becomes ever more closely bound to people's lives and online public opinion has a growing influence on social hot spots, how to analyze online views, forecast online sentiment and correctly guide online public opinion has become a problem that society urgently needs to solve. However, current text analysis methods model text with a statistical language model and train that model with machine learning algorithms. Their effectiveness depends on data quality, model training requires a great deal of computing time, and feasible solutions for parallelizing the algorithms are lacking. To this end, this thesis designs and implements a comprehensive Chinese text analysis and processing system based on a neural network language model and the Spark big data platform.

The main work of this thesis includes: (1) studying text orientation analysis algorithms based on a neural network language model, and designing a text feature representation algorithm that fuses the Doc2vec model with the LDA model; (2) studying how to parallelize the algorithms used by the system, and designing the parallel model on the Spark platform; (3) studying the general workflow of a Chinese text orientation analysis system, and designing and implementing a Chinese text analysis system based on a big data platform, covering data intake, corpus tagging, corpus storage, model training, model validation and related steps; (4) verifying and testing the prototype system and reporting the test results.

To verify the feasibility of the approach, the accuracy of the text feature representation that fuses Doc2vec with the LDA model was tested on the prototype system. The experimental results show that the fused text representation model achieves high recognition performance, with an area under the ROC curve (AUC) of 0.95. The algorithms related to text analysis were also tested in parallel, and the results show that the parallel algorithms can greatly improve the efficiency of the system.
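
As a rough illustration of two ideas named in the abstract, the fused text representation and the AUC metric, the following Python sketch may help. The abstract does not say how the Doc2vec and LDA features are fused. Simple concatenation is an assumption made here, and the names `l2_normalize`, `fuse_features` and `roc_auc` are hypothetical. The input vectors are hand-written stand-ins for a document embedding and a topic distribution, not the output of trained models.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(doc_vec: Sequence[float], topic_dist: Sequence[float]) -> List[float]:
    """Assumed fusion rule: concatenate a normalized document embedding
    (stand-in for Doc2vec) with a topic distribution (stand-in for LDA)."""
    return l2_normalize(doc_vec) + list(topic_dist)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive example scores higher
    than a random negative one; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, `fuse_features([3.0, 4.0], [0.2, 0.8])` yields a four-dimensional vector whose first two entries are the unit-length embedding and whose last two are the topic proportions, and `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` returns 0.75 because three of the four positive/negative pairs are ranked correctly. The pairwise AUC is quadratic in the number of examples, which is acceptable for a sketch but not for large corpora.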

Keywords/Search Tags:

Spark, text orientation analysis, Doc2vec, LDA, neural network, machine learning

Related items

1	Research On Text Sentiment Analysis Via Spark And Machine Learning
2	Text Orientation Analysis System Based On Neural Network In The Research And Implementation
3	Design And Implementation Of Text Classifier Based On Neural Network With Spark
4	Research And Design Of Sentiment Analysis System For Intelligent Customer Service Based On Doc2vec And Deep Neural Network
5	Research On Chinese-text Sentiment Analysis Based On Spark
6	Research On Text Sentiment Analysis Based On Doc2vec And Deep Learning
7	Research On Text Analysis Of Current Political News Based On Machine Learning
8	Research On Text Sentiment Analysis For Wechat Public Platform
9	Text Clustering And Its Application In Text Orientation Analysis
10	Analysis And Research Of Machine Learning Model Based On Spark