| In the 21st century, with the rapid development of Internet technology, the Internet has become closely tied to all walks of life, forming the "Internet Plus" pattern. At the same time, the many devices connected to the Internet generate data continuously, producing an explosion of data that laid the foundation for the era of big data. This data includes a large amount of text presented on the Internet as logs, comments, articles, and other forms. As the Internet becomes ever more entwined with people's lives and the influence of online public opinion on social hot spots grows, how to analyze online views, forecast online sentiment, and correctly guide online public opinion has become an urgent problem for society. However, current text analysis methods model text with a statistical language model and train the model with machine learning algorithms; their effectiveness depends on data quality, model training requires a great deal of computing time, and feasible parallel algorithms are lacking. To this end, this paper designs and implements a comprehensive system for Chinese text analysis and processing, based on a neural network language model and the Spark big data platform. The main work of this paper includes: (1) studying the text orientation analysis algorithm based on a neural network language model, and designing a text feature representation algorithm that fuses the Doc2vec model with the LDA model; (2) studying how to parallelize the algorithms used in the system, and designing the parallel model on the Spark platform; 
(3) studying the general workflow of a Chinese text orientation analysis system, and designing and implementing a Chinese text analysis system on a big data platform, including data ingestion, corpus tagging, corpus storage, model training, and model validation; (4) verifying and testing the prototype system and reporting the test results. To verify the feasibility of this approach, this paper tests the accuracy of the prototype system's text feature representation algorithm, which fuses the Doc2vec model with the LDA model. The experimental results show that the fused text representation model achieves high recognition accuracy, with an area under the ROC curve (AUC) of 0.95. This paper also tests the parallelized text analysis algorithms; the results show that parallelization can greatly improve the efficiency of the system. |
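
The abstract does not say how the Doc2vec and LDA representations are fused. As a minimal sketch, assuming the fusion is a simple concatenation of an L2-normalized document vector with an L2-normalized topic distribution, the idea can be illustrated as follows. The function names `l2_normalize` and `fuse_features`, the `topic_weight` parameter, and the sample vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def fuse_features(
    doc_vec: Sequence[float],
    topic_dist: Sequence[float],
    topic_weight: float = 1.0,
) -> List[float]:
    """Concatenate a document embedding with a topic distribution.

    Assumption: fusion by concatenation. Both parts are normalized first
    so that neither dominates purely because of its scale; topic_weight
    is a hypothetical knob for rebalancing the two parts.
    """
    doc_part = l2_normalize(doc_vec)
    topic_part = [topic_weight * x for x in l2_normalize(topic_dist)]
    return doc_part + topic_part


# Hypothetical inputs: a 2-dimensional Doc2vec-style vector and a
# 3-topic LDA-style distribution for the same document.
doc_vec = [3.0, 4.0]
topic_dist = [0.7, 0.2, 0.1]
fused = fuse_features(doc_vec, topic_dist)
```

The fused vector has one entry per embedding dimension plus one per topic, and could then be passed to any downstream classifier. Other fusion schemes, such as weighted sums or learned projections, are equally plausible readings of the abstract.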