Research And Application On Internet Public Opinion Event Mining

Posted on: 2017-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: L K Zhao
GTID: 2308330503468504
Subject: Software engineering
Abstract/Summary:
With the development of the Internet, the rapid growth of online information has made it harder for people to find useful information. For individuals, enterprises, and public institutions facing large volumes of information, integrating and analyzing online content by manual effort alone has become nearly impossible. Internet hotspot event mining and analysis technology can address these problems by mitigating information overload, consolidating redundant information, and distilling the core information. Research on Internet hotspot event mining mainly covers topic discovery, event topic sentence generation, and event feature analysis. Research on topic detection started early and has produced many results. In contrast, there is little domestic research on event topic sentence generation algorithms: most domestic work on hot events stops at topic discovery and does not go on to process and analyze the topics to produce a more representative and distinctive form of the topic, the event theme sentence. This thesis studies the following issues in hot event mining and analysis:

First, a hybrid candidate set construction algorithm based on topic core words and event ordering is proposed. The algorithm uses a PAT-Tree to extract high-frequency core words from the topic text and maps these words back to sentences to produce one part of the core sentences. The other part consists of sentences that contain event elements, extracted from the topic text with triples serving as the candidate elements.

Second, an improved multi-sentence compression (MSC) model based on a word graph is proposed and used to extract event topic sentences. The word-graph-based MSC model performs well on English and Spanish text. In this thesis, the model is adapted to Chinese hot topic title generation and achieves good results.
The model converts the sentences in the candidate set into a word graph, which is a directed, weighted, acyclic graph. Nodes represent words and edges represent the connections between words, so each path through the graph corresponds to a possible sentence. Scoring formulas are designed for node weights and edge weights: the more a node or edge contributes to informativeness and linguistic coherence, the higher its score. Finally, a beam search algorithm scores the paths in the word graph, and the highest-scoring path is taken as the final event theme sentence. We determine the optimal beam width from the amount of data in the current experiment and compare the approach with two traditional multi-document title generation algorithms, showing that it performs well in both informativeness and linguistic coherence.

Third, based on the above work, we implement a hot event analysis system. The system analyzes existing topic data, generates the topic sentence from the topic text, presents topics along different dimensions, and visualizes the data. The visualizations include event heat trends, event sentiment polarity trends, event source distribution, and so on. The system also supports customized functions for individual entities and events.
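
The word-graph step and the path search described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (build_word_graph, beam_search), the toy weighting (edge frequency plus node frequency, averaged over path length), and the English sample sentences are all assumptions made for illustration, and the input is assumed to be already tokenized (Chinese text would first need word segmentation). Because this simplified version merges nodes by word identity, cycles can appear, so the search skips words already on the current path.

```python
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers


def build_word_graph(sentences):
    """Merge tokenized sentences into one graph, counting nodes and edges."""
    node_count = defaultdict(int)
    edge_count = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        path = [START] + list(tokens) + [END]
        for word in path:
            node_count[word] += 1
        for a, b in zip(path, path[1:]):
            edge_count[a][b] += 1
    return node_count, edge_count


def beam_search(node_count, edge_count, beam_width=3, max_len=20):
    """Return the best START -> END path found with a fixed-width beam."""
    beam = [(0.0, [START])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, path in beam:
            for nxt, e_count in edge_count.get(path[-1], {}).items():
                if nxt in path:  # skip revisits so merged words cannot loop
                    continue
                # Toy weight (assumption): frequent edges and nodes score higher.
                new = (score + e_count + node_count[nxt], path + [nxt])
                (finished if nxt == END else candidates).append(new)
        if not candidates:
            break
        # Keep only the beam_width highest-scoring partial paths.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    if not finished:
        return []
    # Rank complete paths by average score per step to limit length bias.
    best = max(finished, key=lambda c: c[0] / (len(c[1]) - 1))
    return best[1][1:-1]  # strip the boundary markers


sentences = [
    ["storm", "hits", "the", "coast"],
    ["storm", "hits", "the", "city"],
    ["big", "storm", "hits", "the", "coast"],
]
node_count, edge_count = build_word_graph(sentences)
print(" ".join(beam_search(node_count, edge_count)))  # storm hits the coast
```

A larger beam width explores more candidate paths at higher cost, which is why the width is tuned to the amount of data; the weighting here is only a placeholder for scoring formulas that reward informativeness and coherence.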
Keywords/Search Tags:Hot Event Analysis, Chinese Information Processing, Headline Generation, Patricia Tree