Research On Subject-oriented Extraction Of Public Opinion Ontology Concepts And Relations

Posted on: 2017-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: X F Zhang
GTID: 2348330503984350
Subject: Engineering, software engineering
Abstract/Summary:
With the rapid development of the Internet, public opinion spreads continuously around one or more themes, but the focus of the discussion keeps changing. Quickly and efficiently finding content related to this shifting focus can help the relevant departments better understand, analyze, and guide public opinion. Storing this public opinion information in a structured form, called a public opinion ontology, can also support later public opinion analysis.

Currently, most studies of theme identification use news pages as the corpus, because news text follows a standard format that includes specific elements: the people involved, the time of the event, the place, and what happened. Meanwhile, public opinion information spreads quickly in various other forms, such as blogs and post bars. When building a public opinion ontology, most methods use domain relevance and domain consensus, or improved variants of them, to extract domain ontology concepts. This approach has two limitations. First, such concept extraction methods are strongly domain-specific and assume a static, standardized corpus, whereas public opinion corpora cut across domains and change constantly. Second, these methods use a corpus on a single topic as the training corpus, extract only the high-frequency words related to that topic, and filter words by frequency, so each concept can belong to only one topic. Public opinion information is interdisciplinary, however, so a word may appear under more than one topic.

Considering these problems, this thesis uses web crawling to collect hot-topic public opinion text, extracts the keywords of each document based on a set of features that includes a time attribute, and builds a vector space model. This makes it possible to find network texts whose content focus has shifted but which still belong to the same topic, and to group them into one class.
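The grouping step described above can be illustrated with a minimal sketch. It assumes plain TF-IDF keyword weights, cosine similarity, and a greedy threshold grouping; the abstract does not specify the weighting scheme, the time-attribute features, or the clustering procedure, so the function names (`tfidf_vectors`, `cosine`, `group_by_topic`), the threshold, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({
            # Smoothed IDF so that no term gets a zero weight.
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_topic(vectors, threshold):
    """Greedy grouping: a document joins the first group whose first
    member is similar enough, otherwise it starts a new group."""
    groups = []
    for index, vector in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vector) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# Hypothetical, already tokenized documents (assumption, not thesis data).
docs = [
    ["factory", "pollution", "river", "protest"],
    ["factory", "pollution", "fine", "court"],
    ["school", "exam", "reform", "policy"],
]
groups = group_by_topic(tfidf_vectors(docs), threshold=0.2)
```

In this toy input the first two documents share keywords even though their focus differs (protest versus court ruling), which is the situation the thesis targets; a time-aware weighting would replace the plain term frequency used here.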
We extract nominal words and phrases as the candidate concept set. We then use a semantic similarity method to calculate the correlation between candidate concepts, compute a weight for each concept, and rank the concepts by weight. Combined with word frequency statistics, this yields the core concepts related to the topic. Finally, we determine the relations between concepts from the correlations already computed: if two concepts are directly correlated, they are taken to have some relation, and we use a combination of different methods to determine that relation and to classify or name it. The experimental results show that this method can effectively extract the core concepts related to the topic and to public opinion, together with the relations between them, and that it plays a positive role in building the public opinion ontology and in later knowledge sharing and reuse.
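As a rough illustration of the concept ranking and relation steps, the sketch below scores each candidate by a weighted mix of its mean similarity to the other candidates and its relative frequency, then flags pairs whose similarity passes a threshold as related. The mixing formula, the `alpha` parameter, the threshold, and the toy similarity table are assumptions made for this example; the abstract does not state which similarity measure or combination rule the thesis uses.

```python
from collections import Counter
from itertools import combinations


def rank_concepts(candidates, tokens, similarity, alpha=0.5):
    """Rank candidates by alpha * mean similarity + (1 - alpha) * relative frequency."""
    freq = Counter(t for t in tokens if t in candidates)
    max_freq = max(freq.values(), default=1)
    scores = {}
    for concept in candidates:
        others = [o for o in candidates if o != concept]
        mean_sim = (
            sum(similarity(concept, o) for o in others) / len(others)
            if others else 0.0
        )
        scores[concept] = alpha * mean_sim + (1 - alpha) * freq[concept] / max_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def related_pairs(candidates, similarity, threshold):
    """Return candidate pairs whose similarity reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(candidates, 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical similarity table standing in for a real semantic similarity
# resource (assumption: the thesis's actual measure is not given here).
TOY_SIM = {
    frozenset(("pollution", "factory")): 0.8,
    frozenset(("pollution", "river")): 0.7,
    frozenset(("factory", "river")): 0.4,
}


def toy_similarity(a, b):
    """Look up a symmetric similarity, defaulting to a low value."""
    return TOY_SIM.get(frozenset((a, b)), 0.1)


candidates = ["pollution", "factory", "river", "weather"]
tokens = ["pollution", "factory", "pollution", "river",
          "weather", "pollution", "factory"]

ranking = rank_concepts(candidates, tokens, toy_similarity)
relations = related_pairs(candidates, toy_similarity, threshold=0.6)
```

Passing the similarity measure in as a function keeps the ranking logic independent of the lexical resource behind it. Naming or typing each detected relation, which the thesis handles with a combination of methods, is outside this sketch.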
Keywords/Search Tags: Public opinion ontology, Concept extraction, Word similarity, Word frequency statistics, Relation extraction