
Research On Public Opinion Ontology Concept Extraction Based On Short Text

Posted on: 2019-12-01  Degree: Master  Type: Thesis
Country: China  Candidate: C Zha  Full Text: PDF
GTID: 2428330566467004  Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, the Internet has produced massive amounts of data in a variety of formats (text, pictures, sound, video, etc.). This data carries public opinion information, which makes Internet information and knowledge an important source for public opinion research. How to extract a public opinion ontology from corpora faster and more accurately has become a hot topic in that field. In the past, public opinion corpora were usually collected from news. News follows a standard format that states the people involved, the time, the place, the course of events, the outcome and so on, and it is usually long text. With the spread of social networking tools, massive amounts of short text data have been generated. Short text differs from long text in two processing characteristics: real-time updating and sparsity. Short text on the Internet is updated in real time, refreshes quickly and is difficult to collect, so its classification demands higher efficiency. A short text is within 200 words, usually only a few sentences, so it carries little effective information; the sample features are very sparse and the dimension of the feature set is very high, which makes it difficult to extract accurate sample features.

To address the long-tail phenomenon in word frequency statistics, data smoothing is used to adjust the word frequencies. Document feature words are then extracted based on word frequency characteristics combined with feature words. To improve computational efficiency, this thesis uses set intersection to compute text correlation, comparing the number of related elements between sets. After recognition, nouns and noun phrases are extracted as the candidate concept set for the subject text; the similarity between candidate concepts is evaluated with a semantic similarity method and the concepts are ranked by weight, so that the core concepts related to the subject are extracted.

The experimental results show that, for concept extraction of a public opinion ontology from short text, precision is 0.62% higher than with the TF-IDF method, recall is 0.4% higher, and the average time consumed is 30% lower. The work is a useful exploration of ontology concept extraction from short texts.
Keywords/Search Tags:Public opinion ontology, Concept extraction, Short Text, Word similarity, Word frequency statistics, Set
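
The abstract names two steps that are easy to illustrate: smoothing word frequencies to soften the long tail, and comparing texts through the intersection of their feature-word sets. The abstract does not say which smoothing technique or which set measure the thesis uses, so the sketch below substitutes additive (Laplace) smoothing and Jaccard overlap as assumptions. The function names, the `alpha` and `k` parameters, and the sample tokens are all hypothetical.

```python
from collections import Counter


def smoothed_frequencies(tokens, vocabulary, alpha=1.0):
    """Additive (Laplace) smoothing, an assumed stand-in for the thesis's method.

    Every word in the vocabulary receives a nonzero relative frequency,
    including words that never occur in this particular short text.
    """
    counts = Counter(tokens)
    in_vocab_total = sum(counts[word] for word in vocabulary)
    total = in_vocab_total + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


def top_feature_words(tokens, vocabulary, k=3):
    """Return the k words with the highest smoothed frequency as a set.

    Ties are broken alphabetically so the result is deterministic.
    """
    freqs = smoothed_frequencies(tokens, vocabulary)
    ranked = sorted(vocabulary, key=lambda word: (-freqs[word], word))
    return set(ranked[:k])


def set_correlation(features_a, features_b):
    """Jaccard overlap: shared feature words divided by all feature words."""
    union = features_a | features_b
    if not union:
        return 0.0
    return len(features_a & features_b) / len(union)


if __name__ == "__main__":
    # Hypothetical tokenized short texts, for illustration only.
    doc_a = ["flood", "rescue", "city", "flood", "rescue", "flood"]
    doc_b = ["flood", "city", "rescue", "traffic", "flood", "city"]
    vocab = set(doc_a) | set(doc_b)
    features_a = top_feature_words(doc_a, vocab, k=2)
    features_b = top_feature_words(doc_b, vocab, k=2)
    print(features_a, features_b, set_correlation(features_a, features_b))
```

Set operations only compare membership, so the cost grows with the size of the feature sets rather than with a high-dimensional term vector, which is one plausible reading of the efficiency gain the abstract reports.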