Research On Topic Extraction Method For Internet Public Opinion

Posted on: 2019-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Zhu
GTID: 2348330569987671
Subject: Communication and Information System
Abstract/Summary:
Internet public opinion is the prevailing view, carrying a certain influence and inclination, that people express on the Internet about a social phenomenon or social problem. In recent years it has had a large influence on political order and social stability, and governments and enterprises attach great importance to it, which makes it a significant subject of research. At the same time, with the rapid development of information technology and the Internet, online information is being generated at an explosive rate. To help people obtain the information they need quickly and accurately from this volume of public opinion data, topic extraction has been widely applied in the field of natural language processing.

Topic extraction selects important information from a document to represent its central idea. According to the form of the extracted information, it is divided into keyword extraction and automatic summarization. Keyword extraction selects words or phrases from the document that reflect its subject; the keyword is the smallest unit that expresses the subject of a document. Traditional keyword extraction relies on the statistical features of words and does not consider the influence of the document's topics. Automatic summarization produces a concise, coherent sentence or short passage that accurately and comprehensively reflects the central content of the document, and it meets information-acquisition needs better than keywords do. Traditional automatic summarization methods only compute the importance of each sentence and ignore the diversity of the document itself. In addition, the quality of the extracted keywords affects the quality of the automatic summary.

In view of these shortcomings of current keyword extraction and automatic summarization techniques, this thesis carries out an in-depth study and discussion, including:

1. It introduces the background of topic extraction methods; presents related techniques such as Chinese word segmentation and sentence similarity; and analyzes the topic model, paving the way for the improved methods that follow.
2. It analyzes traditional keyword extraction methods and their respective advantages and disadvantages. An improved keyword extraction method based on a co-occurrence model and TextRank is proposed to exploit the co-occurrence relationships between words. Evaluation methods for keyword extraction are studied, and an experiment is designed to compare the improved method with traditional keyword extraction methods and to verify its improvement in keyword extraction.
3. It analyzes traditional automatic summarization methods and their respective advantages and disadvantages. An improved automatic summarization method based on latent Dirichlet allocation and maximal marginal relevance is proposed to reflect the similarity between sentences and to remove redundant sentences. Evaluation methods for automatic summarization are studied, and an experiment is designed to compare the improved method with traditional automatic summarization methods and to verify its improvement in automatic summarization.
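
For orientation, the two baseline techniques named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's improved methods: `textrank_keywords` runs plain TextRank over a word co-occurrence graph, and `mmr_select` performs greedy maximal marginal relevance selection. The function names, the parameters (`window`, `damping`, `lam`), and the use of bag-of-words cosine similarity in place of LDA topic distributions are all assumptions made for this example.

```python
import math
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=3):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    # Link distinct words that share a sliding window of `window` consecutive tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Standard (unnormalized) TextRank update, repeated a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentence_vecs, doc_vec, k=2, lam=0.5):
    """Greedy MMR: prefer sentences relevant to the document and
    dissimilar to the sentences already selected."""
    selected = []
    candidates = list(range(len(sentence_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(sentence_vecs[i], doc_vec)
            redundancy = max(
                (cosine(sentence_vecs[i], sentence_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy inputs, for illustration only.
tokens = ["a", "hub", "b", "hub", "c", "hub", "d"]
top_word = textrank_keywords(tokens, window=2, top_k=1)

sentence_vecs = [
    {"topic": 1, "model": 1},
    {"topic": 1, "model": 1},    # duplicate of the first sentence
    {"keyword": 1, "graph": 1},
]
doc_vec = {"topic": 2, "model": 2, "keyword": 1, "graph": 1}
summary_ids = mmr_select(sentence_vecs, doc_vec, k=2, lam=0.5)
```

In this sketch `lam` trades relevance against redundancy: at `lam=1.0` the selection reduces to ranking by relevance alone, while lower values penalize sentences that repeat what has already been chosen. A topic-aware variant would presumably replace the bag-of-words vectors with per-sentence topic distributions from an LDA model.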
Keywords/Search Tags: Topic Extraction, Keyword Extraction, Automatic Summarization, Latent Dirichlet Allocation, Maximal Marginal Relevance