Automatic Text Summarization And Fuzzy Topics Identification Methods Towards The Design Of Public Opinion Information System

Posted on: 2020-11-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C J Fang
GTID: 1488306740472894
Subject: Network and information security
Abstract/Summary:
We are witnessing explosive growth in the number of netizens and in the original content posted by social media users, driven by the rise of Web 2.0 technology and the wide adoption of new online media (such as blogs, social platforms and mobile platforms). This leaves governments, public institutions and enterprises struggling to handle the huge volume of information generated every day, and it brings both opportunities and challenges to network public opinion systems, which continuously gather information from websites, blogs and forums, analyze public opinion and provide timely warning of potential risks. In the era of big data, the trouble usually comes from information overload and information noise rather than from a lack of information. Consequently, researchers are shifting their focus from collecting information extensively to collecting useful information accurately.

It is widely acknowledged that social media is a double-edged sword. On the one hand, it greatly enriches people's spiritual life and creates impressive economic value and social benefits. On the other hand, it provides a platform for the spread of negative information, such as reactionary content, pornographic content and rumors. Network public opinion has therefore become an important factor that may exert considerable influence on social stability, harmony, security and sustainable development. Understanding and tracking public opinion and social evaluation in a timely manner is thus of great importance to governments and enterprises in propaganda, decision-making, development and crisis management.

Based on the actual demands of a public opinion consultation system, we conduct theoretical research on two major issues: automatic text summary extraction and public opinion topic extraction. Furthermore, we construct a public opinion analysis and consultation system and realize an application for enterprise public opinion analysis on the basis of maximal marginal relevance. The specific research content and contributions of this dissertation are as follows:

1. We design a general framework for large-scale text data analysis and realize an enterprise public opinion analysis method based on maximal marginal relevance. Specifically, we first propose an analysis framework for text data into which text mining algorithms can be effectively integrated, covering public opinion text data acquisition, storage, mining algorithms and application visualization. In addition, it is usually difficult to judge whether the information disclosed by an enterprise is trustworthy, and it is also difficult to measure the relation between public information and the content claimed by the enterprise. To cope with this issue, we present an algorithm for estimating the credibility of the content claimed by an enterprise. With the aid of Internet big data analysis techniques, it attempts to find evidence on the Internet that supports the market share of candidate companies, aiming to offer decision support for enterprise recommendation. We use word embedding and KL-divergence to filter the retrieved public sentiment information, and present a maximal marginal relevance based method to compute the confidence score of each company. Extensive experiments demonstrate that the proposed method is able to find positive, neutral and negative evidence for the sales ranking over 100 enterprises. They also confirm that the sales ranking claimed by most of the enterprises is trustworthy.

2. We propose a word-sentence co-ranking method based on a graph model (CoRank for short) that calculates the weights of the sentences in a document and automatically extracts the sentences with high weights to produce a short summary of the document. CoRank combines word-sentence relations with graph-based unsupervised ranking models to generate a ranking score for each sentence, so that core sentences can be extracted. In addition, since a topic in a real-world document is usually explained by multiple sentences, these sentences tend to contain several identical keywords, resulting in similar sentence weights. To address this challenge, we add a redundancy removal process to the CoRank algorithm and propose the CoRank+ algorithm. A similarity judgment is introduced when extracting summary sentences, so that synonyms or synonymous sentences are eliminated, which helps the abstract express the original meaning more accurately. We also provide a theoretical analysis of the convergence of the algorithm. Finally, experimental results on two real-life datasets validate that CoRank+ can generate high-quality automatic abstracts for both Chinese and English texts.

3. We attempt to reconstruct the semantic associations between words and words, words and texts, and texts and texts based on word graphs, and propose a word graph fuzzy clustering method based on the fastest projection gradient. In particular, we first build the word graph by drawing on the construction of the word graph model. Second, we propose the word graph fuzzy clustering method on the foundation of these word graphs and realize it by means of the fastest projection gradient strategy. We also prove the convergence of the word graph fuzzy clustering, and thus offer a possible solution to the multi-topic classification task on public sentiment. Finally, experiments on artificial and real datasets show that the proposed word graph fuzzy clustering algorithm can find a satisfactory answer to the question “What does the theme mainly express?”. In addition, we verify that word graph fuzzy clustering is better at finding text categories with semantic relevance on the real Pokec and Weibo datasets.
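
The abstract names maximal marginal relevance (MMR) and a similarity-based redundancy removal step but does not give their formulas. As a rough illustration only, the sketch below shows the generic greedy MMR criterion (relevance to a query minus similarity to items already selected), using plain bag-of-words cosine similarity. The function names `cosine` and `mmr_select`, the weight `lam`, and the whitespace tokenization are assumptions made for this example; the dissertation's own method (which also uses word embedding and KL-divergence) may differ.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query: str, sentences: list[str], k: int, lam: float = 0.7) -> list[int]:
    """Greedily pick up to k sentence indices by generic MMR.

    score(i) = lam * sim(sentence_i, query)
               - (1 - lam) * max similarity to any already selected sentence
    `lam` is a hypothetical trade-off weight: 1.0 means pure relevance.
    """
    q = Counter(query.lower().split())
    vecs = [Counter(s.lower().split()) for s in sentences]
    selected: list[int] = []
    candidates = set(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(vecs[i], q)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        # Iterate in index order so ties resolve to the earliest sentence.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    docs = [
        "sales of the company rank first",
        "sales of the company rank first",   # exact duplicate of the first
        "the weather is nice today",
        "company sales grew in the market",
    ]
    print(mmr_select("company sales rank", docs, k=2, lam=0.5))
```

With a balanced weight the duplicate sentence is penalized for its similarity to the first pick, so a different but still relevant sentence is chosen instead; with `lam=1.0` the penalty disappears and the duplicate is selected.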
Keywords/Search Tags: public opinion consultation systems, text analysis with big data, automatic extractive text summarization, topic fuzzy clustering, maximal marginal relevance based ranking method