Topic Detection And Topic Representation From Web

Posted on: 2020-09-13  Degree: Master  Type: Thesis
Country: China  Candidate: A J Hu  Full Text: PDF
GTID: 2428330623956627  Subject: Computer Science and Technology
Abstract/Summary:
With the recent boom of web technologies and social platforms, it has become convenient for people to share data and express themselves. Because information now spreads through user-generated content, the unprecedented explosion in data volume makes it difficult for users to quickly reach the content that interests them. Building an accurate web topic detection model and finding an effective web topic representation method have therefore become important ways to help users find hot topics in massive data. In this thesis, we propose three approaches to web topics: a web topic detection algorithm, a web topic representation method, and a post-processing optimization method for web topic detection. First, we propose an algorithm for web topic detection. During web topic detection, we find that the similarities within a topic reflect the relationships between its webpages. Accordingly, we construct the intra-topic similarity under sparsity constraints. We then reconstruct the hybrid similarity graph based on Poisson deconvolution, which reduces the adverse impact of falsely detected webpages and refines the importance of each topic. Ranking the candidate topics by importance completes the detection task. Second, we propose a method for web topic representation. To resolve the ambiguity caused by incoherent keywords and to present the content of a web topic comprehensively, we propose a web topic representation method called prototype learning. To interpret a topic well, the prototypes should be both representative and diverse. Based on these two properties, we build a prototype webpage learning model. Using the intra-topic similarity, the model learns a set of webpage prototypes that are representative and diverse. Users can then understand hot web topics quickly, comprehensively and accurately by browsing the sets of prototype webpages directly. Finally, to improve web topic detection, we propose a post-processing optimization method. We found that the ranked list produced by detection-by-ranking approaches contains many inaccurate or uninteresting topics. If these topics are handled properly, the performance of web topic detection improves. Through a series of operations on the candidate topics (absorbing, removing, refining and re-ranking), the inaccurate or uninteresting topics are effectively removed and detection performance improves. Experiments on two public datasets show that the proposed post-processing method achieves significantly higher accuracy than state-of-the-art methods.
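The abstract does not give the prototype learning model itself, so the following is only a rough illustration of the stated goal: choosing webpages that are both representative and diverse from an intra-topic similarity matrix. It swaps in a simple greedy heuristic rather than the thesis's learning model. The function name `select_prototypes`, the `diversity_weight` parameter, the scoring rule and the toy matrix are all assumptions made for this sketch.

```python
import numpy as np


def select_prototypes(similarity, k, diversity_weight=1.0):
    """Greedily pick up to k prototype indices from a similarity matrix.

    Hypothetical heuristic (not the thesis's model): a page scores high if
    it is similar to the topic's pages on average (representative) and
    dissimilar to prototypes already chosen (diverse).
    """
    sim = np.asarray(similarity, dtype=float)
    n = sim.shape[0]
    selected = []
    for _ in range(min(k, n)):
        best_idx, best_score = -1, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # Representativeness: mean similarity to every page in the topic.
            representativeness = sim[i].mean()
            # Redundancy: highest similarity to any prototype chosen so far.
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            score = representativeness - diversity_weight * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


# Toy intra-topic similarity: pages 0-2 form one cluster, pages 3-4 another.
demo_similarity = [
    [1.00, 0.90, 0.80, 0.10, 0.10],
    [0.90, 1.00, 0.85, 0.10, 0.10],
    [0.80, 0.85, 1.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.90],
    [0.10, 0.10, 0.10, 0.90, 1.00],
]
prototypes = select_prototypes(demo_similarity, k=2)
print(prototypes)
```

On this toy matrix the first pick is the most central page of the larger cluster, and the redundancy penalty pushes the second pick into the other cluster, which is the representative-and-diverse behaviour the abstract describes.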
Keywords/Search Tags: Web Topic, Detection, Representation, Prototype Learning, Post-Processing