Web Page Topic Propagation Based On Neighbor Features

Posted on: 2018-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: J J Han
Full Text: PDF
GTID: 2348330512998169
Subject: Computer Science and Technology
Abstract/Summary:
This thesis studies how neighbor features are selected, represented, and used, and proposes a web page topic propagation algorithm based on neighbor features. Working on a directed graph built from web page hyperlinks, the algorithm propagates topics between adjacent web pages and finally obtains the topic probability distribution of each web page. The main work is as follows:

1) Web pages are collected from the Internet. After data preprocessing (extracting the page text, removing duplicate pages, analyzing hyperlinks, and so on), a directed graph is constructed from the hyperlinks.

2) The initial topic distribution of each web page node in the directed graph is computed with Latent Dirichlet Allocation. The resulting topic probability distribution serves as the feature representation of the page node; it reduces the dimensionality of the representation and can be used to measure the semantic similarity between pages.

3) The features of a page's neighbors are added to the model through the concepts of virtual nodes and virtual links. A virtual node is derived from the parent pages and contains all of their anchor text information. A virtual link introduces into the model the topic influence caused by the features of the other neighbors. Together, virtual nodes and virtual links give an effective way to represent and introduce the neighbor features of a web page.

4) Building on the topic locality of web pages, the thesis proposes the web page topic propagation algorithm based on neighbor features. Using the directed graph constructed from hyperlinks, the algorithm propagates topics between adjacent web pages and finally obtains the topic probability distribution of each page. Experiments show that the algorithm significantly improves the calculation of web page topic distributions.
Keywords/Search Tags: Web classification, topic distribution, neighbor feature, topic propagation
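
The abstract does not state the update rule used for propagation, so the following is only a minimal sketch of the general idea: each page starts from an initial topic distribution (for example, one produced by LDA) and is repeatedly mixed with the average distribution of its parent pages in the hyperlink graph. The function name `propagate_topics`, the mixing weight `alpha`, and the averaging rule are assumptions made for illustration and are not taken from the thesis; virtual nodes and virtual links are not modeled here.

```python
from typing import Dict, List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a non-negative vector so it sums to 1 (uniform if it sums to 0)."""
    total = sum(vec)
    if total <= 0:
        return [1.0 / len(vec)] * len(vec)
    return [v / total for v in vec]


def propagate_topics(
    initial: Dict[str, List[float]],
    links: List[Tuple[str, str]],
    alpha: float = 0.5,       # hypothetical weight kept on the page's own initial topics
    iterations: int = 20,     # hypothetical fixed number of propagation rounds
) -> Dict[str, List[float]]:
    """Propagate topic distributions along hyperlinks (source -> target).

    Assumed update rule (not from the thesis):
        new(p) = alpha * initial(p) + (1 - alpha) * mean(current(q) for parents q of p)
    """
    base = {page: normalize(dist) for page, dist in initial.items()}

    # Collect the parents (in-neighbors) of every page in the directed graph.
    parents: Dict[str, List[str]] = {page: [] for page in base}
    for src, dst in links:
        if src in base and dst in base:
            parents[dst].append(src)

    current = dict(base)
    for _ in range(iterations):
        updated = {}
        for page, dist in current.items():
            qs = parents[page]
            if not qs:
                # Pages without parents keep their current distribution.
                updated[page] = dist
                continue
            k = len(dist)
            mean = [sum(current[q][t] for q in qs) / len(qs) for t in range(k)]
            mixed = [alpha * base[page][t] + (1 - alpha) * mean[t] for t in range(k)]
            updated[page] = normalize(mixed)
        current = updated
    return current


# Toy graph: pages "a" and "b" both link to "c".
toy_initial = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
toy_links = [("a", "c"), ("b", "c")]
toy_result = propagate_topics(toy_initial, toy_links)
```

In this toy graph, page "c" is pulled toward the topic shared by its two parents, which is the topic locality effect the abstract relies on. Anchoring each update to the page's own initial distribution keeps the iteration stable; a real implementation would need the thesis's actual weighting of neighbors.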