
The Study Of Discovering Micro Blog Users Of Special Topic Based On Relevant Context Graph

Posted on: 2015-11-17  Degree: Master  Type: Thesis
Country: China  Candidate: Q Hu  Full Text: PDF
GTID: 2298330431497447  Subject: Computer software and theory
Abstract/Summary:
As the number of Internet users grows rapidly and Internet applications deepen, the demand for personalized information keeps growing, and general search engines cannot satisfy the requirements of specialized users. Topic crawling technology emerged to meet this need: it has a faster update cycle, consumes fewer resources, and, most importantly, can meet users' personalized needs.

The Relevancy Context Graph (RCG) gathers context knowledge about a given topic. Based on this context knowledge, each web page is assigned an access priority value, which guides the crawling direction of the crawler. However, when constructing its context knowledge, RCG does not sufficiently extract the link structure between web pages and does not consider the semantic relationships between them, so many noise pages are not effectively filtered out. Moreover, there are some defects in its definition of topic-specific feature words.

The contributions of this thesis are summarized as follows. (1) I optimize RCG and apply its context knowledge to the process of collecting topic-relevant users in micro-blogs. In optimizing RCG, I draw on the idea of friend link prediction in social networks, redesign the structure of RCG, consider many more link relations between web pages, and use the Vector Space Model (VSM) or the Semantic Similarity Vector Space Model (SSVSM) to filter out noise pages, which ensures the effectiveness of RCG. (2) I extend the topic-specific feature words based on the semantic relationships of words appearing in the context graph, and effectively compute the distribution of the extended topic-specific feature words. (3) A crawler is constructed using the Optimized RCG (ORCG), which is optimized based on link analysis and semantic analysis. (4) A General Language Model (GLM) is constructed from a large number of web pages covering every field. (5) I combine the distribution of the extended topic-specific feature words, the GLM, and the disseminator search technique in micro-blogs, and then define how the topic disseminator is computed. The topic disseminator guides the web crawler in a better direction to gather topic-relevant micro-blog users.

A corresponding experiment is performed at each stage of the research. The experimental results show that the method is effective to some extent.
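
The VSM-based noise filtering and priority-guided crawling mentioned above can be illustrated with a minimal sketch. This is not the thesis's ORCG implementation: the raw term-frequency weighting, the whitespace tokenizer, the threshold value, and the names `tf_vector`, `cosine`, `build_frontier`, and `pop_order` are all assumptions made for illustration only.

```python
import heapq
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as a raw term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_frontier(topic_text, pages, threshold=0.1):
    """Score pages against the topic, drop likely noise, and queue the rest.

    `pages` maps a URL to its text. Pages scoring below `threshold`
    (a hypothetical cut-off) are treated as noise and discarded.
    """
    topic = tf_vector(topic_text)
    heap = []
    for url, text in pages.items():
        score = cosine(topic, tf_vector(text))
        if score >= threshold:
            # heapq is a min-heap, so negate the score to pop the best page first.
            heapq.heappush(heap, (-score, url))
    return heap


def pop_order(heap):
    """Return URLs in the order a priority-guided crawler would visit them."""
    heap = list(heap)
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order
```

In this sketch the similarity score doubles as the access priority; the thesis derives priorities from the context graph instead, so the scoring function here is only a stand-in.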
Keywords/Search Tags:Focused Crawler, RCG, Link Prediction, semantic analysis, Micro Blog