A Study of Page Value Algorithms Based on Content and Links

Posted on: 2011-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhou
Full Text: PDF
GTID: 2208330332977008
Subject: Software engineering
Abstract/Summary:
With the proliferation of information on the Web and the growing popularity of the Internet, locating relevant information and interpreting it appropriately has become a major challenge for research in data engineering, information retrieval (IR), and data mining. The difficulty stems from the characteristics of the Web: it is huge, heterogeneous, dynamic, and semi-structured.

Although a web search engine can retrieve information on a specific topic, users must work through a long ranked list to find the valuable results, which is tedious and inefficient given the sheer volume of information.

Search engines are based on one of two methods: the content of pages or the link structure between them. Content-based engines work well for traditional documents, but their performance drops significantly when applied to web pages, mainly because a web page contains too much irrelevant information. Link-based engines take the hyperlink structure of web pages into account to improve performance. Examples are PageRank and HITS, which are used in Google and the CLEVER project respectively.

The work in this thesis has three parts. The first constructs a new architecture for personalized information retrieval. The second proposes the Content and Link Based Fast Ranking Algorithm, which computes the value of a page from its content and its links. The third proposes the Content and Link Based Complete Ranking Algorithm for computing page value.

Both algorithms were tested on a subset of the WT10g data set. The results show that the new algorithms outperform the traditional link-based retrieval method and approach the performance of TF-IDF, the content-based retrieval method.

The last part of the thesis points out room for improvement and directions for future work.
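
The abstract does not specify how the two proposed algorithms combine content and link evidence, so the following is only a minimal illustrative sketch of the general idea, not the thesis's method. It assumes a plain power-iteration PageRank for the link side, a simple TF-IDF sum for the content side, and a linear blend controlled by a hypothetical weight `alpha`. All function names, the normalization, and the toy data in the usage note are assumptions made for illustration.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [outgoing pages]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # Ignore links that point outside the known page set.
            targets = [t for t in links[p] if t in links]
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


def tfidf_score(query_terms, doc_tokens, all_docs):
    """Sum of TF-IDF weights of the query terms in one document."""
    counts = Counter(doc_tokens)
    n_docs = len(all_docs)
    score = 0.0
    for term in query_terms:
        df = sum(1 for tokens in all_docs.values() if term in tokens)
        if df == 0 or not doc_tokens:
            continue
        tf = counts[term] / len(doc_tokens)
        idf = math.log(n_docs / df) + 1.0  # smoothed so idf stays positive
        score += tf * idf
    return score


def page_value(query_terms, docs, links, alpha=0.5):
    """Hypothetical blend: alpha * content score + (1 - alpha) * link score.

    Both scores are scaled to [0, 1] by their maximum before blending.
    This weighting is an assumption, not the thesis's formula.
    """
    ranks = pagerank(links)
    max_rank = max(ranks.values())
    content = {p: tfidf_score(query_terms, docs[p], docs) for p in docs}
    max_content = max(content.values()) or 1.0
    return {
        p: alpha * content[p] / max_content
        + (1.0 - alpha) * ranks[p] / max_rank
        for p in docs
    }
```

As a usage example, with `docs = {"a": ["web", "search"], "b": ["link", "analysis", "web"], "c": ["cooking"]}` and `links = {"a": ["b"], "b": ["a"], "c": ["a"]}`, calling `page_value(["web"], docs, links)` ranks page `a` above `b` and `b` above `c`, because `a` is both the best content match and the most linked-to page.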
Keywords/Search Tags: Personalized, Information Retrieval, Class keywords, Link Matrix, Page Value