Research and Implementation of Web Structure Mining Based on Semantic Similarity

Posted on: 2010-01-20  Degree: Master  Type: Thesis
Country: China  Candidate: R H Yuan  Full Text: PDF
GTID: 2208360275998526  Subject: Computer application technology
Abstract/Summary:
Internet and Web technology is gradually maturing, and the Web has become one of the most important information resources. However, while it provides abundant information, its characteristics, including semi-structured and unstructured data, massive data volume, real-time dynamic change, and user diversity, make Web resources difficult to use to a certain extent. As a result, combining data mining techniques with the properties of the Web to find the required information quickly and precisely within these vast resources has become an urgent and meaningful research topic.

This thesis introduces the classical PageRank algorithm, which reflects the link structure among Web pages, analyzes its basic idea thoroughly, and points out its deficiencies in evaluating Web pages. The main flaw of PageRank is that it distributes a page's PageRank value equally among all of its out-links, whereas in fact the importance and relevance of each link differ. The algorithm completely neglects the semantic information of Web content, so it is easily influenced by irrelevant links, which reduces user satisfaction with the search results.

In response to these deficiencies, the thesis introduces semantic similarity based on HowNet to relate the anchor text of a link to the content of the page it points to. By integrating the similarity between each out-link and its target page, the method assigns little PageRank value to pages that have no value or are unrelated, and raises the PageRank value of pages that are truly relevant to the subject. It thus reflects the competition among links more accurately.

Finally, a simulator modeling a search engine is implemented. The simulation system covers nearly all the functions of a search engine and has been tested in a real Internet environment, including verification of the PageRank algorithm integrated with semantic similarity. Experiments and analysis show that the new algorithm preserves the merits and efficiency of the original algorithm while scoring pages better, thereby improving user satisfaction. It is a significant step toward artificial intelligence and the semantic Web in page ranking algorithms.
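
The idea of replacing PageRank's equal split with a similarity-weighted split can be illustrated with a minimal Python sketch. The abstract does not give the thesis's exact formula, so the update rule below is an assumption: each page passes its rank to its out-links in proportion to a similarity score rather than equally. The function name `weighted_pagerank`, the damping factor 0.85, and the similarity values are hypothetical placeholders; computing the scores from HowNet is not shown.

```python
def weighted_pagerank(links, similarity, damping=0.85, iterations=50):
    """Similarity-weighted PageRank (illustrative sketch, not the thesis's formula).

    links:      dict mapping a page to the list of pages it links to
    similarity: dict mapping (source, target) to a non-negative score;
                assumed to come from some anchor-text/page-content measure
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for source in pages:
            targets = links.get(source, [])
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[source] / n
                continue
            weights = [similarity.get((source, t), 0.0) for t in targets]
            total = sum(weights)
            if total == 0:
                # No similarity information: fall back to the equal split
                # used by the original PageRank.
                weights = [1.0] * len(targets)
                total = float(len(targets))
            for target, weight in zip(targets, weights):
                new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank


# Hypothetical three-page graph: A links to B and C, but its anchor text
# is assumed to match B's content far better than C's.
links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
similarity = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "A"): 1.0, ("C", "A"): 1.0}
ranks = weighted_pagerank(links, similarity)
```

In this toy graph, B ends up with a higher score than C, whereas the equal split gives them identical scores. Falling back to the equal split when no similarity scores are available keeps the sketch consistent with the original algorithm in that case.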
Keywords/Search Tags: semantic similarity, PageRank, search engine