
Data Mining In Internet Information Retrieval Applications

Posted on: 2002-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Xu
GTID: 2208360185995608
Subject: Computer applications
Abstract/Summary:
As the amount of information on the Internet keeps growing, retrieving information from it has become a great challenge. Search engines make it possible for people to obtain information quickly and effectively. In recent years, Chinese search engines have proliferated to meet people's need to find and access Chinese information on the Internet. However, their performance is far from satisfactory: most Chinese search engines still suffer from slow speed, low recall and precision, and a lack of web page classification. To address these problems, we apply data mining technology to Internet information retrieval and design and develop a search engine system, the "Web Information Intelligent Search System". In building this system, we made a thorough study of data mining approaches, including Chinese word segmentation, user interest modeling, web page classification, and web page clustering. My main work is as follows:
1. By building a two-level index for the Chinese dictionary, we obtain a highly efficient word segmentation dictionary that supports hashing on the first Chinese character of a string followed by full binary search. Based on this dictionary, we design a new Chinese word segmentation algorithm whose time complexity is lower than that of current Chinese word segmentation algorithms.
2. Based on an analysis of web page structure, we design a user interest model and present an algorithm for computing the similarity between a web page and the user interest model. In addition, we design and implement a WWW information gathering algorithm based on a web page similarity strategy.
3. After analyzing the structural information of web pages, we propose a new web page classifier based on a web page classification tree. Then, based on this classifier, we design a web page classification algorithm.
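The dictionary structure in item 1 can be illustrated with a minimal Python sketch. The abstract does not specify the segmentation algorithm itself, so the sketch assumes forward maximum matching, a standard dictionary-based method, purely for illustration; the class `TwoLevelDict`, the function `segment`, and the sample words are hypothetical. Level one is a hash table keyed on a word's first character; level two is a sorted list of the remaining suffixes, searched by binary search.

```python
from bisect import bisect_left
from collections import defaultdict


class TwoLevelDict:
    """Hypothetical two-level index: level 1 hashes on a word's first
    character, level 2 is a sorted suffix list searched by bisection."""

    def __init__(self, words):
        buckets = defaultdict(set)
        for w in words:
            if w:
                buckets[w[0]].add(w[1:])  # store the suffix after the first char
        # Level 1: dict keyed by first character -> level 2: sorted suffixes.
        self.index = {ch: sorted(s) for ch, s in buckets.items()}
        self.max_len = max((len(w) for w in words), default=0)

    def contains(self, word):
        if not word:
            return False
        suffixes = self.index.get(word[0])  # hash lookup on the first character
        if suffixes is None:
            return False
        rest = word[1:]
        i = bisect_left(suffixes, rest)  # binary search within that bucket
        return i < len(suffixes) and suffixes[i] == rest


def segment(text, dictionary):
    """Forward maximum matching (an assumed stand-in, not the thesis's algorithm)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is accepted
        # as a fallback, so the loop always advances.
        longest = max(1, min(dictionary.max_len, len(text) - i))
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            if length == 1 or dictionary.contains(candidate):
                tokens.append(candidate)
                i += length
                break
    return tokens


# Usage with a tiny hypothetical dictionary.
d = TwoLevelDict(["数据", "数据挖掘", "挖掘", "信息", "检索", "信息检索"])
print(segment("数据挖掘信息检索", d))  # ['数据挖掘', '信息检索']
```

Hashing on the first character narrows each lookup to one small bucket, so the binary search runs only over words that share that character rather than over the whole dictionary.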
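Item 2 does not say how the user interest model or the page similarity is represented. Assuming, for illustration only, that both the interest model and each page are sparse term-weight vectors compared by cosine similarity, similarity-guided information gathering can be sketched as a best-first crawl. The names `cosine`, `page_vector`, `focused_crawl`, `threshold`, and `limit` are hypothetical, and the in-memory `pages` and `links` dictionaries stand in for real page fetching and link extraction.

```python
import heapq
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def page_vector(tokens):
    """Hypothetical page representation: raw term frequencies."""
    return dict(Counter(tokens))


def focused_crawl(seeds, pages, links, interest, threshold=0.1, limit=10):
    """Best-first gathering: always expand the most interest-similar page next.

    pages: url -> token list; links: url -> outgoing urls. Both are in-memory
    stand-ins (assumed) for downloading and parsing real pages.
    """
    frontier, seen = [], set()
    for url in seeds:
        score = cosine(page_vector(pages.get(url, [])), interest)
        heapq.heappush(frontier, (-score, url))  # negate: heapq is a min-heap
        seen.add(url)
    gathered = []
    while frontier and len(gathered) < limit:
        neg_score, url = heapq.heappop(frontier)
        if -neg_score < threshold:
            break  # the best remaining page is below threshold, so all are
        gathered.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                score = cosine(page_vector(pages.get(nxt, [])), interest)
                heapq.heappush(frontier, (-score, nxt))
    return gathered


# Usage with a hypothetical interest model and link graph.
interest = {"数据": 1.0, "挖掘": 1.0}
pages = {"a": ["数据", "挖掘", "算法"], "b": ["体育", "新闻"],
         "c": ["数据", "挖掘"], "d": ["数据"]}
links = {"a": ["b", "c"], "c": ["d"]}
print(focused_crawl(["a"], pages, links, interest))  # ['a', 'c', 'd']
```

A real crawler cannot score a page's content before downloading it; a common workaround is to estimate a link's priority from its anchor text or from the parent page's score, which this sketch omits for brevity.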
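The classification-tree classifier in item 3 is not described in detail. One plausible reading, sketched here as an assumption rather than as the thesis's method, is top-down hierarchical classification: each tree node carries a term profile, and a page descends from the root to the most similar child at every level. `CategoryNode`, `classify`, `min_sim`, and the sample categories are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """A node in a hypothetical web page classification tree."""
    name: str
    profile: dict  # term -> weight, e.g. a centroid of training pages (assumed)
    children: list = field(default_factory=list)


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(page, root, min_sim=0.05):
    """Top-down: at each level follow the most similar child category.

    Returns the category path from the root; stops early when no child
    reaches min_sim.
    """
    path, node = [root.name], root
    while node.children:
        best = max(node.children, key=lambda c: cosine(page, c.profile))
        if cosine(page, best.profile) < min_sim:
            break
        path.append(best.name)
        node = best
    return path


# Usage with a tiny hypothetical tree.
tree = CategoryNode("root", {}, [
    CategoryNode("computer", {"算法": 1.0, "数据": 1.0}, [
        CategoryNode("data_mining", {"数据": 1.0, "挖掘": 1.0}),
        CategoryNode("networking", {"协议": 1.0, "路由": 1.0}),
    ]),
    CategoryNode("sports", {"足球": 1.0, "比赛": 1.0}),
])
print(classify({"数据": 2.0, "挖掘": 1.0}, tree))  # ['root', 'computer', 'data_mining']
```

Descending the tree compares a page only with the children of the current node rather than with every leaf category, which keeps each decision local to one level of the hierarchy.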
Keywords: data mining, information gathering, information retrieval, classifier, clustering