
The Research On Web Searching And Recommending For The Topic-specific Search Engine

Posted on: 2008-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2178360215997667
Subject: Computer software and theory
Abstract/Summary:
Web page retrieval and recommendation are important components of network information processing in a search engine. They automatically find and extract the information that interests the user from web pages, and they play a vital role in building a topic-specific search engine. This dissertation studies web page retrieval and recommendation for the topic-specific search engine. The main work of this dissertation is summarized as follows:
1. A framework for a topic-specific search engine built on a meta-search system (MBTSE) is proposed. The design combines the two approaches, and follows an analysis of the basic principles and key technologies of traditional search engines and a comparison of current search engines with topic-specific search engines.
2. A method of retrieving pages through text filtering with a fast neighbor search algorithm is proposed, implemented, and shown to be feasible. It follows an analysis of text filtering and common page retrieval models, and it reduces, to a certain extent, the heavy computation of actual retrieval.
3. To classify web pages by content in MBTSE, an improved similarity measure between two pages is proposed, analyzed, and validated in detail. Comparative experiments using two kinds of K-nearest-neighbor decision rules show that the new similarity measure improves average accuracy.
4. Following the specific structure of HTML pages, a series of preprocessing steps is applied to each web page: page parsing, stop-word filtering, stemming, feature extraction, and feature vector space representation using words and n-grams.
5. To find new discoveries on a topic automatically and proactively, content recommendation in the topic-specific search engine using outlier mining based on LOF (local outlier factor) is proposed, reflecting an intelligent search engine. In addition, a top-n% outlier factor method based on a calculated probability is proposed from the user's point of view. Experimental results using planted motifs show that the proposed method, moving from top-n to top-n%, is capable of detecting web content outliers in web data.
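The LOF-based top-n% selection in item 5 can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so this uses the standard local outlier factor definition and simply keeps the highest-scoring n% of points. The function names (`lof_scores`, `top_percent_outliers`), the Euclidean distance, and the way the percentage is turned into a count are all assumptions for illustration; the probability calculation mentioned in the abstract is not reproduced here. The sketch also assumes the points contain no exact duplicates and ignores ties at the k-distance.

```python
import math

def knn(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of point i."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]

def lof_scores(points, k):
    """Standard local outlier factor for each point (assumes no duplicate points)."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neigh]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean neighbour density divided by the point's own density
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

def top_percent_outliers(points, k, percent):
    """Indices of the top `percent`% of points by LOF score (hypothetical helper)."""
    scores = lof_scores(points, k)
    count = max(1, math.ceil(len(points) * percent / 100))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:count]
```

In a setting like the one the abstract describes, each point would be a page's feature vector, and the returned indices would be the pages flagged as candidate content outliers. Expressing the cutoff as a percentage rather than a fixed n lets it scale with the size of the page collection.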
Keywords/Search Tags:web page retrieval, search engine, text classification, similarity, outlier mining