Deep Web Entrance Recognition And Personalized Search Research & Design

Posted on: 2011-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
Full Text: PDF
GTID: 2178360302493828
Subject: Computer application technology
Abstract/Summary:
Users mainly access Deep Web sites through query interfaces embedded in Web pages, which return the desired results in response to specific queries. To help users search Deep Web information simply and effectively, it is necessary to provide a unified query interface through which multiple Deep Web sites can be queried simultaneously. Recognizing Deep Web entrances is an important component of integrated search: it supplies the information sources and is a prerequisite for all subsequent work, so it matters to the entire Deep Web integrated search system. Meanwhile, the volume of Deep Web information is as vast as an ocean. To make the data obtained by integrated Deep Web search more valuable and to avoid information overload, the integrated search results must be processed and users must be offered intelligent personalized search services.

This thesis mainly studies techniques for recognizing Deep Web entrances and for displaying integrated Deep Web results. A PU active learning algorithm with incremental learning ability is proposed and applied to Deep Web entrance recognition. A personalized search method based on Deep Web integration is also put forward. Finally, a personalized search prototype system based on Deep Web integration is designed and implemented. The main work is as follows:

(1) Study how to identify Deep Web entrances among newly added Web pages and how to classify them. To reduce the risk posed by a shortage of initial positive samples and by the difficulty of obtaining negative samples that correspond to the positive samples of different classes, a PU active learning method with incremental learning ability is presented. This method employs three SVM classifiers in cooperative meta-supervised learning, together with unsupervised learning based on grid-based clustering. When the results of classification and cluster analysis disagree, active learning is introduced to label the unlabeled samples. The algorithm is applied to the online recognition and classification of Deep Web interfaces. Experiments show that the method effectively improves the ability to identify new classes and to process incremental unlabeled samples.

(2) Present a personalized search approach based on Deep Web integration, in order to solve the information overload caused by the excessive amount of information returned by integrated Deep Web search. This method uses Deep Web directories and a user questionnaire to generate an interest tree, and updates user interests according to feedback from users and the parameters returned by the member Deep Web sites. Pages are filtered and sorted according to each user's interests to produce the final displayed pages. Experimental results demonstrate that this method effectively optimizes the integrated Deep Web search process and makes personalized information more prominent.

(3) Design and implement an integrated personalized search prototype system for the Deep Web, and analyze how the techniques above are applied in the system. Practical application shows that the system performs well.
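The disagreement rule described in item (1) can be illustrated with a minimal sketch. It assumes three already-trained classifiers, a cluster-based labeling function, and a human oracle; none of these is specified in the abstract. The names `majority_vote`, `label_new_samples`, `cluster_label`, and `oracle` are hypothetical, and simple threshold functions stand in for the SVM classifiers and the grid-based clustering. The sketch covers only the decision of when to query; it does not implement PU learning, SVM training, or incremental retraining.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Sample = Tuple[float, ...]
Label = Hashable
Labeler = Callable[[Sample], Label]


def majority_vote(classifiers: Sequence[Labeler], x: Sample) -> Label:
    """Return the label chosen by most classifiers for sample x.

    With three classifiers and more than two classes a 1-1-1 tie is possible;
    Counter.most_common then returns the label that was voted first.
    """
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def label_new_samples(
    samples: Sequence[Sample],
    classifiers: Sequence[Labeler],
    cluster_label: Labeler,
    oracle: Labeler,
) -> Tuple[List[Tuple[Sample, Label]], int]:
    """Label incoming samples, querying the oracle only on disagreement.

    Returns the labeled samples and the number of oracle queries made.
    """
    labeled: List[Tuple[Sample, Label]] = []
    queries = 0
    for x in samples:
        voted = majority_vote(classifiers, x)
        clustered = cluster_label(x)
        if voted == clustered:
            # Classification and clustering agree: accept the label as-is.
            labeled.append((x, voted))
        else:
            # Results are not unanimous: ask the oracle (active learning).
            labeled.append((x, oracle(x)))
            queries += 1
    return labeled, queries


# Toy stand-ins (assumptions): threshold rules instead of trained SVMs,
# and a coarse 2x2 grid rule instead of real grid-based clustering.
toy_classifiers = [
    (lambda x, t=t: "entrance" if x[0] > t else "other")
    for t in (0.4, 0.5, 0.6)
]


def toy_cluster_label(x: Sample) -> Label:
    return "entrance" if x[0] >= 0.5 and x[1] >= 0.5 else "other"


def toy_oracle(x: Sample) -> Label:
    # Stand-in for a human annotator.
    return "entrance"


incoming = [(0.9, 0.9), (0.1, 0.2), (0.7, 0.1)]
labeled, n_queries = label_new_samples(
    incoming, toy_classifiers, toy_cluster_label, toy_oracle
)
print(labeled, n_queries)
```

In this toy run only the third sample triggers a query, because the classifiers vote "entrance" while the grid rule says "other". In an incremental setting, the newly labeled samples would presumably be fed back to retrain the classifiers, which the sketch leaves out.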
Keywords/Search Tags: Deep Web, active learning, PU learning, personalized search, interest tree