
Study On Deep Web Source Classification And User Satisfaction Assessment

Posted on: 2015-09-13    Degree: Master    Type: Thesis
Country: China    Candidate: Q W Zhou    Full Text: PDF
GTID: 2298330422471675    Subject: Computer software and theory
Abstract/Summary:
The Deep Web is the part of the Web that cannot be indexed by traditional search engines. Compared with ordinary static pages, the information contained in the Deep Web has a great advantage in quantity, quality, and thematic coverage. With the rapid development of the Internet, the number of Deep Web sources has increased sharply, and the Deep Web has become an important way for people to obtain information. Research on large-scale information integration over the Deep Web therefore has important practical significance.

Deep Web source classification and user satisfaction assessment are two key steps in Deep Web information integration. Classification, the first step of integration, groups all Deep Web sources by domain and thus helps to find relevant sources quickly and accurately. Satisfaction is the feedback users give on the Deep Web information integration system; it not only verifies the effect of the integration but also helps identify and correct deficiencies in previous work. In this thesis, we studied both aspects, proposed corresponding methods, and evaluated them experimentally. The main work is as follows:

We introduced the framework of Deep Web information integration and discussed in detail the development of Deep Web source classification and user satisfaction assessment. We studied the conventional classification algorithms for Deep Web sources and the techniques for user satisfaction evaluation.

We found that the basic KNN algorithm is too time-consuming. Inspired by the vector space model, we represented the query interface of a Deep Web source in a vector space. The traditional similarity measures are mainly cosine similarity and Euclidean distance. However, because different Deep Web sources contain different attributes, the vectors mapped from them cannot be used directly in these calculations. We therefore put forward our own similarity calculation by redesigning the vector and, on that basis, proposed the VD-KNN Deep Web source classification algorithm. To further reduce the complexity of Deep Web source classification, we established an attribute weight data model that combines three basic characteristics of the attributes, and proposed the corresponding similarity calculation methods and classification model (Attribute Decentralization Algorithm-based Deep Web Sources Classification, AD-DWSC).

The customer is God: from the user's point of view, users should be able to evaluate search engines directly. We extracted user behavior features from logs and, by analyzing the user behavior data, proposed corresponding assumptions and conventions. On this basis, we established a search engine performance evaluation algorithm based on automatic session parsing (Automatic Session Parsing Algorithm-based Search Engine Performance Evaluation, ASP-SEPE).

Finally, we carried out extensive experiments and simulations for the VD-KNN, AD-DWSC, and ASP-SEPE algorithms proposed in this thesis. A thorough analysis of the experimental results showed that all three algorithms achieved the desired results.
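
As an illustration of the baseline the abstract starts from, the following is a minimal sketch of plain KNN classification with cosine similarity over query-interface attributes. It is not the thesis's VD-KNN or AD-DWSC method, whose vector redesign and attribute weights are not given here. The function names, the toy training data, and the choice to align vectors over a shared attribute vocabulary are all assumptions made for this example.

```python
import math
from collections import Counter


def to_vector(attributes, vocabulary):
    """Map a list of attribute labels to term counts over a shared vocabulary."""
    counts = Counter(attributes)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query_attrs, labeled_sources, k=3):
    """Return the majority domain label among the k most similar sources."""
    # Sources expose different attributes, so every vector is built over the
    # union of all attribute labels (an assumption of this sketch).
    vocabulary = sorted(
        {a for attrs, _ in labeled_sources for a in attrs} | set(query_attrs)
    )
    q = to_vector(query_attrs, vocabulary)
    scored = [
        (cosine(q, to_vector(attrs, vocabulary)), label)
        for attrs, label in labeled_sources
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled query interfaces: (attribute labels, domain).
TRAINING = [
    (["title", "author", "isbn"], "books"),
    (["title", "author", "publisher"], "books"),
    (["departure", "destination", "date"], "flights"),
    (["origin", "destination", "passengers"], "flights"),
]

if __name__ == "__main__":
    print(knn_classify(["author", "title", "price"], TRAINING, k=3))
```

Every query is compared against every labeled source, which is the cost the abstract describes as too time-consuming for the basic KNN algorithm.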
Keywords/Search Tags: Deep Web information integration, data source classification, attributes decentralization, satisfaction assessment, session parsing