
Automatic Classification And Identification Of Deep Web Data Source Using Multi-classifier

Posted on: 2010-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Li
GTID: 2178360275959243
Subject: Computer application technology
Abstract/Summary:
The discovery of Deep Web data sources and their classification by domain are attracting increasing attention. Existing approaches to identifying query interfaces suffer from low precision and do not take the domain into account, so this thesis proposes a method for the automatic classification and identification of Deep Web data sources using multiple classifiers. The method divides the discovery of Deep Web data sources into phases and handles each phase with the classifier that is most effective for it. The thesis also presents a framework for Deep Web data source discovery, implements it on data sets drawn partly from the TEL-8 Query Interfaces and partly collected by hand, and evaluates its effectiveness. The main contributions are:
ⅰ. A survey of research on the scale of the Deep Web in China and abroad, and a new process for discovering the Deep Web.
ⅱ. Because traditional search engines are poorly suited to discovering Deep Web resources, a form-focused crawler that uses reinforcement learning to decide which hyperlinks to follow first; experiments show that it improves both efficiency and precision.
ⅲ. A framework that divides Deep Web data source discovery into phases. Each phase is handled by the classifier best suited to its main task, in order to maximize the recall and precision of Deep Web source discovery.
ⅳ. Based on the features selected for the searchable form classifier and the domain-specific form classifier, a form extractor that parses the structural and textual information of a form and extracts the features used by each classifier.
Following the proposed framework, the searchable form classifier and the domain-specific form classifier are trained on the TEL-8 Query Interfaces data and the manually collected forms; the form-focused crawler is then used to crawl web pages, and the precision of the framework is evaluated. Experimental results show improvements both in identifying searchable forms and in classifying forms by domain.
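Contribution ⅱ says the form-focused crawler uses reinforcement learning to choose which hyperlinks to follow first, but the abstract does not give the learning rule. The following is a minimal sketch under our own assumptions, not the thesis's algorithm: the class name `LinkPrioritizer`, the use of anchor-text terms as link features, the learning rate `alpha`, and a reward of 1.0 for reaching a searchable form are all hypothetical. The update is a simple fixed-step (bandit-style) value estimate, a simplified stand-in for full reinforcement learning.

```python
import heapq
from collections import defaultdict


class LinkPrioritizer:
    """Hypothetical crawl frontier that learns which anchor-text terms pay off.

    Each term keeps a value estimate of how often following a link whose
    anchor text contains it led to a searchable form. Estimates move toward
    each observed reward by a fixed step `alpha` (an incremental,
    bandit-style update), a simplified stand-in for reinforcement learning.
    """

    def __init__(self, alpha=0.5, default_value=0.1):
        self.alpha = alpha                   # learning rate (assumed value)
        self.default_value = default_value   # prior value for unseen terms (assumed)
        self.values = defaultdict(lambda: default_value)
        self.frontier = []                   # min-heap of (-score, url)
        self.seen = set()

    def score(self, anchor_text):
        """Mean value of the anchor text's terms."""
        terms = anchor_text.lower().split()
        if not terms:
            return self.default_value
        return sum(self.values[t] for t in terms) / len(terms)

    def add_link(self, url, anchor_text):
        """Queue a URL once; its priority is fixed at insertion time."""
        if url not in self.seen:
            self.seen.add(url)
            heapq.heappush(self.frontier, (-self.score(anchor_text), url))

    def next_link(self):
        """Pop the highest-scoring URL, or None when the frontier is empty."""
        return heapq.heappop(self.frontier)[1] if self.frontier else None

    def update(self, anchor_text, reward):
        """Move each term's value toward reward (1.0 = searchable form found)."""
        for term in set(anchor_text.lower().split()):
            self.values[term] += self.alpha * (reward - self.values[term])


frontier = LinkPrioritizer()
frontier.update("advanced search", 1.0)   # this link led to a searchable form
frontier.update("about us", 0.0)          # this one did not
frontier.add_link("http://example.com/about", "about us")
frontier.add_link("http://example.com/find", "search books")
print(frontier.next_link())               # http://example.com/find
```

Priorities here are computed when a link enters the frontier and are not recomputed after later updates; a fuller crawler would re-score the frontier periodically or propagate discounted rewards over several hops.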
Keywords/Search Tags: Deep Web, Web Form, Naive Bayes, Decision Tree
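Contribution ⅳ describes a form extractor that parses a form's structural and textual information, and the keywords name Naive Bayes and Decision Tree, but the abstract does not say which classifier serves which phase. The sketch below assumes, for illustration only, a multinomial Naive Bayes model over form text for the domain-specific stage. The names `extract_form_features` and `NaiveBayesDomainClassifier`, the tokenizer, and the two toy training forms are hypothetical and are not taken from the thesis or from TEL-8. The structural counts it collects (field types per form) are the kind of input a searchable form classifier, such as a decision tree, could use.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


def tokenize(text):
    """Lower-case alphabetic tokens; 'author_name' -> ['author', 'name']."""
    return re.findall(r"[a-z]+", text.lower())


class FormExtractor(HTMLParser):
    """Hypothetical form extractor: textual tokens plus structural counts."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.tokens = []             # textual features (labels, field names)
        self.structure = Counter()   # structural features (field types)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            kind = (attrs.get("type") or "text").lower() if tag == "input" else tag
            self.structure[kind] += 1
            self.tokens += tokenize(attrs.get("name") or "")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

    def handle_data(self, data):
        if self.in_form:
            self.tokens += tokenize(data)


def extract_form_features(html):
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens, parser.structure


class NaiveBayesDomainClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # forms per domain
        self.term_counts = defaultdict(Counter)  # token counts per domain
        self.vocab = set()

    def fit(self, token_lists, labels):
        for tokens, label in zip(token_lists, labels):
            self.doc_counts[label] += 1
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)     # log prior
            for t in tokens:
                if t in self.vocab:              # skip tokens never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training forms, invented for illustration.
train = [
    ('<form>Title <input name="title"> Author <input name="author"> '
     'ISBN <input name="isbn"></form>', "Books"),
    ('<form>Departure city <input name="from"> Arrival city <input name="to"> '
     'Date <input name="date"></form>', "Airfares"),
]
clf = NaiveBayesDomainClassifier().fit(
    [extract_form_features(html)[0] for html, _ in train],
    [label for _, label in train],
)
tokens, structure = extract_form_features(
    '<form>Book title <input name="title"><input type="submit" value="Search"></form>')
print(clf.predict(tokens))   # Books
print(dict(structure))       # {'text': 1, 'submit': 1}
```

Keeping the textual and structural features separate mirrors the abstract's split between the searchable form classifier and the domain-specific form classifier: each stage can be trained on the feature set that suits it.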