
Research On Deep Web Sources Classification Leveraging World Knowledge Inference

Posted on: 2010-06-17  Degree: Master  Type: Thesis
Country: China  Candidate: L Huang
GTID: 2178360275458670  Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the volume of Web information keeps expanding and provides users with a huge information resource. A large share of this information lies deeper, hidden behind query interfaces where traditional search engines cannot reach it; this part of the Web is therefore called the Deep Web. Because Deep Web information is growing at high speed, it has become a significant resource for information retrieval. The heterogeneity and dynamicity of Deep Web data make large-scale Deep Web data integration very challenging, and Deep Web source classification is becoming increasingly important for such integration. This thesis studies the key technologies of Deep Web source classification in depth and proposes a novel enhanced classification model based on knowledge model inference, which effectively overcomes the limitations of traditional classification methods. Our research issues are as follows: (1) We study the patterns of virtual features in structured Deep Web query interfaces and propose a bag-of-words (BOW) feature selection method based on information gain and co-occurrence features, since effective feature selection is a vital precondition for feature partitioning. (2) We analyze the shortcomings of BOW-based feature selection and propose a feature inference model based on knowledge inference, which can compensate for the limited BOW feature sets. (3) We apply a feature selection method based on latent semantic analysis to a hierarchical knowledge repository and construct an auxiliary classifier based on the Wikipedia encyclopedia. (4) We propose an enhanced Deep Web source classification model leveraging knowledge model inference, which applies the auxiliary classifier, rich in domain concepts, to the classification of Deep Web query interfaces that have only limited features, in order to realize semantic inference over features and augment domain concepts. Finally, experiments are performed on the real-world UIUC Web repository dataset. The experimental results and analysis show that our classification model is effective, achieving higher classification precision and offering practical application value.
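Research issue (1) ranks bag-of-words features by information gain. The sketch below illustrates only that scoring step, under stated assumptions: each query interface is reduced to a set of terms, each interface carries a domain label, and a term's score is IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t) over the domain classes C. The co-occurrence component is not shown, and the function names and toy interfaces are hypothetical illustrations, not the thesis's own implementation.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a class-count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """IG(term) = H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    docs: list of term sets, one per query interface (assumed representation)
    labels: domain label of each interface, aligned with docs
    """
    n = len(docs)
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_with = sum(with_t.values())
    n_without = n - n_with
    h_c = entropy(Counter(labels))
    return (h_c
            - (n_with / n) * entropy(with_t)
            - (n_without / n) * entropy(without_t))


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy interfaces from two domains (not data from the thesis).
docs = [
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"from", "to", "departure"},
    {"from", "to", "airline"},
]
labels = ["Books", "Books", "Airfares", "Airfares"]

print(select_features(docs, labels, 4))
```

Terms that appear in every interface of one domain and in none of the other ("title", "author", "from", "to") separate the classes perfectly and receive the maximum gain of 1 bit here, while a term seen in only one interface, such as "isbn", scores lower. This sparsity of discriminative terms in small interfaces is the kind of limitation that issue (2) addresses with knowledge inference.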
Keywords/Search Tags: Deep Web, Data Integration, Sources Classification, Knowledge Model, Semantic Inference