
Research On Key Technologies Of Ontology-Based Deep Web Information Integration

Posted on: 2010-06-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Fang
Full Text: PDF
GTID: 1118360278478094
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the World Wide Web (WWW), the Web, and the Deep Web in particular, contains a huge and rapidly growing amount of high-value information. Information hidden in the Deep Web is heterogeneous, autonomous, and dynamic, so traditional information integration methods cannot meet users' needs. To help users obtain high-value information quickly and accurately, research on ontology-based Deep Web information integration has become an urgent problem, given its broad applications and theoretical significance.

This thesis analyzes in depth the current research status and development trends of Deep Web information integration. Building on the preliminary work of our research group, this dissertation puts forward an ontology-based Deep Web information integration solution. It covers a dynamic fuzzy description logic method for representing uncertain Deep Web knowledge, a technique for discovering Deep Web sources based on maximum entropy and ontology, Deep Web data source selection based on a quality estimation model, semantic annotation synchronized across multiple data sources, Deep Web fuzzy ontology mapping, and so on. The main research work and contributions of this dissertation are as follows.

(1) An accurate and integrated ontology is a necessary precondition for ontology-based Deep Web information integration, so we semi-automatically create the Deep Web domain ontology in accordance with the characteristics of the Deep Web. In addition, to handle the uncertainty in Deep Web ontology learning and ontology mapping, a dynamic fuzzy description logic (DFDLs) method for uncertain knowledge representation is presented, which overcomes the deficiencies of the uncertain knowledge representation approaches used by traditional description logic.

(2) In view of the dynamic and sparse distribution of Deep Web data sources, this dissertation puts forward a new method of detecting data sources based on a maximum entropy classifier and a domain ontology. The method first identifies Deep Web query interfaces automatically with the maximum entropy classifier, and then detects data sources with a focused crawling technique based on the domain ontology. This lets the focused crawler concentrate on visiting links that are likely to lead to Deep Web entry pages and avoid downloading unnecessary pages throughout the process.

(3) The efficiency and quality of Deep Web sources can be evaluated by their quality of service, so this dissertation proposes a quality estimation model for data sources based on the domain ontology and applies it to data source selection. In this way, the model can select the data sources that best meet the users' exacting requirements, achieving lower query cost and higher efficiency.

(4) To address missing interface schemas and result schemas during information extraction, this dissertation provides a synchronous annotation approach across multiple data sources. It works by effectively learning domain ontology knowledge from a set of Deep Web interface and result schemas and from instance queries on the ontology. The method is successfully applied to data extraction from complex result pages.

(5) With regard to uncertain schema matching in ontology-based Deep Web information integration, this dissertation proposes a new framework that uses ontology mapping with uncertainty to address uncertain schema matching. The framework integrates various ontology features and several matching strategies, and introduces uncertain matching into each mapping strategy. This new approach is an efficient and general automatic mapping strategy for ontology-based Deep Web information integration.

(6) Based on the proposed key technologies and practical requirements, we propose a Deep Web semantic integration architecture and implement a prototype system for Deep Web semantic integration. The system provides functions such as source discovery, source selection, data extraction, and semantic integration. Practical application shows that the system has certain practical value.

This work is partially supported by the Natural Science Foundation of China under grant No. 60673092, the High-Technology Research Program of Jiangsu Province under grant No. BG2005019, the Higher Education Graduate Research Innovation Program of Jiangsu Province in 2008 under grant No. cx08b-099z, and the Excellence Doctoral Dissertation Topic Selection Program of Soochow University in 2008 under grant No. SDY Zi [2008]22.
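
As a rough illustration of the idea in item (5), combining several matching strategies and attaching a degree of uncertainty to each candidate mapping, the following minimal Python sketch may help. It is not the dissertation's framework: the two strategies (label similarity and sample-value overlap), the equal weights, the 0.5 threshold, and all function names are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """String-based strategy (assumed): similarity of two labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_similarity(values_a: set, values_b: set) -> float:
    """Instance-based strategy (assumed): Jaccard overlap of sample values."""
    if not values_a and not values_b:
        return 0.0
    return len(values_a & values_b) / len(values_a | values_b)


def map_attributes(source, target, w_name=0.5, w_inst=0.5, threshold=0.5):
    """Map each source attribute to its best target attribute.

    `source` and `target` are dicts of attribute name -> set of sample values.
    Returns {source_attr: (target_attr, degree)}, where `degree` is a weighted
    combination of the two strategy scores. Candidates whose degree falls
    below `threshold` are dropped instead of being forced into a mapping.
    """
    mappings = {}
    for s_name, s_vals in source.items():
        best = None
        for t_name, t_vals in target.items():
            degree = (w_name * name_similarity(s_name, t_name)
                      + w_inst * instance_similarity(s_vals, t_vals))
            if best is None or degree > best[1]:
                best = (t_name, degree)
        if best is not None and best[1] >= threshold:
            mappings[s_name] = best
    return mappings
```

Called with two hypothetical schemas, for example `map_attributes({"author": {"Herbert", "Austen"}}, {"writer": {"Herbert", "Austen"}})`, the sketch returns a mapping from "author" to "writer" together with a degree between 0 and 1. The labels differ, but the shared sample values raise the combined score above the threshold.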
Keywords/Search Tags:Deep Web, Information Integration, Ontology, Knowledge Representation, Data Sources Discovery, Data Sources Selection, Information Extraction, Ontology Mapping