Research On Identification Of The Same Semantic Objects In Heterogeneous Database Integrations

Posted on: 2007-03-02  Degree: Master  Type: Thesis
Country: China  Candidate: R Zhang
GTID: 2178360185975301  Subject: Agricultural mechanization project
Abstract/Summary:
Over the last few decades, with the rapid development of information technology and the spread of informatization, the amount of data accumulated in everyday life has grown greatly, and the volume of data collected, stored, processed, and transmitted has increased steadily. If data sharing is realized, many more people can make full use of data resources and avoid duplicated work and its associated costs, such as data acquisition and data gathering. In practice, however, data contents, formats, and quality differ widely. Because the data that users provide come from different sources, some formats cannot be converted at all, and some information is lost after conversion, which seriously impedes data circulation and sharing across departments and software systems. Managing data in an integrated and efficient way is therefore a natural choice for strengthening corporate competitiveness.

With the development of global networking and informatization, information on the Internet has become ever more plentiful, and the demands on the effectiveness of information retrieval methods keep rising. The shortcomings of the Internet have begun to show: search engines can only work from keywords, their degree of intelligence is low, and the results returned are often not what users actually need. Tim Berners-Lee, the inventor of the World Wide Web, proposed the concept and architecture of the Semantic Web in 2000. The semantics of data are the fundamental basis for judging data dependencies. Only when these dependencies are known is interoperation possible.
Therefore, for heterogeneous database integration, one basic way to overcome the obstacles to interoperation is to describe the data in the various databases semantically. This builds a semantic environment for the data and provides the foundation for automated data processing, logical inference, and reuse.

Finding corresponding semantic objects is the most important issue in the semantic integration of heterogeneous databases. More precisely, the key task of semantic integration is to find corresponding attributes, i.e. attribute matching. Solving this problem is of great significance for achieving interoperation among databases and the multipurpose use of information.

In this dissertation, current semantic integration techniques for heterogeneous databases are analyzed, and for semantic matching we propose a method based on attribute weights. Because this method needs prior knowledge to determine the weight of each data index, we do not assign weights to the metadata manually; instead, we use machine learning and artificial intelligence methods to learn rules from the data indices. Neural networks have particular advantages for attribute matching: they are trained on the data itself, need no rule programming or prior knowledge, take full account of attribute information, and have generalization and self-adaptation abilities.

The main contributions of the dissertation are summarized as follows: 1) The main current techniques for heterogeneous database integration are surveyed, the...
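The abstract does not give the formula behind the weight-based matching method, so the following is only a minimal sketch of what a weighted attribute comparison might look like. The metadata fields (`data_type`, `length`, `nullable`), the per-field scoring rules, and the hand-set weights are all hypothetical choices made for illustration; the dissertation itself goes on to learn such weights with a neural network rather than fixing them by hand.

```python
# Hypothetical sketch: weighted similarity between two attributes,
# each described by a small dictionary of schema metadata.
# Field names, scoring rules, and weights are assumptions, not the thesis's.

def feature_scores(a, b):
    """Return a per-feature similarity in [0, 1] for two attributes."""
    longer = max(a["length"], b["length"])
    return {
        "data_type": 1.0 if a["data_type"] == b["data_type"] else 0.0,
        # Ratio of the shorter declared length to the longer one.
        "length": min(a["length"], b["length"]) / longer if longer else 1.0,
        "nullable": 1.0 if a["nullable"] == b["nullable"] else 0.0,
    }

def weighted_similarity(a, b, weights):
    """Combine per-feature scores into one score, normalized by total weight."""
    scores = feature_scores(a, b)
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in scores) / total

def best_match(source, candidates, weights):
    """Pick the candidate attribute with the highest weighted similarity."""
    return max(candidates, key=lambda c: weighted_similarity(source, c, weights))
```

In this sketch, calling `best_match` with an attribute from one database and the attribute list of another returns the most similar candidate under the chosen weights. Replacing the fixed `weights` dictionary with values learned from data is the step the abstract assigns to the neural network.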
Keywords/Search Tags: heterogeneous database integration, attribute matching, SOM model, CRC algorithm, SOM-LM algorithm