
Solving the data duplication problem for complex databases using neural networks

Posted on: 2004-10-23
Degree: Ph.D.
Type: Dissertation
University: Florida Institute of Technology
Candidate: Al-Namlah, Abdullah Abdulrahman
Full Text: PDF
GTID: 1468390011458982
Subject: Computer Science
Abstract/Summary:
Many of today's organizations require multiple cooperating databases, so that one database can be integrated with another database system. The underlying problem is that databases are heterogeneous: they differ in data representations, database schemas, and actual data. Data inconsistencies among databases can arise from semantic inconsistencies, in which the data is syntactically different but semantically the same. A simple example of semantic equivalence is the misspelling of a last name, “Smith” and “Smyth”, where both representations refer to the same person.

Semantic heterogeneity within or across database systems results in data duplication: the same record is stored multiple times in the same or different database systems. Data duplication is a data quality problem that is extremely pervasive in legacy software systems. It means that a data source has multiple records, usually with different syntaxes, for the same object. This problem has been recognized as extremely important to many organizations because of the size and complexity of today's database systems. A number of researchers have tried to solve the data duplication problem using techniques such as the Sorted Neighbor Method (SNM) and various clustering methods. However, a neural network approach has not been used in any of these studies.

Backpropagation neural networks have been successfully employed to solve complex problems in a wide variety of fields, such as signal processing, pattern recognition, medicine, speech recognition, and business. In this research, a backpropagation neural network was applied to the data duplication problem. We show the results of using backpropagation to identify duplicate records in a large database. The results indicate that backpropagation is a very efficient method for solving the data duplication problem and that it has many advantages over existing methods.
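
For orientation, the kind of existing technique the abstract contrasts with, the Sorted Neighbor Method, can be sketched roughly as below. This is only an illustrative sketch, not the dissertation's neural network method or its exact SNM variant. The function names, the window size of 3, the 0.75 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def sorted_neighborhood(records, key, window=3, threshold=0.75):
    """Hypothetical helper: sort records by a key, then compare each record
    only with the next (window - 1) records in sorted order."""
    ordered = sorted(records, key=key)
    pairs = []
    for i, left in enumerate(ordered):
        # The slice is the sliding window of neighbors that follow `left`.
        for right in ordered[i + 1 : i + window]:
            if similarity(key(left), key(right)) >= threshold:
                pairs.append((left, right))
    return pairs


# Toy data built around the abstract's "Smith" / "Smyth" example.
records = ["Smith", "Jones", "Smyth", "Brown"]
duplicates = sorted_neighborhood(records, key=str.lower)
print(duplicates)
```

Sorting first means that only nearby records are compared, which avoids comparing every pair. The trade-off is that duplicates whose keys sort far apart fall outside the window and are missed.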
Keywords/Search Tags: Data, Neural, Using