
Research Of Methods Of Data Cleaning For Hotel Entity Based On Edit Distance And Conditional Functional Dependencies

Posted on: 2015-04-29  Degree: Master  Type: Thesis
Country: China  Candidate: J Su  Full Text: PDF
GTID: 2298330422978053  Subject: Computer system architecture
Abstract/Summary:
It is often the case that the same physical entity has multiple logical descriptions in the real world. This is especially true when the physical entities are hotels. In view of the requirements of hotel entity matching, we analyze and explore the data cleaning and identity matching problems. We propose our own data cleaning framework for hotel data, which relies on the following techniques: Edit Distance, Master Data, Conditional Functional Dependencies, and so on.

Edit Distance is a widely used measure of string similarity. Master data is a repository of the data that ‘matter most’; in other words, we can treat them as facts. Based on these, two algorithms are proposed in this paper: the General Algorithm and the Incremental Algorithm. They are used to remove potential duplicates in order to extract the master data of hotel entities. Both algorithms demonstrate acceptable and reliable performance. To handle the real-world scenario in which data sources are updated, we design the incremental algorithm as an improvement on the general algorithm. It achieves a shorter running time.

On the other hand, conditional functional dependencies are a kind of integrity constraint for relational databases. They have been proved able to capture inconsistencies and errors in data by enforcing bindings of semantically related values. By means of conditional functional dependencies extracted from the master data of hotels, we can remove or repair the impurities in the data, thus providing a higher-quality and consistent view of hotel data.
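
The two building blocks named in the abstract can be illustrated with a minimal sketch. This is not the thesis's General or Incremental Algorithm, whose details the abstract does not give; it only shows a standard Levenshtein edit distance, a normalized similarity derived from it, and a simple check for pairs of records that violate one conditional functional dependency. The function names, the record fields (`country`, `zip`, `city`), and the normalization by the longer string length are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cfd_violations(rows: List[Dict[str, str]],
                   condition: Dict[str, str],
                   lhs: List[str],
                   rhs: str) -> List[Tuple[int, int]]:
    """Find row pairs violating the CFD (condition: lhs -> rhs).

    Among rows matching the constant pattern in `condition`, rows that
    agree on every attribute in `lhs` must also agree on `rhs`. Each
    violating row is paired with the first row seen for the same key.
    """
    first_seen: Dict[Tuple[str, ...], int] = {}
    violations: List[Tuple[int, int]] = []
    for idx, row in enumerate(rows):
        if any(row.get(k) != v for k, v in condition.items()):
            continue  # the dependency only applies where the condition holds
        key = tuple(row[attr] for attr in lhs)
        if key in first_seen and rows[first_seen[key]][rhs] != row[rhs]:
            violations.append((first_seen[key], idx))
        else:
            first_seen.setdefault(key, idx)
    return violations


# Hypothetical hotel records, invented for this example.
hotels = [
    {"country": "CN", "zip": "100000", "city": "Beijing"},
    {"country": "CN", "zip": "100000", "city": "Peking"},
    {"country": "US", "zip": "100000", "city": "Nowhere"},
]
conflicts = cfd_violations(hotels, {"country": "CN"}, ["zip"], "city")
```

In a duplicate-removal setting, pairs of hotel names whose `similarity` exceeds some chosen threshold would be candidates for the same entity, while `cfd_violations` flags records that contradict a dependency such as "within country CN, zip determines city". The threshold and the dependency would have to come from the data at hand.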
Keywords/Search Tags:Edit Distance, Master Data, Conditional Functional Dependencies, Entities Match, Hotel Data