Researches On Methods Of Entity Matching And Its Applications In Vector Spatial Data

Posted on: 2009-04-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Wu
GTID: 1220330503976053
Subject: Photogrammetry and Remote Sensing
Abstract/Summary:
With the continuous development and widespread application of geographic information system (GIS) technology, demand for spatial data keeps increasing, and so do the requirements on its currency and quality. To meet this demand, many departments have collected massive amounts of spatial data and established spatial databases for different application purposes. To keep spatial data current, a vector spatial database must be partially updated at regular intervals: entity matching is used to analyze how the entities in the database have changed, and the corresponding update operations are then carried out. However, spatial datasets describing the same area are often collected repeatedly by different departments. Because these datasets differ in geometric position, geometric shape, topological structure, geometric precision, level of attribute detail, coding scheme, and semantic expression, data integration and sharing become extremely difficult. To use such discrepant data effectively, reduce data acquisition costs, speed up data updating, and improve data quality, spatial data of different scales from multiple departments, fields, regions, and time periods often need to be integrated and fused into high-quality spatial data with higher precision, richer attribute information, and broader map extent.
To do so, identical entities in different map databases must be identified and their mapping relations established; vector data fusion is then performed on the basis of entity matching to resolve geometric and semantic inconsistencies in spatial data. This dissertation studies entity matching methods thoroughly and systematically, investigates their application to data updating and to inconsistency correction of multi-source data, and presents reasonable and practicable entity-matching-based methods for both tasks. The main research content and achievements are as follows:
(1) The current state and trends of entity matching methods in China and abroad are summarized, and the key problems that still need to be solved in entity matching research are pointed out.
(2) The terms, basic concepts, and theories closely related to this dissertation are introduced. The common workflow of entity matching is described. The research scope of entity matching is defined in terms of geometry type, data source, scale, time, and degree of data coverage. Entity matching is classified, its difficulties are analyzed, and methods for evaluating matching quality are studied. These studies show that the choice of similarity measure depends on the matching case: different conditions call for different matching methods and strategies, and it is not necessary to seek a single uniform entity matching method that resolves all matching problems.
(3) For point entity matching, methods for calculating attribute similarity measures for all kinds of fields are presented. Experiments with a matching method based on mutually nearest distance produced good results.
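The mutually nearest distance idea mentioned for point entities can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the brute-force search, and the optional `max_dist` threshold are assumptions made only for illustration. Two points are paired when each is the other's nearest neighbour.

```python
import math

def nearest(p, candidates):
    """Return (index, distance) of the candidate closest to p; (None, inf) if empty."""
    best, best_d = None, math.inf
    for i, q in enumerate(candidates):
        d = math.dist(p, q)
        if d < best_d:
            best, best_d = i, d
    return best, best_d

def mutual_nearest_matches(set_a, set_b, max_dist=math.inf):
    """Pair points that are each other's nearest neighbour within max_dist.

    max_dist is a hypothetical tolerance, not a parameter named in the source.
    """
    matches = []
    for i, p in enumerate(set_a):
        j, d = nearest(p, set_b)
        if j is None or d > max_dist:
            continue
        # Check the reverse direction: is p also the nearest point to set_b[j]?
        back, _ = nearest(set_b[j], set_a)
        if back == i:
            matches.append((i, j))
    return matches

set_a = [(0, 0), (10, 10), (5, 5)]
set_b = [(0.5, 0), (10, 9), (20, 20)]
print(mutual_nearest_matches(set_a, set_b))  # [(0, 0), (1, 1)]
```

In this example the point (5, 5) stays unmatched because its nearest neighbour (10, 9) is closer to (10, 10). A spatial index would replace the brute-force search for large datasets.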
For data with strong coverage overall but weak coverage in some locally dense areas, this dissertation introduces, for the first time, an environment similarity measure for point entities and a method for calculating it, and proposes a matching method based on distance, attribute, and environment similarity, which significantly improves matching quality.
(4) For line entity matching, a distance similarity measure based on the overlap area of the two entities' buffers is proposed, exploiting favorable properties of that overlap area; compared with previous distance calculation methods, it requires less computation, has lower computational complexity, and gives good results. The shape similarity measure based on polyline azimuth codes presented here is invariant under translation, rotation, and scaling, is insensitive to subtle jitter in an entity, is intuitive and easy to calculate, and effectively improves the ability to identify identical entities. A method for calculating the topology similarity measure of line entities is given. An environment similarity measure for line entities and its calculation method are presented for the first time, which further improves the ability to identify identical entities. An algorithm based on buffer division is presented for searching candidate matching entities; it effectively excludes impossible matching targets and improves the processing efficiency of entity matching. A matching method is proposed that combines multiple similarity characteristics such as length, distance, shape, topology, environment, and attribute. For multi-scale line data, a matching method based on interrelated position under certain constraint conditions is adopted. Both methods use bidirectional matching and clustering-combination strategies, which effectively resolve one-to-many and many-to-many matching problems.
The methods introduced in this dissertation are simpler to calculate, more efficient, and give better matching quality than previous matching methods.
(5) For area entity matching, calculation methods are provided for a similarity measure based on the barycenter distance between two area entities and for an entity topology similarity measure; an integrated similarity measure combining entity area and entity overlap area is presented; and an environment similarity measure for area entities is proposed. Using these measures together enhances their discriminating power. An algorithm based on the interior intersection relation of entities is presented for searching candidate matching entities; it is faster and more exact than other search methods and improves the processing efficiency of entity matching. A matching method is proposed that combines multiple similarity characteristics such as barycenter distance, entity area, and overlap area; with bidirectional matching and clustering-combination strategies, it effectively resolves one-to-many and many-to-many matching problems and offers higher processing efficiency and better matching quality than previous matching methods.
(6) The application of entity matching to data updating is studied. A data updating method based on entity matching is proposed, and a workflow of entity matching, change detection, and update handling is designed so that original information is not lost. The spatial entity search method based on the interior intersection relation of entities greatly improves the efficiency of spatial analysis and helps build entity mapping relations when no association between the entities of the two datasets is available.
The weight-based geometric similarity calculation model resolves complex matching problems with good results and is suitable for vector database updating.
(7) Multi-source data inconsistency processing based on entity matching is studied. Inconsistency correction of multi-source data is divided into two classes according to the type of entity matching: correction in the case of one-to-one matching and correction in the case of non-one-to-one matching. On the basis of entity matching, methods for correcting inconsistencies in geometric position, shape, and attributes are studied. For identical points, the corrected point is obtained as the average or weighted average of the identical points. For line entities, a node projection average (or weighted average) algorithm based on the node route length ratio is proposed. For area features without obvious turning points, a nearest-point average (or weighted average) method with certain constraint conditions is put forward. For area features with obvious turning points, the identical points are first found and corrected by their average (or weighted average), and the nearest-point average (or weighted average) method is then used to correct the remaining points. For attribute inconsistencies, a correction method based on attribute translation direction and translation operations is adopted in the case of non-many-to-many matching. Experiments show that all the correction algorithms presented in this dissertation are feasible and perform better than previous algorithms.
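The simplest correction step described above, replacing matched identical points by their average or weighted average, can be sketched as follows. The function name and the choice of weights are assumptions for illustration only; the abstract does not say how weights are derived, so the example simply treats them as a per-source confidence.

```python
def weighted_average_point(points, weights=None):
    """Fuse matched (x, y) points into one corrected point.

    With weights=None this is the plain average; otherwise each point
    contributes in proportion to its weight (a hypothetical per-source
    confidence, not a quantity defined in the source).
    """
    if not points:
        raise ValueError("need at least one point")
    if weights is None:
        weights = [1.0] * len(points)
    if len(weights) != len(points):
        raise ValueError("one weight per point is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return (x, y)

# The same point as recorded in two datasets.
same_point = [(0.0, 0.0), (2.0, 4.0)]
print(weighted_average_point(same_point))          # (1.0, 2.0)
print(weighted_average_point(same_point, [3, 1]))  # (0.5, 1.0)
```

The line and area corrections named in the abstract (node projection and nearest-point averaging) build on the same averaging step but first need corresponding positions along the geometries, which this sketch does not attempt.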
Keywords/Search Tags: Entity matching, Entity similarity measure, Matching strategy, Data updating, Data fusion, Inconsistency correction