Research On Key Technologies Of Entity Identification Of XML Data

Posted on:2012-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:Z Han

Full Text:PDF

GTID:2218330362950414

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of network technology, the volume of XML data is growing quickly, especially in web data publishing, data exchange among organizations, and e-commerce. XML has become a standard for data representation, storage, and exchange, and applications that identify and integrate XML data depend heavily on entity identification. Current research on entity identification of XML data relies mainly on distance measures and similarity functions, and researchers usually ignore the optimization of the identification process. In practice, however, two problems arise. First, different sources represent data in different ways and XML data are often dirty, so two similar XML records do not necessarily describe the same entity, while two records that describe the same entity may not be similar. Second, different sources contain many irrelevant entities, so much of the cost of entity identification is wasted, which leaves considerable room for optimization.
This paper proposes a semantic-rule-based method for entity identification of XML data, together with an optimization based on double clusters. First, we introduce the "tree-similarity structure", which uses the semantics of tree nodes and of XPath: it consists of two comparable nodes connected by a comparison operator under an XPath constraint. Second, from the tree-similarity structures and a set of reasoning rules, we infer "identification-tree-similarity structures" that are used to match entities; these structures help ensure matching quality even when the sources are dirty. Third, we reduce the scale of the data sources: we build an index for each XML tree in the two XML data sets, then place similar trees into the same cluster according to index similarity; a tree that does not belong to any cluster is not processed further.
Finally, experimental results show that the proposed method outperforms existing algorithms in efficiency without loss of accuracy, and that the optimization method is effective.
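The pipeline described in the abstract (cluster-based pruning, then rule-based matching) can be illustrated with a small sketch. The abstract does not give the thesis's actual rules, index, or similarity measure, so everything below is a hypothetical simplification: a tree-similarity structure is modeled as a triple `(path_a, op, path_b)` evaluated with ElementTree's limited XPath support; an identification-tree-similarity structure is modeled as a conjunction of such triples (the thesis infers these with reasoning rules that are not reproduced here); the per-tree index is assumed to be a token set compared with Jaccard similarity; and "double clusters" are simplified to grouping each tree of the first set with its similar trees in the second set. The operator names `eq` and `sim` and the thresholds are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def _norm(s):
    # Lower-case and collapse whitespace so formatting noise does not block a match.
    return " ".join(s.lower().split())


# Hypothetical comparison operators usable inside a tree-similarity structure.
OPS = {
    "eq": lambda x, y: _norm(x) == _norm(y),
    "sim": lambda x, y: SequenceMatcher(None, _norm(x), _norm(y)).ratio() >= 0.8,
}


def holds(tree_a, tree_b, structure):
    """Evaluate one tree-similarity structure (path_a, op, path_b) on two trees."""
    path_a, op, path_b = structure
    x, y = tree_a.findtext(path_a), tree_b.findtext(path_b)
    if x is None or y is None:
        return False  # the XPath constraint selects no node in one of the trees
    return OPS[op](x, y)


def identifies(tree_a, tree_b, rule):
    """Assumed identification structure: every tree-similarity structure must hold."""
    return all(holds(tree_a, tree_b, s) for s in rule)


def index_terms(tree):
    """Hypothetical per-tree index: the set of normalized tokens in all text nodes."""
    return {tok for text in tree.itertext() for tok in _norm(text).split()}


def jaccard(p, q):
    union = p | q
    return len(p & q) / len(union) if union else 0.0


def match(trees_a, trees_b, rules, threshold=0.3):
    """Compare only trees whose indexes are similar; unclustered trees are skipped."""
    idx_b = [index_terms(b) for b in trees_b]
    pairs = []
    for i, a in enumerate(trees_a):
        ia = index_terms(a)
        cluster = [j for j, ib in enumerate(idx_b) if jaccard(ia, ib) >= threshold]
        # An empty cluster means tree a is never passed to the rule-based matcher.
        for j in cluster:
            if any(identifies(a, trees_b[j], r) for r in rules):
                pairs.append((i, j))
    return pairs


# Usage: two sources with different tag names and dirty values.
A = [ET.fromstring("<book><title>XML Data Mining</title><author>Z. Han</author></book>"),
     ET.fromstring("<book><title>Graph Theory</title><author>L. Wu</author></book>")]
B = [ET.fromstring("<pub><name>xml data  mining</name><writer>Z Han</writer></pub>")]
RULES = [[("title", "eq", "name"), ("author", "sim", "writer")]]
print(match(A, B, RULES))  # [(0, 0)]
```

In this sketch the second book shares no index tokens with the record in `B`, so it falls into no cluster and the rule check is never run on it; this mirrors, in simplified form, how pruning irrelevant trees avoids wasted comparisons.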

Keywords/Search Tags:

XML, Entity Identification, Semantic, Optimization Algorithms, Clusters


Related items

1	Entity Navigation Methods In The Semantic Web
2	Research On Key Techniques Of Semantic-based Entity Search In Dataspaces
3	Construction Of Web Community Text Entity Relations Map Based On Semantic Elements
4	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
5	Research On Load Balance Algorithms Based On PC Clusters
6	Research Of Clustering Algorithms For Detecting Arbitrary Clusters Based On K Nearest Neighbors
7	Research On Vector Alignment Algorithms For Real Estate Data
8	Research On Entity Linking Algorithm By Combining The Attention Mechanism And Hidden Semantic Information
9	Research Of Collective Entity Linking Based On Joint Embedding Of Word And Entity
10	High-efficient Heuristic Algorithms On Prediction Ground State Structures Of Clusters