Research On Key Techniques Of Semantic-based Entity Search In Dataspaces

Posted on: 2013-01-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Yang
GTID: 1228330467479814
Subject: Computer software and theory
Abstract/Summary:
As data management technology has developed and taken on new features, database researchers have proposed the dataspace, a new abstraction for information management, to meet emerging data management requirements and to address the shortcomings of traditional database technologies and data integration systems. A dataspace system integrates data following a pay-as-you-go strategy and is characterized by heterogeneity, evolution, and the absence of a single unified schema. Research on dataspace technologies is growing fast and attracting extensive attention in both academia and industry. However, many problems in semantic-based entity search remain to be studied and resolved. First, there is no open data model for describing heterogeneous entities and their rich associations in dataspaces. Second, support for association-based mining tasks and applications is relatively weak because an effective entity association mining approach is lacking. Third, without an effective entity resolution (ER) technique that accounts for the evolution of entities in dataspaces, data quality cannot be managed effectively. Finally, a limited understanding of users' query intent hinders the realization of semantic-based entity search in dataspaces.
Research on the key techniques of semantic-based entity search in dataspaces helps break the semantic restrictions on resources and maximize the use of diverse data resources, so it has both theoretical significance and practical value. To better support semantic-based entity search in dataspaces and to address the problems above, this dissertation studies four key techniques: an entity-centric data model, a clustering-based entity association mining algorithm, a time-based collective entity resolution algorithm, and an association-based query intent disambiguation algorithm. The major contributions are as follows:
(1) Entity-centric data model in dataspaces. To handle the heterogeneity of dataspaces, a layered graph data model called IgDM is proposed that uses the entity as the data unit and is composed of an entity data graph Go and an entity schema graph Gs. IgDM can describe heterogeneous entity classes, entities, and their attribute values, and can capture the rich and complex associations among entity classes and among entities in dataspaces. The weight assignment scheme, indexing method, and query capabilities of IgDM are then studied. Experimental results show that IgDM is effective at describing rich semantic associations.
(2) Clustering-based entity association mining algorithm in dataspaces. A four-phase model for building entity associations is proposed, and association constraint verification is applied throughout the building life cycle to check the correctness of entity associations. A clustering-based four-step entity association mining algorithm, CFRQ4A, is also proposed; it consists of entity clustering, candidate associated entity pair filtering, induction and reasoning, and association strength quantification, and aims to find entity associations incrementally with less manual effort. Experimental results show the accuracy and effectiveness of the proposed entity association mining algorithm.
(3) Time-based collective ER algorithm in dataspaces. To handle the evolution of entities over time in dataspaces, a four-step time-based collective ER algorithm, T-CER, is proposed; its steps are preprocessing, blocking, representation clustering, and time constraint checking. For the representation clustering step, a time-evolution-based clustering algorithm, TE-Clustering, is proposed. An attribute evolution coefficient (aec) and a relational evolution coefficient (red) are introduced into the similarity measure to capture the effect of time evolution on similarity. In addition, the resolution sequence problem in collective ER is solved using a resolution sequence dependency graph Gdepend. Extensive experimental results show the accuracy and effectiveness of the proposed T-CER and TE-Clustering algorithms.
(4) Association-based query intent disambiguation in dataspaces. To address the inherent ambiguity of keyword queries, a three-step query intent disambiguation algorithm is proposed that leverages the associations among entity classes and among entities; it consists of keyword semantic item mapping, goal entity class recognition, and candidate query set generation. Experimental results show the accuracy and effectiveness of the proposed keyword query intent disambiguation algorithm.
(5) A semantic-based entity search prototype system called KeymanticES (keyword-based semantic-based entity search) is designed and implemented on top of the key techniques proposed in this dissertation. Experimental results on real data sets from the academic domain show the effectiveness of KeymanticES.
In conclusion, this dissertation studies the key techniques of semantic-based entity search in dataspaces, targeting three features of dataspaces: heterogeneity, evolution, and rich associations among entities. It proposes several novel and effective solutions to these research issues. We hope that these approaches and techniques will contribute to the development of semantic-based entity search systems in dataspaces.
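To make the layered data model of item (1) more concrete, the following is a minimal Python sketch of a two-layer graph: a schema layer holding entity classes and the association types allowed between them, and a data layer holding entities, their attribute values, and weighted associations. It is not the dissertation's IgDM implementation. The class names (`SchemaGraph`, `DataGraph`, `Entity`), the rule that a data-level association must match a schema-level one, and the weights in [0, 1] are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SchemaGraph:
    """Schema layer: entity classes and allowed association types (hypothetical)."""
    classes: Set[str] = field(default_factory=set)
    # Each entry is (source class, association label, target class).
    associations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def allow(self, src_class: str, label: str, dst_class: str) -> None:
        self.classes.update((src_class, dst_class))
        self.associations.add((src_class, label, dst_class))


@dataclass
class Entity:
    entity_id: str
    entity_class: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataGraph:
    """Data layer: entities plus weighted associations among them (hypothetical)."""
    schema: SchemaGraph
    entities: Dict[str, Entity] = field(default_factory=dict)
    # (source id, label, target id) -> association weight, assumed to lie in [0, 1].
    edges: Dict[Tuple[str, str, str], float] = field(default_factory=dict)

    def add_entity(self, entity: Entity) -> None:
        if entity.entity_class not in self.schema.classes:
            raise ValueError(f"unknown entity class: {entity.entity_class}")
        self.entities[entity.entity_id] = entity

    def associate(self, src_id: str, label: str, dst_id: str, weight: float = 1.0) -> None:
        src, dst = self.entities[src_id], self.entities[dst_id]
        # Assumption: every data-level edge must be backed by a schema-level edge.
        if (src.entity_class, label, dst.entity_class) not in self.schema.associations:
            raise ValueError("association not allowed by the schema layer")
        self.edges[(src_id, label, dst_id)] = weight

    def neighbors(self, entity_id: str, min_weight: float = 0.0) -> List[Tuple[str, str, float]]:
        """Return (target id, label, weight) triples, strongest association first."""
        found = [
            (dst, label, w)
            for (src, label, dst), w in self.edges.items()
            if src == entity_id and w >= min_weight
        ]
        return sorted(found, key=lambda triple: -triple[2])


# Usage with made-up academic-domain data.
schema = SchemaGraph()
schema.allow("Author", "writes", "Paper")
graph = DataGraph(schema)
graph.add_entity(Entity("a1", "Author", {"name": "Alice"}))
graph.add_entity(Entity("p1", "Paper", {"title": "Paper One"}))
graph.add_entity(Entity("p2", "Paper", {"title": "Paper Two"}))
graph.associate("a1", "writes", "p1", weight=0.4)
graph.associate("a1", "writes", "p2", weight=0.9)
print(graph.neighbors("a1", min_weight=0.5))
```

Keeping the schema layer separate lets new entity classes and association types be registered incrementally, which seems in keeping with the pay-as-you-go setting described in the abstract, although the dissertation's actual design may differ.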
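Item (3) says that evolution coefficients are introduced into the similarity measure so that time evolution is reflected in the similarity between entity representations. The abstract does not give the formulas, so the Python sketch below only illustrates the general idea under stated assumptions: a disagreement on an attribute is penalized less when the two records are far apart in time and the attribute is one that tends to change. The exponential form, the per-attribute `rates`, the record layout, and the function names are hypothetical and are not taken from T-CER or TE-Clustering.

```python
import math
from typing import Dict


def evolution_coefficient(gap_years: float, rate: float) -> float:
    """Hypothetical stand-in for an attribute evolution coefficient.

    Returns a value in [0, 1) that grows with the time gap. The exponential
    form is an assumption chosen for illustration only.
    """
    return 1.0 - math.exp(-rate * abs(gap_years))


def time_aware_similarity(rec_a: dict, rec_b: dict, rates: Dict[str, float]) -> float:
    """Average attribute agreement, forgiving disagreements that time can explain.

    Each record is assumed to look like {"time": year, "attrs": {name: value}}.
    `rates` maps an attribute name to an assumed change rate (0 = never changes).
    """
    gap = rec_a["time"] - rec_b["time"]
    score, compared = 0.0, 0
    for name, rate in rates.items():
        a = rec_a["attrs"].get(name)
        b = rec_b["attrs"].get(name)
        if a is None or b is None:
            continue  # skip attributes missing on either side
        compared += 1
        if a == b:
            score += 1.0
        else:
            # A mismatch earns partial credit when the value plausibly evolved.
            score += evolution_coefficient(gap, rate)
    return score / compared if compared else 0.0


# Usage with made-up records: names are stable, affiliations change over time.
rates = {"name": 0.0, "affiliation": 0.3}
r2002 = {"time": 2002, "attrs": {"name": "A. Author", "affiliation": "Univ A"}}
same_year = {"time": 2002, "attrs": {"name": "A. Author", "affiliation": "Univ B"}}
ten_years = {"time": 2012, "attrs": {"name": "A. Author", "affiliation": "Univ B"}}

print(time_aware_similarity(r2002, same_year, rates))
print(time_aware_similarity(r2002, ten_years, rates))
```

In this sketch the two records from the same year score lower than the pair ten years apart, because a change of affiliation is much more plausible over a decade than within a single year.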
Keywords/Search Tags: dataspaces, data model, entity association, entity resolution, query intent disambiguation, semantic-based entity search