Research And Implementation Of Query And Indexing Mechanism In Personal Dataspace Management System

Posted on: 2012-08-21 | Degree: Master | Type: Thesis
Country: China | Candidate: H F Du
GTID: 2178330335951469 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, personal data has taken on new characteristics: massive scale, heterogeneous forms, and complicated, changeable relationships. Meanwhile, users' requirements for data management keep rising, and traditional solutions can no longer fully meet the emerging demands. How to manage vast amounts of heterogeneous personal data effectively has therefore become extremely important.

As a new data management technology, the dataspace can meet users' increasingly complex demands because of its loose data model and pay-as-you-go integration method. This thesis studies the query and indexing mechanism in a personal dataspace management system. The main contributions can be summarized as follows:

1. A more powerful query language named E-iQL (Enhanced iMeMex Query Language) is proposed. Compared with iQL, it supports a query mode that combines path query expressions with association query expressions. Based on the generalized resource view, the logical algebra of E-iQL is given, which serves as the basis for query optimization.

2. A partial index mechanism based on the extraction of core sentences is presented. To reduce the high cost of index maintenance and improve the efficiency of keyword queries, word segmentation, sentence similarity computation, and core sentence extraction techniques from Natural Language Processing are introduced, and the core sentences are indexed instead of the full text. In the implementation, the typical sentence similarity algorithm based on a semantic dictionary is simplified, which greatly reduces the computational complexity. The K-Medoids clustering algorithm is modified in how it determines the number of clusters and the initial cluster centers, so that more accurate core sentence extraction results are obtained.

The experimental results show that E-iQL can express users' query requirements easily and efficiently, and that the partial index mechanism outperforms the full-text index in recall and precision.
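The idea behind the partial index can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names are hypothetical, Jaccard token overlap stands in for the semantic-dictionary similarity, a regular expression stands in for word segmentation, and the clustering is plain K-Medoids with a naive initialisation and a fixed k rather than the modified variant described above. The sketch only shows the overall flow: cluster a document's sentences, treat the medoids as core sentences, and index only those.

```python
import re
from collections import defaultdict


def tokenize(sentence):
    """Lowercase word tokens; an assumed stand-in for real word segmentation."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Jaccard overlap of token sets (assumption: replaces dictionary-based similarity)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def k_medoids(sentences, k, max_iter=20):
    """Plain K-Medoids over sentence positions; returns the sorted medoid positions."""
    n = len(sentences)
    k = min(k, n)
    dist = [[1.0 - similarity(sentences[i], sentences[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(range(k))  # naive initialisation: the first k sentences
    for _ in range(max_iter):
        # Assignment step: each sentence joins its nearest medoid.
        clusters = defaultdict(list)
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance inside its cluster.
        new_medoids = sorted(
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        )
        if new_medoids == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def build_partial_index(documents, k=2):
    """Inverted index (token -> document ids) built from core sentences only."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        if not sentences:
            continue
        for m in k_medoids(sentences, k):
            for token in tokenize(sentences[m]):
                index[token].add(doc_id)
    return index
```

A keyword lookup then reads `index.get("query", set())`. Words that occur only in non-core sentences are absent from the index, which is the trade-off a partial index makes in exchange for lower maintenance cost.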
Keywords/Search Tags: Personal data management, dataspace, query language, partial index