
Research On Strategies Of Indexing In Dataspace

Posted on: 2013-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Chen
Full Text: PDF
GTID: 2218330371957607
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the explosive growth of web data, hundreds of terabytes of data have come to be distributed across numerous servers around the world. Traditional database management systems, as applied to personal data, are ill-suited to managing heterogeneous data, and the dataspace, a new data management technology, has emerged in response. A dataspace must deal with heterogeneous data. How to organize and manage this information efficiently, so that enterprises and individuals can conveniently share data and search quickly and accurately for the information they need, is a major challenge for the field of information science and technology.
The thesis studies index strategies for the dataspace. It reviews the state of research on the dataspace and its applications; introduces the basic concepts and features of the dataspace; analyzes the three levels of heterogeneity in the dataspace; summarizes the search and indexing requirements of the dataspace; and analyzes the indexing and search mechanisms of the full-text search engine toolkit Lucene. Building on existing research, we design a new heterogeneity-tolerant index strategy for the dataspace. We describe how the index is constructed under this strategy and give solutions to five forms of heterogeneity: an attribute name and an association name are identical; associated instances must be distinguished from relevant instances; keywords are similar with a certain probability; attributes or associations are similar with a certain probability; and instances are similar with a certain probability. We also describe a method for ranking search results by changing the index occurrence count.
Finally, using Lucene, we build indexes over text files and heterogeneous data, implement keyword queries and structured queries, obtain the index occurrence count, address the heterogeneity caused by similarity, and optimize and update the index. We then assess the performance of the new index and verify its effectiveness. The thesis contributes useful research on indexing technology for the dataspace.
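The abstract does not give the index layout, so the following is only a minimal Python sketch of one way such an index could look, not the thesis's implementation. It assumes an inverted index whose keys combine a keyword with a marker saying whether the keyword came from an attribute or an association, which keeps a same-named attribute and association apart, and whose postings store an occurrence count per instance that is used for ranking. The class name `DataspaceIndex`, the markers `ATTR` and `ASSOC`, and the sample instances are all hypothetical.

```python
from collections import defaultdict

# Hypothetical markers: the thesis's real key encoding is not stated in the abstract.
ATTR = "attr"
ASSOC = "assoc"


class DataspaceIndex:
    """Toy inverted index: (keyword, kind, name) -> {instance_id: occurrence count}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(int))

    def _add(self, instance_id, kind, name, text):
        # Split the text into lowercase keywords and count each occurrence.
        for keyword in text.lower().split():
            self.postings[(keyword, kind, name.lower())][instance_id] += 1

    def add_attribute(self, instance_id, attr_name, value):
        self._add(instance_id, ATTR, attr_name, value)

    def add_association(self, instance_id, assoc_name, target_text):
        self._add(instance_id, ASSOC, assoc_name, target_text)

    def keyword_query(self, keyword):
        """Match the keyword wherever it occurs; rank by total occurrence count."""
        totals = defaultdict(int)
        wanted = keyword.lower()
        for (kw, _kind, _name), counts in self.postings.items():
            if kw == wanted:
                for instance_id, n in counts.items():
                    totals[instance_id] += n
        return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))

    def structured_query(self, kind, name, keyword):
        """Match the keyword only under the given attribute or association name."""
        counts = self.postings.get((keyword.lower(), kind, name.lower()), {})
        return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Illustrative data: "author" is both an attribute name and an association name.
index = DataspaceIndex()
index.add_attribute("paper1", "title", "Indexing in Dataspace")
index.add_attribute("paper1", "author", "S Chen")
index.add_association("person1", "author", "Indexing in Dataspace")

attr_hits = index.structured_query(ATTR, "author", "chen")
assoc_hits = index.structured_query(ASSOC, "author", "dataspace")
keyword_hits = index.keyword_query("dataspace")
```

Because the kind marker is part of the key, the structured query for the `author` attribute and the one for the `author` association read different posting lists, while the keyword query merges every list that contains the keyword and sorts by the summed counts.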
Keywords/Search Tags: Dataspace, Data Management, Indexing, Heterogeneous Data