Research On Inverted List Parallel Query Method Based On Dataspaces

Posted on: 2016-08-26 | Degree: Master | Type: Thesis
Country: China | Candidate: D Y Wang | Full Text: PDF
GTID: 2348330542475446 | Subject: Computer Science and Technology
Abstract/Summary:
Ever more data is being generated in production and daily life. Structured data used to dominate, but with the development of the Internet, unstructured data has become mainstream. Retrieving useful information quickly from heterogeneous data depends on the index: index quality directly affects the results of large-scale heterogeneous data retrieval, so building an efficient index is the key to large-scale heterogeneous data querying. This thesis focuses on heterogeneous data. To accommodate the characteristics of such data, the concept of the dataspace was proposed to address current difficulties in data management. A dataspace supports queries across multiple sources; desktop search systems are one existing example. The inverted list architecture is well suited to retrieving data in a dataspace. The central questions for indexing heterogeneous data are how to exploit the query log and the properties of the inverted list itself, and how to improve the index architecture. This thesis analyzes the various inverted index architectures and summarizes their advantages and disadvantages, together with the load-balancing requirements of the system. It uses keyword frequency and the Zipf's law distribution function to partition the index vertically. Compared with other inverted list methods, the vertically partitioned index allows the system to achieve load balance. The categories or partitions of the vertically partitioned index are then extended into a two-dimensional index structure, which reduces the matching of, and the resources consumed by, irrelevant tuples and so improves query processing performance. Adding replicas of the two-dimensional index yields a three-dimensional index architecture that enhances the parallel processing capability of the query system. The experimental results show that partitioning the index vertically with the Zipf's law distribution function, extending its categories or partitions into a two-dimensional index, and replicating that index to form a so-called 3D index enables the system to achieve load balancing and increases its parallel processing capability.
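The abstract does not give the partitioning algorithm itself, so the following is only a minimal sketch of how keyword frequency and a Zipf's law estimate could drive a load-balanced vertical (term-based) partition. It assumes that the load of the keyword at frequency rank r is proportional to 1/r^s and that keywords are assigned greedily to the least-loaded partition; the function names `zipf_weights` and `partition_terms`, the exponent `s`, and the greedy strategy are all illustrative assumptions, not the thesis's method.

```python
import heapq


def zipf_weights(num_terms, s=1.0):
    """Normalized Zipf weights for ranks 1..num_terms (assumed load model)."""
    raw = [1.0 / (rank ** s) for rank in range(1, num_terms + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def partition_terms(terms_by_rank, num_partitions, s=1.0):
    """Assign terms (most frequent first) to partitions, balancing estimated load.

    Hypothetical helper: each term goes to the partition whose accumulated
    Zipf-estimated load is currently the smallest.
    Returns (term -> partition id, list of per-partition loads).
    """
    weights = zipf_weights(len(terms_by_rank), s)
    # Min-heap of (accumulated load, partition id).
    heap = [(0.0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    assignment = {}
    loads = [0.0] * num_partitions
    for term, weight in zip(terms_by_rank, weights):
        load, p = heapq.heappop(heap)
        assignment[term] = p
        loads[p] = load + weight
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads


if __name__ == "__main__":
    # Placeholder vocabulary, already sorted by descending query frequency.
    terms = [f"t{i}" for i in range(1, 1001)]
    assignment, loads = partition_terms(terms, num_partitions=4)
    print([round(x, 4) for x in loads])
```

Under these assumptions, the few very frequent keywords end up spread across different partitions, while the long tail of rare keywords fills the remaining capacity, which is the load-balancing effect the abstract attributes to the vertically partitioned index.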
Keywords/Search Tags: Heterogeneous Data, Inverted List, Multidimensional Index, Load Balancing