
A Two-dimensional Index Structure Based P2P Query Of Multi-dimensional Data

Posted on: 2010-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: H B Lu
GTID: 2178360302960682
Subject: Computer application technology
Abstract/Summary:
With the development of network technology in recent years, many P2P systems have emerged, and P2P technologies are attracting more and more attention. They are mainly used in information retrieval, file sharing, distributed computing, electronic commerce, and similar applications. Information retrieval, the primary means of searching for information on the web, is currently the most common application of P2P technology.

High-dimensional data has long been an active research topic in the database field, with many practical applications such as data mining and multimedia information retrieval. Similarity retrieval, that is, finding the data in a large data set that are most similar to a given object, is a critical problem. In high-dimensional retrieval, distance calculation is a major factor in retrieval efficiency. To reduce the number of distance calculations, several solutions have been proposed in recent years, based mainly either on approximate vector representations or on one-dimensional indexes. The first approach finds an approximate vector representation of the high-dimensional data in order to simplify the search space; the VA-file is an example. The second approach transforms the high-dimensional data into one-dimensional values in some way, so as to reduce the effects of dimensionality; a typical representative of this approach is iDistance.

Unlike the familiar low-dimensional space, high-dimensional space has its own distinctive data-distribution characteristics: it is virtually empty ("hollow"), which prevents most multivariate density estimation methods from producing accurate results. This is because low-density regions account for a large portion of the volume, while high-density regions lack sufficient observations.
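The two families of techniques named above can be illustrated with a minimal Python sketch. It assumes coordinates normalized to [0, 1] and, as a simplification, uses uniform slices in every dimension for the VA-file style approximation (a VA-file can also place slice boundaries according to the data distribution); all function names are hypothetical. `va_lower_bound` shows why an approximation saves distance calculations: a point whose lower bound already exceeds the search radius can be discarded without computing its exact distance. `idistance_key` follows the iDistance mapping key = i·c + dist(p, O_i), where O_i is the reference point nearest to p and c is a constant larger than any distance inside a partition.

```python
import math
from typing import List, Sequence


def va_approximation(point: Sequence[float], bits: int) -> List[int]:
    """VA-file style approximation: the slice (cell) number of each coordinate.

    Assumption: coordinates lie in [0, 1] and each dimension is cut into
    2**bits equal-width slices.
    """
    slices = 1 << bits
    # Clamp so that a coordinate equal to 1.0 falls into the last slice.
    return [min(int(x * slices), slices - 1) for x in point]


def va_lower_bound(query: Sequence[float], approx: Sequence[int], bits: int) -> float:
    """Lower bound on the Euclidean distance from `query` to any point with cell numbers `approx`."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, cell in zip(query, approx):
        lo, hi = cell * width, (cell + 1) * width
        if q < lo:
            gap = lo - q        # query lies below the cell in this dimension
        elif q > hi:
            gap = q - hi        # query lies above the cell in this dimension
        else:
            gap = 0.0           # query coordinate falls inside the cell
        total += gap * gap
    return math.sqrt(total)


def idistance_key(point: Sequence[float], refs: Sequence[Sequence[float]], c: float) -> float:
    """iDistance: map a point to i*c + dist(point, O_i), O_i being its nearest reference point."""
    dists = [math.dist(point, r) for r in refs]
    i = min(range(len(refs)), key=dists.__getitem__)
    return i * c + dists[i]


approx = va_approximation([0.1, 0.8], 2)    # → [0, 3]
bound = va_lower_bound([0.9, 0.1], approx, 2)
key = idistance_key([0.9, 0.7], [[0.0, 0.0], [1.0, 1.0]], 10.0)
```

Because the lower bound never exceeds the true distance, filtering with it cannot drop a correct answer; it only avoids exact distance computations for cells that are clearly too far away.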
Based on an analysis of these distribution characteristics of high-dimensional space, this thesis splits the space into several sub-spaces according to the amount of data, so that the data are distributed evenly across the sub-spaces. This sub-space division is a vertical division of the data space. Each sub-space is then further divided into zones, which is a horizontal division of the data space. After the space is divided, a two-dimensional index value is created for each data item by combining the approximate vector representation with a one-dimensional index, and the index values are mapped onto the peer identifiers of the structured P2P network Chord. During retrieval, a query passes through a two-layer filter, which reduces the number of distance calculations and yields high query performance. Experimental results show that the two-dimensional index structure performs well in both precision and search efficiency.
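The abstract does not spell out how the two-dimensional index value is built or mapped onto Chord, so the following Python sketch is only an illustration under stated assumptions, not the thesis's method. It assumes: (1) each sub-space is represented by a reference point, and a point belongs to the sub-space of its nearest reference point; (2) the two-dimensional index value is the pair (sub-space number, distance to that reference point); (3) this pair is mapped onto a 16-bit Chord identifier ring in an order-preserving way, so that a key range becomes a contiguous arc of peers; (4) the zone bit codes of the horizontal division are not modeled. The class `TwoDimIndex`, the constant `M`, and the helper functions are hypothetical. A range query (q, r) then passes two filters: entries whose key lies outside [d(q, O_s) − r, d(q, O_s) + r] are skipped (by the triangle inequality this loses no true answers), and the exact distance is computed only for the survivors.

```python
import bisect
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

M = 16          # hypothetical: Chord identifiers have M bits
RING = 1 << M   # identifier space is [0, 2**M)
Point = Tuple[float, ...]


def ring_id(s: int, key: float, n_sub: int, key_max: float) -> int:
    """Order-preserving map of the index value (s, key) to a Chord identifier.

    Sub-space s owns the arc [s*arc, (s+1)*arc); larger keys get larger ids.
    """
    arc = RING // n_sub
    offset = min(max(int(key / key_max * arc), 0), arc - 1)
    return s * arc + offset


def successor(nodes: List[int], ident: int) -> int:
    """Chord successor: the first node id >= ident, wrapping past the top of the ring."""
    return nodes[bisect.bisect_left(nodes, ident) % len(nodes)]


def peers_for_range(nodes: List[int], lo: int, hi: int) -> List[int]:
    """Peers responsible for identifiers lo..hi (lo <= hi): successor(lo) .. successor(hi)."""
    start = bisect.bisect_left(nodes, lo)
    end = bisect.bisect_left(nodes, hi)
    if end < len(nodes):
        return nodes[start:end + 1]
    tail = nodes[start:]  # hi lies past the last node, so the ring wraps to nodes[0]
    return tail if start == 0 else tail + [nodes[0]]


class TwoDimIndex:
    def __init__(self, refs: List[Point], nodes: List[int], key_max: float):
        self.refs, self.nodes, self.key_max = refs, sorted(nodes), key_max
        # Simulated peer storage: peer id -> list of (sub-space, key, point).
        self.store: Dict[int, List[Tuple[int, float, Point]]] = defaultdict(list)

    def insert(self, p: Point) -> None:
        dists = [math.dist(p, o) for o in self.refs]
        s = min(range(len(self.refs)), key=dists.__getitem__)  # nearest reference point
        peer = successor(self.nodes, ring_id(s, dists[s], len(self.refs), self.key_max))
        self.store[peer].append((s, dists[s], p))

    def range_query(self, q: Point, r: float) -> List[Point]:
        hits: List[Point] = []
        n_sub = len(self.refs)
        for s, o in enumerate(self.refs):
            dq = math.dist(q, o)
            lo_k, hi_k = max(dq - r, 0.0), dq + r
            lo = ring_id(s, lo_k, n_sub, self.key_max)
            hi = ring_id(s, hi_k, n_sub, self.key_max)
            for peer in peers_for_range(self.nodes, lo, hi):
                for ps, key, p in self.store[peer]:
                    # Layer 1: cheap key check, no distance calculation needed.
                    if ps != s or not (lo_k <= key <= hi_k):
                        continue
                    # Layer 2: exact distance only for the surviving candidates.
                    if math.dist(p, q) <= r:
                        hits.append(p)
        return hits


idx = TwoDimIndex([(0.0, 0.0), (1.0, 1.0)], [1000, 20000, 33000, 50000, 65000], math.sqrt(2))
for pt in [(0.1, 0.2), (0.2, 0.1), (0.9, 0.9), (0.5, 0.5)]:
    idx.insert(pt)
print(idx.range_query((0.15, 0.15), 0.1))  # → [(0.1, 0.2), (0.2, 0.1)]
```

An order-preserving mapping is used here instead of Chord's usual SHA-1 hashing because hashing would scatter neighbouring keys across the ring, forcing a range query to contact every peer. The trade-off is that skewed data can overload some peers, which may be one reason to first split the space so that each sub-space holds a similar amount of data.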
Keywords/Search Tags: Range query, Chord, Sub-space, Zone bit code