
Research On Distributed Parallel Skyline Query Processing Technology Over Uncertain Data Streams

Posted on: 2014-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
Full Text: PDF
GTID: 2308330479479208
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the development of information technology, uncertain data streams have appeared widely in application scenarios such as sensor networks, location-based services (LBS), and RFID networks. Efficient querying over uncertain data streams has become an important aspect of big data processing. Skyline queries over uncertain data streams play an important role in applications such as data mining, decision support, and environmental monitoring, and have become a research hotspot in the database field. However, most existing studies address skyline query processing in a single-machine environment. When faster query responses are required or the sliding window is larger, a single machine cannot meet the requirements of real-time queries because of limited computing resources and other factors. Distributed computing environments such as data centers are now widely used, which provides favorable conditions for distributed parallel skyline query processing over uncertain data streams. For skyline queries over high-speed uncertain data streams, the current research challenge is how to make full use of the distributed computing environment to parallelize query processing and improve its efficiency. This thesis studies distributed parallel skyline query processing over uncertain data streams to address these challenges.
First, because existing centralized query processing methods based on a single machine lack the computing capacity to meet users' query requirements, a new parallel processing model, TPM, is proposed. In contrast to a parallel processing model named CPM, parallel nodes at the same level in TPM only need to maintain a local sliding window and do not need to synchronize intermediate results through communication.
Experimental results show that the TPM-based processing method responds to queries faster than the CPM-based method across different sliding window sizes, data dimensionalities, and numbers of parallel nodes, and can meet the requirements of parallel skyline queries over uncertain data streams.
Second, because existing algorithms cannot meet the demands of some high-speed applications such as military operations and natural disaster monitoring, this thesis proposes a grid-based-probability-record optimization technique, which reduces the number of dominance tests performed in the parallel nodes of the dominance test modules and the repeated calculation of local skyline probabilities. Experimental results show that algorithms using this optimization effectively reduce the computation overhead in parallel nodes and meet the demands of high-speed query processing.
Finally, because the results of skyline query processing cannot completely fit users' demands in real applications, this thesis studies extended skyline queries over uncertain data streams, designs the enumerating skyline query over uncertain data streams, and proposes an approach to processing it based on a dominance graph. The dominance graph records the complete dominance relationships between any two tuples in the local sliding window of a parallel node so that the tuple under test can be located rapidly. Experimental results show that the dominance-graph-based approach performs well on high-dimensional data.
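To make the idea of recording dominance relationships in a local sliding window concrete, the following is a minimal Python sketch. It is not the thesis's actual data structure or algorithm. It assumes a common uncertain-data model in which each tuple carries an independent existence probability, smaller values are better in every dimension, and the skyline probability of a tuple is its own probability multiplied by the probability that none of its dominators exists. The names `DominanceGraph`, `insert`, `skyline_probability`, and `skyline`, as well as the threshold `q`, are hypothetical.

```python
from collections import OrderedDict
from math import prod


def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


class DominanceGraph:
    """Illustrative count-based sliding window that records, for each
    tuple, which other tuples in the window dominate it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tuples = OrderedDict()  # id -> (values, existence probability), oldest first
        self.dominators = {}         # id -> set of ids that dominate this tuple

    def insert(self, tid, values, prob):
        # Assumes tuple ids are unique within the stream.
        if len(self.tuples) == self.capacity:
            self._evict_oldest()
        self.dominators[tid] = set()
        # One pass of dominance tests on arrival; the results are stored as edges.
        for other, (other_values, _) in self.tuples.items():
            if dominates(other_values, values):
                self.dominators[tid].add(other)
            elif dominates(values, other_values):
                self.dominators[other].add(tid)
        self.tuples[tid] = (values, prob)

    def _evict_oldest(self):
        old, _ = self.tuples.popitem(last=False)
        del self.dominators[old]
        for doms in self.dominators.values():
            doms.discard(old)

    def skyline_probability(self, tid):
        # Uses the recorded edges instead of repeating dominance tests.
        _, prob = self.tuples[tid]
        return prob * prod(1.0 - self.tuples[d][1] for d in self.dominators[tid])

    def skyline(self, q):
        """Ids of tuples whose skyline probability is at least q."""
        return [tid for tid in self.tuples if self.skyline_probability(tid) >= q]


graph = DominanceGraph(capacity=3)
graph.insert("a", (1, 1), 0.5)
graph.insert("b", (2, 2), 0.8)
graph.insert("c", (0, 3), 0.9)
print(graph.skyline(q=0.45))
```

In this sketch, dominance tests run only when a tuple arrives, and later probability queries reuse the stored edges. When a tuple expires from the window, its edges are dropped, so the skyline probabilities of the tuples it dominated rise accordingly.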
Keywords/Search Tags:Uncertain Data Streams, Skyline Queries, Parallel Processing, Grid-Based-Probability-Recorded, Dominance Graph