Research And Implementation Of Statistic Over Data Streams For Massive Database System

Posted on:2009-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:H Wei

GTID:2178360242498975

Subject:Computer Science and Technology

Abstract/Summary:

With the development of computing technology, applications built on very large databases have become increasingly common. Data stream technology has been studied extensively, and many effective algorithms and products have been proposed, making it a mature data processing model. The data loaded into a database is continuous, fast, and time-varying, so it can be treated and processed as a data stream.

Starting from data processing before storage and building on research into data stream statistics, this thesis proposes a data stream statistics service architecture and implements statistical processing of the loading data stream. Against the background of a very large statistical application database, it also implements the processing of abnormal data in the loading stream: it computes statistics over the abnormal data and uses them to update the earlier statistical results, so that subsequent processing results stay consistent with the records in the database. In addition, to keep meeting the requirements of the loading service once the statistics service is added, and to lighten the load on later queries, we propose an effective duplicate elimination method for short text, aimed at repeated data in the stream. Finally, we test the statistics service and verify its correctness.

As the specific application example of the data stream statistics service, the thesis focuses on using the stream statistics results to maintain a semantic cache. Using data stream statistics for semantic cache maintenance reduces the response time of aggregate queries and shifts processing pressure from the query server to the loading server, which improves the overall performance and stability of the system.

The main contributions of this thesis are as follows.

(1) We propose a data stream statistics service architecture for loading data into very large databases. The statistics service completes its work effectively with little impact on the loading process.

(2) We implement a statistical method for abnormal data streams. Using methods for processing multiple data streams, we maintain a sliding window for the abnormal data stream alongside the sliding window for the regular data stream. Dynamically allocated basic windows compute the statistics of the abnormal data, and results that arrive several hours late are used to update the statistics base and the query results.

(3) We study semantic cache maintenance and, by combining the statistical results with semantic caching, propose a solution to the semantic cache maintenance problem. Shifting pressure from the query database server to the loading process improves the performance and stability of the whole system.

(4) We study data cleaning technology and, targeting duplicated data in short text, implement an effective duplicate elimination method for massive short-text databases. It reduces the data volume and thereby improves the performance of subsequent database processing.

Based on the techniques described in this thesis, we have implemented a data stream statistics service for large-volume data loading on StarTPMonitor, a large-scale transaction processing middleware. By combining statistical summary information with semantic caching, the service improves semantic cache performance and greatly enhances the query capacity of the system.
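
The abstract does not spell out how the abnormal-data window works, so the following is only a minimal sketch of the idea in contribution (2), under stated assumptions: records carry an integer timestamp, a record counts as abnormal when it arrives after its basic window has left the regular sliding window, and each basic window keeps only a count and a sum. All names (`Stat`, `StreamStatistics`, `observe`, `apply_late`) are hypothetical and are not taken from the thesis or from StarTPMonitor.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stat:
    """Summary kept per basic window; count and sum support COUNT, SUM and AVG."""
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value

    def merge(self, other: "Stat") -> None:
        self.count += other.count
        self.total += other.total


class StreamStatistics:
    """Regular sliding window plus a separate window for late (abnormal) records."""

    def __init__(self, bucket_seconds: int, window_buckets: int) -> None:
        self.bucket_seconds = bucket_seconds
        self.window_buckets = window_buckets
        self.regular = {}               # open basic windows: bucket id -> Stat
        self.late = defaultdict(Stat)   # abnormal window, allocated on first use
        self.store = {}                 # published results: bucket id -> Stat
        self.newest = None              # newest bucket id seen so far

    def observe(self, timestamp: int, value: float) -> None:
        bucket = timestamp // self.bucket_seconds
        if self.newest is None or bucket > self.newest:
            self.newest = bucket
            self._close_expired()
        oldest_open = self.newest - self.window_buckets + 1
        if bucket < oldest_open:
            # The basic window was already published: route to the abnormal window.
            self.late[bucket].add(value)
        else:
            self.regular.setdefault(bucket, Stat()).add(value)

    def _close_expired(self) -> None:
        """Publish basic windows that slid out of the regular window."""
        oldest_open = self.newest - self.window_buckets + 1
        for bucket in [b for b in self.regular if b < oldest_open]:
            self.store.setdefault(bucket, Stat()).merge(self.regular.pop(bucket))

    def apply_late(self) -> list:
        """Fold late statistics into the published results.

        Returns the corrected bucket ids so a caller (for example a semantic
        cache) can refresh only the aggregates that depend on them.
        """
        touched = sorted(self.late)
        for bucket in touched:
            self.store.setdefault(bucket, Stat()).merge(self.late[bucket])
        self.late.clear()
        return touched
```

A caller would feed every loaded record to `observe` and run `apply_late` periodically. Returning the corrected bucket ids is a design choice of this sketch: it lets cached aggregate results be updated incrementally rather than recomputed from the database.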

Keywords/Search Tags:

Data Stream, VLDB, statistic, semantic caching, maintenance, abnormal data, duplication eliminating

Related items

1	Efficient Real-time Semantic Data Stream Processing Based On Forward And Backward Chain Reasoning
2	Research And Implementation Of Parallel Query Technology Based On Semantic Caching
3	Study On Data Stream Techniques And Its Application In Electric Power Information Processing
4	Domain-independent de-duplication in data warehouse cleaning
5	An Approach To Detecting Abnormal Sequence For Large-scale Sensor Data Stream
6	Research And Realization To The Semantic Caching Management Of Large Transaction Processing
7	Research On Data Organization For Data De-duplication System
8	Research On The Physical Monitoring System Of Offshore Workers Based On Beidou Satellite Data Transmission
9	Detection And Correction Method For Abnormal Data Over Data Streams Of Sensor Networks
10	The Research And Implementation Of Key Technologies In Data Stream Cube For Discovering Abnormal Operation On Virtual Property