Research And Implementation Of Statistic Over Data Streams For Massive Database System

Posted on: 2009-05-18  Degree: Master  Type: Thesis
Country: China  Candidate: H Wei  Full Text: PDF
GTID: 2178360242498975  Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer technology, applications built on very large databases have become increasingly common. Data stream technology has been studied extensively, and many effective algorithms and products have been proposed, making it a mature data model. The data loaded into a database is continuous, fast, and time-varying, so it can be treated as a data stream.

Starting from data processing before storage and building on research into data stream statistics, this thesis proposes a data stream statistic service architecture and implements statistical processing of the loading data stream. Against the background of a very large statistical application database, it also implements the processing of abnormal data in the loading stream: it computes statistics over the abnormal data and updates the existing statistical results with them, so that subsequent processing results stay consistent with the records in the database. To meet the demands on the loading service once the statistic service is added, and to lighten the load on subsequent queries, we also propose an effective short-text-based duplicate elimination method for repeated data in the stream. Finally, we test the statistic service and verify its correctness.

As the specific application example of the data stream statistic service, the thesis focuses on using data stream statistics to maintain a semantic cache. Doing so reduces the response time of aggregate queries and shifts processing pressure from the query server to the loading server, improving the performance and stability of the whole system.

This thesis makes the following contributions.
(1) We propose a data stream statistic service architecture for loading into very large databases. The statistic service completes its statistics effectively with little effect on the loading process.
(2) We implement a statistical method for abnormal data streams. Using multi-stream processing methods, we maintain a sliding window for the abnormal data stream alongside the sliding window for the regular data stream. Dynamically allocated base windows compute the statistics of the abnormal data, and results that arrive several hours late are used to update the statistic base and the query results.
(3) We study semantic cache maintenance and, by combining the statistical results with semantic caching, propose a solution to the semantic cache maintenance problem. Shifting pressure from the query database server to the loading process improves the performance and stability of the whole system.
(4) We study data cleaning technology and, targeting duplicated data in short text, implement an effective duplicate elimination method for massive short-text databases. It reduces the data volume and thereby improves the performance of subsequent database processing.

Based on these techniques, we have implemented a data stream statistic service for large-volume data loading on the large-scale transaction processing middleware StarTPMonitor. By combining statistical summary information with semantic caching, the service improves semantic cache performance and greatly enhances the system's query capability.
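
The idea of folding late (abnormal) records into existing statistics and using them to keep cached aggregate results current can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class `WindowedStats`, its methods `load` and `query`, and the `window_seconds` parameter are hypothetical names, and it uses simple fixed-size windows keyed by timestamp rather than the dual sliding-window design described above.

```python
from collections import defaultdict


class WindowedStats:
    """Per-window count and sum for a loading stream (illustrative only)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # window start time -> [record count, sum of values]
        self.stats = defaultdict(lambda: [0, 0.0])
        # (lo, hi) range -> cached (count, sum); stands in for a semantic cache
        self.cache = {}
        # largest timestamp seen so far
        self.watermark = 0

    def _window_start(self, ts):
        return ts - ts % self.window_seconds

    def load(self, ts, value):
        """Fold one record into the statistics.

        Returns True when the record is late, i.e. it belongs to a window
        older than the window of the newest record seen so far.
        """
        start = self._window_start(ts)
        late = start < self._window_start(self.watermark)
        self.watermark = max(self.watermark, ts)

        entry = self.stats[start]
        entry[0] += 1
        entry[1] += value

        # Patch every cached range that covers this window, so cached
        # query results stay consistent with the loaded records.
        for (lo, hi), (count, total) in list(self.cache.items()):
            if lo <= start < hi:
                self.cache[(lo, hi)] = (count + 1, total + value)
        return late

    def query(self, lo, hi):
        """Return (count, sum) over windows whose start lies in [lo, hi)."""
        key = (lo, hi)
        if key not in self.cache:
            count = sum(c for s, (c, _) in self.stats.items() if lo <= s < hi)
            total = sum(t for s, (_, t) in self.stats.items() if lo <= s < hi)
            self.cache[key] = (count, total)
        return self.cache[key]
```

In this sketch the loading path does the extra work of patching cached results, so a repeated aggregate query is answered from the cache without rescanning the per-window statistics. This mirrors, in a very simplified form, the shift of processing pressure from the query side to the loading side.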
Keywords/Search Tags: data stream, VLDB, statistics, semantic caching, maintenance, abnormal data, duplicate elimination