Research And Implementation Of The Multi-Task-Parallel Scheduling Loading Technology Of Massive Text Data

Posted on: 2010-07-22; Degree: Master; Type: Thesis
Country: China; Candidate: G Q Chen; Full Text: PDF
GTID: 2178360278956785; Subject: Computer Science and Technology
Abstract/Summary:
With the development of networks, storing and managing massive volumes of text data has become an urgent need in network information security management, and high-performance loading and management technology is becoming increasingly important to it. Research on high-performance loading technology for massive text data therefore has both theoretical significance and practical value. The massive text data in network information security management has the following characteristics: it is generated at high speed and high density, it is large in scale, it arrives non-stop 24 hours a day, and applications require high-performance full-text search. In view of these data characteristics and application requirements, this thesis studies the loading of massive text data in the following aspects. First, it studies multi-pipeline parallel loading technology. The massive text data is divided into a number of independent data collections, which are loaded in parallel. Then, within each data collection, pipeline parallelism is exploited in depth: the loading process is divided into several stages, yielding a high-performance, multi-pipeline parallel loader.
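The two levels of parallelism described above (independent data collections loaded in parallel, plus a staged pipeline inside each collection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the helpers `run_pipeline` and `load_collections`, the stage functions `parse` and `tokenize`, the tab-separated `id<TAB>text` record format, and the `queue_size` parameter are all hypothetical, and the last stage collects records in memory instead of writing to a database. Python threads are used only to show the structure; because of the GIL, CPU-bound stages would need processes or another runtime to run truly in parallel.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel that marks the end of a stream


def _feed(records, outbox):
    """Push every raw record of one collection into the first stage."""
    for record in records:
        outbox.put(record)
    outbox.put(_DONE)


def _stage(func, inbox, outbox):
    """One pipeline stage: apply func to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(records, stages, queue_size=64):
    """Stream one data collection through a chain of stages, one thread per stage.

    Bounded queues let stage k work on record i+1 while stage k+1 handles
    record i, and apply back-pressure when a downstream stage is slower.
    """
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_feed, args=(records, queues[0]))]
    threads += [
        threading.Thread(target=_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for t in threads:
        t.start()
    loaded = []
    while (item := queues[-1].get()) is not _DONE:
        loaded.append(item)  # stand-in for the final database insert
    for t in threads:
        t.join()
    return loaded


def load_collections(collections, stages, max_workers=4):
    """Run an independent pipeline per data collection, in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: run_pipeline(c, stages), collections))


# Hypothetical stages for tab-separated "id<TAB>text" records.
def parse(line):
    rec_id, text = line.split("\t", 1)
    return int(rec_id), text


def tokenize(record):
    rec_id, text = record
    return rec_id, text.lower().split()  # terms for a full-text index


if __name__ == "__main__":
    collections = [["1\tHello World", "2\tFoo Bar"], ["3\tMassive Text"]]
    print(load_collections(collections, [parse, tokenize]))
    # -> [[(1, ['hello', 'world']), (2, ['foo', 'bar'])], [(3, ['massive', 'text'])]]
```

Because each stage is a single thread reading a FIFO queue, records keep their original order within a collection, while different records occupy different stages at the same time and separate collections never wait on each other.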
Within each pipeline, the partition exchange ("switching") technology of ORACLE 10g is used to further exploit the parallelism of each data partition, dividing the loading process into sub-tasks that can be scheduled in parallel. Because these sub-tasks have binding relationships with one another, a multi-task scheduling algorithm that accounts for binding relationships was studied. Resources such as computing and I/O capacity are typically distributed unevenly, and the server nodes are heterogeneous. To address this, a virtual resource pool technique was studied: the number of resource units each node contributes is calculated according to its capability and placed into a virtual resource pool, whose resources are then scheduled uniformly. This achieves mixed-load balancing and maximum resource utilization in a heterogeneous environment. Based on these techniques, we developed the Multi-Task-Parallel Scheduling Loading system for massive text data. Third-party testing showed that the system achieves very high loading performance: a peak of 20,000,000,000 records per 24 hours, with each record 0.5 KB in size (about 231,000 records per second, or roughly 10 TB of text per day). The system has now been running stably online for more than 3 months.
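The virtual resource pool and binding-aware scheduling described above could look roughly like the following sketch. Everything here is an assumption for illustration: the node fields `cpu_cores` and `io_mbps`, the conversion constants `CPU_PER_UNIT` and `IO_PER_UNIT`, the helpers `build_pool` and `schedule`, and the reading of a "binding relationship" as a group of sub-tasks (here a hypothetical staging-table load plus its partition exchange) that must run on the same node. The placement rule is a simple greedy heuristic (largest group first, onto the node with the most free units), not the algorithm from the thesis.

```python
from dataclasses import dataclass, field

# Assumed conversion: one virtual resource unit = 2 CPU cores AND 100 MB/s of I/O.
CPU_PER_UNIT = 2
IO_PER_UNIT = 100


@dataclass
class Node:
    name: str
    cpu_cores: int  # hypothetical capability fields
    io_mbps: int
    free_units: int = field(init=False, default=0)


def build_pool(nodes):
    """Convert each heterogeneous node's capacity into virtual resource units.

    A unit needs both CPU and I/O, so the scarcer resource limits the count.
    """
    for node in nodes:
        node.free_units = min(node.cpu_cores // CPU_PER_UNIT,
                              node.io_mbps // IO_PER_UNIT)
    return nodes


def schedule(groups, nodes):
    """Greedily place bound task groups from the shared pool.

    groups: dict of group_id -> list of (task_name, units_needed); all tasks
    in one group are bound and must land on the same node.
    Returns (assignment: task_name -> node_name, list of pending group_ids).
    """
    def demand(gid):
        return sum(units for _, units in groups[gid])

    assignment, pending = {}, []
    # Largest groups first so big bound groups are less likely to be starved.
    for gid in sorted(groups, key=demand, reverse=True):
        need = demand(gid)
        candidates = [n for n in nodes if n.free_units >= need]
        if not candidates:
            pending.append(gid)  # retry in a later scheduling round
            continue
        target = max(candidates, key=lambda n: n.free_units)  # least loaded
        target.free_units -= need
        for task, _ in groups[gid]:
            assignment[task] = target.name
    return assignment, pending


if __name__ == "__main__":
    pool = build_pool([Node("big", 16, 800), Node("small", 4, 400)])
    # Hypothetical sub-tasks: load a staging table, then exchange it into a partition.
    jobs = {
        "p1": [("p1.load", 3), ("p1.exchange", 1)],
        "p2": [("p2.load", 2), ("p2.exchange", 1)],
        "p3": [("p3.load", 1), ("p3.exchange", 1)],
        "p4": [("p4.load", 2), ("p4.exchange", 1)],
    }
    plan, waiting = schedule(jobs, pool)
    print(plan)
    print("pending:", waiting)  # pending: ['p4']
```

Taking the minimum across CPU and I/O means one virtual unit always represents a balanced slice of both resources, so a node with many cores but little disk bandwidth is not over-subscribed. Groups that fit nowhere are returned as pending rather than split across nodes, which preserves the binding constraint.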
Keywords/Search Tags: massive text data, loading technology, Multi-Dimensional data, virtual resource