Research And Implementation Of The Multi-Task-Parallel Scheduling Loading Technology Of Massive Text Data

Posted on:2010-07-22

Degree:Master

Type:Thesis

Country:China

Candidate:G Q Chen

Full Text:PDF

GTID:2178360278956785

Subject:Computer Science and Technology

Abstract/Summary:

With the growth of networks, storing and managing massive text data has become an urgent need in network information security management, and high-performance loading and management technology is increasingly important to it. Research on high-performance technology for massive text data therefore has both theoretical significance and practical value. Massive text data in network information security management has the following characteristics: it is generated at high speed, at high density and on a large scale, without interruption 24 hours a day, and the applications require high-performance full-text search. In view of these data characteristics and application needs, this thesis studies the loading of massive text data in the following respects. First, we study multi-pipeline parallel loading. We divide the massive text data into a number of independent data sets and load these data sets in parallel. Then, within each data set, we exploit pipeline parallelism: the loading process is divided into several stages, so that the loader as a whole consists of multiple high-performance pipelines running in parallel.
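The two levels of parallelism described above (independent data sets loaded side by side, and a staged pipeline inside each data set) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the function names `stage`, `run_pipeline` and `load_all`, the use of threads and in-memory queues, and the stage functions in the usage note are all assumptions made for the example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of one data set's record stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the end marker on to the next stage
            return
        outbox.put(func(item))


def run_pipeline(records, stage_funcs):
    """Push one data set through a chain of stages, each stage in its own thread."""
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stage_funcs)
    ]
    for t in threads:
        t.start()
    for record in records:
        queues[0].put(record)
    queues[0].put(SENTINEL)

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def load_all(data_sets, stage_funcs):
    """Run one pipeline per independent data set, with all pipelines in parallel."""
    results = [None] * len(data_sets)

    def worker(i):
        results[i] = run_pipeline(data_sets[i], stage_funcs)

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(len(data_sets))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

As a usage example, `load_all([[" A ", "Bc "], ["DEF"]], [str.strip, str.lower, lambda s: (s, len(s))])` runs two pipelines of three hypothetical stages each. In a real loader the stages would be steps such as parsing, indexing and writing to the database.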
Within each pipeline, we use the switching technology of Oracle 10g to further exploit the parallelism of each data partition, and divide the loading process into sub-tasks that can be scheduled in parallel. To handle the binding relationships between sub-tasks, we studied a multi-task scheduling algorithm with binding relationships. Resources such as computing and I/O resources are always unevenly distributed, and the server nodes are heterogeneous. To address this, we studied a virtual resource pool technique: we calculate the number of resources each node provides according to its capability and place the calculated resources into a virtual resource pool, where they can be scheduled uniformly. This achieves load balance under mixed loads in a heterogeneous environment and maximizes utilization. Based on the above techniques, we developed the Multi-Task-Parallel Scheduling Loading system for massive text data. The system was tested by a third party, and the test results show that it reaches very high loading performance (a peak load of 20,000,000,000 records per 24 hours, with each record 0.5 KB in size). The system has now been running stably online for more than three months.
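The virtual resource pool idea can be illustrated with a small sketch. The abstract does not say how node capability is measured or how the pool is scheduled, so everything here is assumed for the example: the `Node` fields `cpu_score` and `io_score`, the rule that a node's weaker resource limits its capacity, and the greedy placement policy. Binding relationships between sub-tasks are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_score: float  # hypothetical relative compute capability
    io_score: float   # hypothetical relative I/O capability


def build_pool(nodes, unit=1.0):
    """Convert each node's capability into a whole number of virtual resource units."""
    pool = {}
    for n in nodes:
        # Assumption: a node's usable capacity is limited by its weaker resource.
        pool[n.name] = int(min(n.cpu_score, n.io_score) // unit)
    return pool


def schedule(tasks, pool):
    """Greedy placement: largest task first, onto the node with the most free units.

    tasks is a list of (task_name, units_required) pairs.
    Returns (placement, free), where placement maps task name to node name
    and free maps node name to the units left over.
    """
    free = dict(pool)
    placement = {}
    for name, units in sorted(tasks, key=lambda t: -t[1]):
        node = max(free, key=lambda k: free[k])
        if free[node] < units:
            raise RuntimeError(f"no node can host task {name!r}")
        free[node] -= units
        placement[name] = node
    return placement, free
```

For example, `build_pool([Node("a", 8, 4), Node("b", 2, 6)])` gives node `a` four units and node `b` two units, so the stronger node receives proportionally more sub-tasks when `schedule` draws from the pool.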

Keywords/Search Tags:

massive text data, loading technology, Multi-Dimensional data, virtual resource

Related items

1	Design And Implementation Of The Loading Technology Of Massive Text Data
2	Design And Implementation Of Parallel Loading Technology For Massive Financial Data
3	Research And Implementation Of Distributed Massive Text Data Index And Retrieval System
4	The Research And Implementation Of Massive Short Message Mining Technology
5	Efficient Algorithms For Processing Data Streams And Massive Text
6	The Research And Application Of Unstructured Data Processing Technology
7	Target Multi-dimensional Association Mining And Deep Learning Method Based On Massive Infrared Video
8	Massive Data Aggregation And Parallel Implementation With Complex Constraints
9	Design And Application Of Bank Decision Support System
10	Research On Visible Multi-dimensional Data Modeling Technology Based On CWM