Research And Implementation Of Mass Data Archiving And Restoring System

Posted on:2014-11-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Li

Full Text:PDF

GTID:2268330422452782

Subject:Measuring and Testing Technology and Instruments

Abstract/Summary:


Enterprise information platforms hold massive amounts of data, most of which eventually becomes historical data that is accessed infrequently. This data occupies a large share of system resources and degrades server performance and service quality. How to migrate it from the enterprise information platform to low-cost storage devices and manage it effectively is a problem that enterprise informatization must face. Data archiving technology can resolve this problem effectively. At present, however, most archiving systems are narrowly targeted, designed for a specific database platform and for specialized data administrators. Enterprises need a general-purpose, high-performance data archiving and restoring system that is easy to operate and can handle a wide range of data.

A data archiving and restoring system can manage data effectively because of data classification and indexes that fully describe the characteristics of the data. Following this principle, this thesis first studies and analyzes the basic ideas and methods of text classification technology and proposes improvements on traditional text categorization methods. An SVM classifier is designed and trained on a corpus to build a text classification model. Text data is divided into categories in order to build a classification index for unstructured data.

Second, full-text retrieval is applied to the archiving system, and the full-text retrieval engine Lucene is studied. A data index and retrieval model are built with Lucene, together with a personalized data retrieval scheme based on the user's retrieval behavior. Lucene manages its index centrally, which makes it difficult to meet application demands when handling the index of mass data. An index management server is therefore established with Solr, and a distributed retrieval system is deployed to improve the system's capacity for processing mass data.

Third, for structured data from relational databases, an XML-based structured data archiving model is designed after a close study of the differences among heterogeneous database environments. The model implements archiving and restoring of structured data.

Finally, the general mass data archiving system is built by integrating the modules above. Experiments analyze how the classifier's parameter settings and different feature selection methods affect text classification results. The efficiency of the structured data archiving and restoring scheme is also validated.
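
The abstract does not give the XML layout used by the structured data archiving model, so the following is only a minimal sketch of the general idea: rows from a relational table are serialized to a database-neutral XML document and later restored from it. The element names (`archive`, `columns`, `row`, `field`), the helper names `archive_table` and `restore_table`, and the use of an in-memory SQLite database as a stand-in for a heterogeneous source are all assumptions made for illustration, not the thesis's actual design. The sketch handles only integer, float, text, and NULL values.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from the type name stored in the XML back to a Python type.
_CASTS = {"int": int, "float": float, "str": str}


def archive_table(conn, table):
    """Serialize every row of `table` to an XML string (assumed layout)."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element("archive", {"table": table})
    schema = ET.SubElement(root, "columns")
    for name in columns:
        ET.SubElement(schema, "column", {"name": name})
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            field = ET.SubElement(row_el, "field", {"name": name})
            if value is None:
                field.set("null", "true")  # keep NULL distinct from ""
            else:
                field.set("type", type(value).__name__)
                field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def restore_table(conn, xml_text):
    """Insert archived rows into an existing table; return the row count."""
    root = ET.fromstring(xml_text)
    table = root.get("table")
    columns = [c.get("name") for c in root.find("columns")]
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})'
    rows = []
    for row_el in root.findall("row"):
        values = []
        for field in row_el.findall("field"):
            if field.get("null") == "true":
                values.append(None)
            else:
                cast = _CASTS[field.get("type")]
                values.append(cast(field.text or ""))
        rows.append(tuple(values))
    conn.executemany(sql, rows)
    return len(rows)


# Usage: archive from one database, restore into another with the same schema.
DDL = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)"
source = sqlite3.connect(":memory:")
source.execute(DDL)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Li", 19.5), (2, None, 7.0)],
)
xml_text = archive_table(source, "orders")

target = sqlite3.connect(":memory:")
target.execute(DDL)
restored = restore_table(target, xml_text)
```

Storing the column list and a per-field type tag in the XML keeps the archive independent of the source database's own dump format, which is the property a heterogeneous environment needs. A real system would also have to record the table definition itself and handle binary and date types, which this sketch leaves out.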

Keywords/Search Tags:

mass data, data archiving, text classification, full text retrieval, heterogeneous databases


Related items

1	Massive Data Storage And Full-text Search
2	The Research And Application Of Unstructured Data Processing Technology
3	Research And Implementation Of The Full-text Retrieval System In XML Databases
4	Acquisition, Storage And Retrieval Of E-commerce Mass Data
5	Information Retrieval Oriented Text Classification Technology Research
6	Research On Retrieval And Mining On Probabilistic Data And Hierarchical Text Classification
7	Design And Implementation Of Heterogeneous Document Library Full-text Retrieval System
8	Research And Implementation Of Distribute Massive Text Data Index And Retrieval System
9	The Research And Implementation Of Full-text Retrieval System Based On Lucene
10	The Design And Implementation Of The Heterogeneous Data Joint Retrieval System