
Research And Implementation Of Mass Data Archiving And Restoring System

Posted on: 2014-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Li
GTID: 2268330422452782
Subject: Measuring and Testing Technology and Instruments
Abstract/Summary:
Enterprise information platforms hold massive amounts of data, much of which becomes historical data with a low access frequency. This data occupies a large share of system resources and degrades server performance and service quality. Migrating it from the enterprise information platform to low-cost storage devices and managing it effectively is a problem that the development of enterprise informationization must face. Data archiving technology can solve this problem effectively, but most current archiving systems are still narrowly targeted, designed for a specific database platform and a specific kind of data administrator. Enterprises need a general, high-performance data archiving and restoring system that is easy to operate and can handle a wide range of data.

A data archiving and restoring system can manage data effectively because of data classification and indexing, which together describe the characteristics of the data completely. Following this principle, this thesis first studies and analyzes the basic ideas and methods of text classification and then proposes improvements to traditional text categorization methods. It designs an SVM classifier, trains the classifier on a corpus, and builds a text classification model. Text data is divided into categories so that a classification index can be built for unstructured data.

Second, the thesis applies full-text retrieval to the archiving system and studies the full-text retrieval engine Lucene. It builds a data index and retrieval model with Lucene, together with a personalized data retrieval scheme based on users' retrieval behavior. Lucene manages its index centrally, which makes it difficult to meet application demands when handling the index of mass data.
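Lucene's retrieval model rests on an inverted index, which maps each term to the documents that contain it. The minimal pure-Python sketch below illustrates that idea only. It does not use Lucene's API, and the `InvertedIndex` class, the `tokenize` helper and the TF-IDF ranking are illustrative assumptions; Lucene's own analyzers and scoring are considerably more elaborate.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy in-memory inverted index: term -> {doc_id: term frequency}.

    Assumes each doc_id is added only once.
    """

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        """Return ids of docs containing every query term, best TF-IDF score first."""
        terms = tokenize(query)
        if not terms or any(t not in self.postings for t in terms):
            return []
        # Boolean AND: intersect the posting lists of all query terms.
        hits = set(self.postings[terms[0]])
        for t in terms[1:]:
            hits &= set(self.postings[t])
        scores = {}
        for doc_id in hits:
            score = 0.0
            for t in terms:
                idf = math.log(1 + self.doc_count / len(self.postings[t]))
                score += self.postings[t][doc_id] * idf
            scores[doc_id] = score
        # Sort by descending score, breaking ties by doc id for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("a", "archived sales records from 2010")
    idx.add("b", "sales report and sales forecast")
    idx.add("c", "server log archive")
    print(idx.search("sales"))  # ['b', 'a']
```

A query for `sales` ranks document `b`, which mentions the term twice, ahead of `a`, which mentions it once. Keeping postings as a per-term dictionary makes boolean AND queries a simple set intersection; a production engine typically stores sorted, compressed posting lists on disk instead.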
Because Lucene's centralized index management limits its scalability, an index management server is built with Solr and a distributed retrieval system is deployed to improve the system's ability to process mass data. Third, for structured data from relational databases, the thesis closely studies the differences between heterogeneous database environments and then designs an XML-based structured data archiving model, which realizes the archiving and restoring of structured data.

Finally, a general mass data archiving system is built by integrating the modules above. Experiments analyze how the classifier's parameter settings and different feature selection methods affect text classification results, and the efficiency of the structured data archiving and restoring scheme is also validated.
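The XML-based archiving model can be illustrated with a small round trip: rows are read from a relational table, written to a database-neutral XML document that records column names, value types and NULLs, and later inserted back into a table. This is a minimal sketch that assumes SQLite (Python's built-in `sqlite3` module) on both sides and only integer, real, text and NULL values. The element names (`archive`, `columns`, `row`, `field`) and the functions `archive_table` and `restore_table` are hypothetical and are not the thesis's actual XML schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical type tags recorded in the XML and the functions that restore them.
CONVERTERS = {"int": int, "float": float, "str": str}


def archive_table(conn, table):
    """Serialize every row of `table` into an XML string.

    Assumes `table` is a trusted identifier (it is formatted into the SQL).
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element("archive", table=table)
    schema = ET.SubElement(root, "columns")
    for name in columns:
        ET.SubElement(schema, "column", name=name)
    rows = ET.SubElement(root, "rows")
    for values in cur:
        row = ET.SubElement(rows, "row")
        for name, value in zip(columns, values):
            field = ET.SubElement(row, "field", name=name)
            if value is None:
                field.set("null", "true")  # keep NULL distinct from ""
            else:
                field.set("type", type(value).__name__)
                field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def restore_table(conn, xml_text, target_table=None):
    """Insert the rows from an archive document; returns the row count."""
    root = ET.fromstring(xml_text)
    table = target_table or root.get("table")
    columns = [c.get("name") for c in root.find("columns")]
    placeholders = ", ".join("?" for _ in columns)  # sqlite3 "qmark" style
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = root.find("rows")
    for row in rows:
        values = []
        for field in row:
            if field.get("null") == "true":
                values.append(None)
            else:
                values.append(CONVERTERS[field.get("type")](field.text or ""))
        conn.execute(sql, values)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(1, "ACME", 99.5), (2, None, 12.0)])
    xml_text = archive_table(src, "orders")

    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    restore_table(dst, xml_text)
    print(dst.execute("SELECT * FROM orders ORDER BY id").fetchall())
    # [(1, 'ACME', 99.5), (2, None, 12.0)]
```

Because the XML carries its own column and type metadata, the same document could in principle be restored into a different database engine, provided the `INSERT` placeholder style is adapted to the target driver and the types are mapped to its dialect. A production archiver would also delete the archived rows from the source only after the XML has been written and verified, and would validate the table name rather than formatting it directly into SQL as this sketch does.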
Keywords/Search Tags: mass data, data archiving, text classification, full text retrieval, heterogeneous databases