Design and Implementation of a Data Mining and Migration System Based on Hadoop

Posted on: 2014-01-20  Degree: Master  Type: Thesis
Country: China  Candidate: M Y Lv  Full Text: PDF
GTID: 2248330392461050  Subject: Computer technology
Abstract/Summary:
Enterprise information systems usually contain multiple business systems, each with its own set of business, backup, and archiving systems. The disadvantages of this arrangement are complex management, wasted storage space, and poor scalability. To address these shortcomings, this thesis designs and implements a tiered storage system that uses one large platform to manage the multiple business systems and consolidates each business system's backup and archiving systems into one. The tiered storage system provides a data mining and data migration solution based on the Hadoop framework. The main contents are as follows:
(1) Researched the key Hadoop technologies, including the MapReduce distributed architecture, the HBase database, and the HDFS distributed file system.
(2) Designed and implemented a tiered storage system based on the Hadoop architecture, and described the system and data platform architecture design in detail.
(3) Designed and implemented the data mining module based on MapReduce, applying traditional relational database analysis methods to the HBase database in order to classify HBase data efficiently.
(4) Designed and implemented a data migration module that migrates the structured and unstructured data of the online business platform to the big data platform. Structured data is migrated with MapReduce, using an IO scheduling algorithm designed in this thesis that takes resource usage into account and avoids assigning tasks to nodes with a heavy IO load. The unstructured data migration tool uses FTP to migrate the business platform's log files concurrently to a specified HDFS directory.
(5) Completed functional and performance testing of the system. The test results show that all functional modules of the system satisfy the design requirements, and that the IO scheduler performs better than the default scheduler.
The data mining and data migration system designed in this thesis meets these specific needs, delivering better concurrent data migration performance and better analysis of consumer and business data.
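The IO-aware scheduling idea in item (4) can be illustrated with a small sketch. This is not the algorithm from the thesis, which the abstract does not specify. It is a minimal greedy illustration of avoiding nodes with a heavy IO load. The Node fields, the io_threshold cutoff, and the per-task io_cost estimate are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    io_load: float   # hypothetical metric: fraction of IO capacity in use (0.0 to 1.0)
    free_slots: int  # hypothetical: number of free task slots on the node


def pick_node(nodes, io_threshold=0.8):
    """Return the least IO-loaded node that has a free slot and is below the threshold."""
    candidates = [n for n in nodes if n.free_slots > 0 and n.io_load < io_threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.io_load)


def assign_tasks(tasks, nodes, io_threshold=0.8, io_cost=0.1):
    """Greedily assign tasks, skipping nodes whose IO load is too heavy.

    Each assignment raises the node's estimated IO load by io_cost
    (an assumed constant). Tasks with no eligible node stay pending.
    """
    assignment = {}
    pending = []
    for task in tasks:
        node = pick_node(nodes, io_threshold)
        if node is None:
            pending.append(task)
            continue
        assignment[task] = node.name
        node.free_slots -= 1
        node.io_load += io_cost
    return assignment, pending


if __name__ == "__main__":
    cluster = [Node("a", 0.9, 2), Node("b", 0.2, 1), Node("c", 0.5, 2)]
    placed, waiting = assign_tasks(["t1", "t2", "t3", "t4"], cluster)
    print(placed)
    print(waiting)
```

In this sketch, a node above the threshold receives no tasks even when it has free slots, and tasks that cannot be placed are left pending for a later scheduling round rather than forced onto a busy node.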
Keywords/Search Tags: Hadoop, Big data platform, MapReduce, Data mining, Data analysis, Data migration