Data Mining Platform Research For National Crop Germplasm Resources Database

Posted on: 2016-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: K Pan
Full Text: PDF
GTID: 2308330461989611
Subject: Management Science and Engineering
The National Crop Germplasm Resources Database, which holds 200 kinds of crops, 410,000 germplasm records and 24 million data item values, is one of the largest plant germplasm databases in the world. Its data volume is 230 GB. As agricultural research has developed, mining the information contained in this large body of data with the principles, methods and techniques of data mining has gradually become an important part of research on crop germplasm resources information. Applying suitable data mining techniques is important for making full use of the National Crop Germplasm Resources Database and for better protecting and using China's rich crop germplasm resources.
To meet the growing demand for computing power in processing crop germplasm data, this paper introduced cloud computing and related technologies into the mining of crop germplasm resources data, taking into account the basic conditions and development needs of the national crop germplasm data as well as new directions in the field.
The paper reviewed the basic theory, general process and common methods of data mining, which serve as the theoretical and technical foundation for building a crop germplasm resources data mining platform. It analyzed the main cloud computing platforms in China and abroad and studied the architecture of Hadoop, an open-source platform. It also completed the main design and a prototype of a crop germplasm resources data mining platform based on cloud computing, and described the platform's architecture, workflow and related functions in detail.
The paper studied a parallelization strategy for the classic Apriori algorithm based on the MapReduce framework, implemented the parallel Apriori algorithm in Java, and deployed it on the mining platform.
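The abstract does not reproduce the thesis's implementation. As a rough illustration of how one support-counting pass of Apriori can be split into a map step and a reduce step, the following is a minimal in-memory sketch. It makes several assumptions: it does not use Hadoop classes, the class and method names (`AprioriPassSketch`, `mapSplit`, `reduce`, `frequentItemsets`) are hypothetical, and the trait labels in `main` are invented sample values rather than data from the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory sketch of one support-counting pass of Apriori,
// divided into a map step and a reduce step. No Hadoop classes are used.
class AprioriPassSketch {

    // Canonical text key for an itemset: items sorted and joined by commas.
    static String key(Set<String> itemset) {
        return String.join(",", new TreeSet<>(itemset));
    }

    // Map step for one data split: count, within this split only, how many
    // records contain each candidate itemset (mapper plus local combiner).
    static Map<String, Integer> mapSplit(List<Set<String>> split,
                                         List<Set<String>> candidates) {
        Map<String, Integer> local = new TreeMap<>();
        for (Set<String> record : split) {
            for (Set<String> candidate : candidates) {
                if (record.containsAll(candidate)) {
                    local.merge(key(candidate), 1, Integer::sum);
                }
            }
        }
        return local;
    }

    // Reduce step: sum the per-split counts for each candidate and keep
    // only the itemsets whose total support count reaches minSupport.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials,
                                       int minSupport) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        total.values().removeIf(count -> count < minSupport);
        return total;
    }

    // One full pass: run the map step on every split, then reduce.
    // In a real cluster the map calls would run on different nodes.
    static Map<String, Integer> frequentItemsets(List<List<Set<String>>> splits,
                                                 List<Set<String>> candidates,
                                                 int minSupport) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<Set<String>> split : splits) {
            partials.add(mapSplit(split, candidates));
        }
        return reduce(partials, minSupport);
    }

    public static void main(String[] args) {
        // Invented sample records; each record is a set of trait labels.
        List<List<Set<String>>> splits = List.of(
            List.of(Set.of("awn=none", "grain=long", "hull=yellow"),
                    Set.of("awn=none", "grain=long")),
            List.of(Set.of("awn=none", "hull=yellow"),
                    Set.of("grain=long"),
                    Set.of("awn=none", "grain=long")));
        // Candidate 2-itemsets whose support is counted in this pass.
        List<Set<String>> candidates = List.of(
            Set.of("awn=none", "grain=long"),
            Set.of("awn=none", "hull=yellow"),
            Set.of("grain=long", "hull=yellow"));

        frequentItemsets(splits, candidates, 2)
            .forEach((itemset, count) -> System.out.println(itemset + " -> " + count));
    }
}
```

A full Apriori run would repeat such a pass for itemsets of increasing size, generating the next candidates from the frequent itemsets of the previous pass. Candidate generation and the Hadoop job configuration are left out of this sketch.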
Using the mining platform, the paper applied the parallel Apriori algorithm to the National Crop Germplasm Resources Database and obtained preliminary knowledge about the characteristics of rice germplasm. It then compared the computational efficiency of the parallel Apriori algorithm with that of the classic Apriori algorithm and tested the speedup performance of the platform. Finally, it verified the scientific soundness, validity and feasibility of the design of the crop germplasm resources data mining platform.
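For reference, speedup in parallel computing is conventionally defined as the ratio of the run time on one node to the run time on p nodes. The abstract does not state which definition, node counts or data sizes the thesis used, so the notation below is an assumption.

```latex
S(p) = \frac{T(1)}{T(p)}
```

Here T(1) is the run time on a single node and T(p) the run time on p nodes. Under this definition, ideal (linear) speedup corresponds to S(p) = p.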
Keywords: crop germplasm resources, data mining, Hadoop cloud platform, Apriori algorithm
