Design And Implementation Of A New Germplasm Resources Data Warehouse System

Posted on:2019-05-19

Degree:Master

Type:Thesis

Country:China

Candidate:J N Jiang

GTID:2428330542999226

Subject:Control Science and Engineering

Abstract/Summary:

As components of biological resources and biodiversity, plant germplasm resources play an important role in society. They contribute to both food security and ecological security, and they are essential to the sustainable development of agriculture. As one of the most biodiverse countries, China holds abundant plant germplasm resources in both variety and scale. With government support and the work of agricultural researchers, the informatization of germplasm resources has been carried out, and a database holding germplasm data has been built and is now in service to the public. However, as informatization deepens, germplasm data keep growing while the value hidden within them remains largely unmined. In the era of big data, agriculture needs to incorporate big data technology so that the storage and sharing of germplasm data can be guaranteed and the value of these data can be revealed. Based on Hadoop technology, in particular Apache Spark and Hive, a new data warehouse system is built for mining germplasm data. The key research points of the thesis are as follows:

Firstly, building the data warehouse system requires classifying many germplasm materials according to their quality. An improved K-means algorithm based on stacked sparse autoencoder neural networks and quotient space theory is proposed to cluster germplasm materials. The clustered data are then labelled so that newly added materials can be classified automatically. Because germplasm data are high-dimensional, feature reduction is needed during data processing; with the extracted features, clustering becomes more accurate and less time-consuming. By taking the mixed feature data produced by the stacked sparse autoencoder as the initial cluster centers, the algorithm overcomes the sensitivity of K-means to the choice of initial starting points. Compared with the traditional approach of using PCA for dimensionality reduction, the algorithm performs better when clustering high-dimensional data.

Secondly, as the informatization of germplasm resources deepens, germplasm data keep growing in size and variety, while the utilization ratio of the data remains low. A data warehouse system is therefore built on Hadoop technology, in particular Apache Spark and Hive, and its design and implementation are described in detail in the thesis. Compared with traditional systems based on relational databases, the data warehouse system handles big data much better and is easier to expand. Its data mining functions can assist plant breeders in their work, providing scientific support while improving their efficiency.
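The clustering pipeline described in the first research point (autoencoder feature reduction followed by K-means with non-random initial centers) can be sketched in NumPy. This is a minimal sketch under stated assumptions and is not the thesis's algorithm. The stack is trained greedily layer by layer with a KL-divergence sparsity penalty and no fine-tuning. Inputs are assumed to be min-max scaled to [0, 1]. The quotient-space-based choice of initial centers is **not** reproduced: a simple deterministic farthest-point (maximin) seeding stands in for it. All function names (`train_sparse_autoencoder`, `stacked_encode`, `maximin_centers`, `kmeans`) and hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_sparse_autoencoder(X, n_hidden, rho=0.05, beta=3.0, lam=1e-4,
                             lr=0.5, epochs=300, seed=0):
    """Train one sparse autoencoder layer by full-batch gradient descent.

    Objective: (1/2m) * sum of squared reconstruction errors
    + (lam/2) * L2 weight decay + beta * KL(rho || mean hidden activation).
    X is assumed to lie in [0, 1]. Returns the encoder weights (W1, b1).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (n, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, n)), np.zeros(n)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                    # hidden codes, shape (m, n_hidden)
        Y = sigmoid(H @ W2 + b2)                    # reconstruction, shape (m, n)
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)  # mean activation per unit
        dZ2 = (Y - X) * Y * (1 - Y) / m             # gradient at output pre-activation
        kl_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / m
        dZ1 = (dZ2 @ W2.T + kl_grad) * H * (1 - H)  # backprop including sparsity term
        W2 -= lr * (H.T @ dZ2 + lam * W2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1 + lam * W1)
        b1 -= lr * dZ1.sum(axis=0)
    return W1, b1


def stacked_encode(X, layer_sizes, **kwargs):
    """Greedy stacking: each layer is trained on the previous layer's codes."""
    H = X
    for size in layer_sizes:
        W, b = train_sparse_autoencoder(H, size, **kwargs)
        H = sigmoid(H @ W + b)
    return H


def maximin_centers(F, k):
    """Hypothetical deterministic seeding (stand-in for the thesis's method):
    start at the point nearest the global mean, then repeatedly add the
    point farthest from all centers chosen so far."""
    idx = [int(((F - F.mean(axis=0)) ** 2).sum(axis=1).argmin())]
    for _ in range(1, k):
        d = ((F[:, None, :] - F[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return F[idx]


def kmeans(F, init_centers, max_iter=100):
    """Standard Lloyd iterations starting from the supplied centers."""
    C = np.array(init_centers, dtype=float)
    for _ in range(max_iter):
        d = ((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep an empty cluster's old center instead of producing NaN.
        new_C = np.array([F[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C
```

A typical call, assuming a scaled feature matrix `X`, is `labels, centers = kmeans(stacked_encode(X, [10, 5]), maximin_centers(stacked_encode(X, [10, 5]), k))`. In practice the encoded features would be computed once and reused. Because the seeding is deterministic, repeated runs give the same partition. This mirrors the abstract's goal of removing K-means' dependence on randomly chosen starting points, although the actual seeding rule in the thesis differs.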

Keywords/Search Tags:

Germplasm resources, Stacked sparse auto encoder, Data clustering, Data warehouse, Spark, Hive

Related items

1	Design And Implementation Of Agricultural Product E-commerce Data Warehouse Analysis And Evaluation System Based On Hive On Spark
2	Improvement Of Stacked Auto-Encoders And Its Engineering Application
3	Design And Implementation Of Investors’ Trading Behavior Management System In Precious Metal Market Based On Hive Data Warehouse
4	High-dimensional And Sparse Data Classification Based On Deep Learning
5	Design And Implementation Of Insurance Data Warehouse System Based On Hive
6	Design And Implementation Of Hive-based Purchase And Sale Data Warehouse System
7	Design And Implementation Of NetEase Mobile Big Data Support Platform Based On Spark And Hive
8	Deep Auto-encoder Framework For SAR Images Change Detection
9	The Research And Implementation Of Data Warehouse For Logistics Based On Hive
10	Research And Application Of Incremental Clustering Algorithm Based On Auto-Encoder