
Research On Scientific And Technological Data Reuse Based On Deep Learning

Posted on: 2020-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: H W Lv
GTID: 2428330599958537
Subject: Education Technology
Abstract/Summary:
With the development of computing, the structure of scientific and technological data has become increasingly complex, and the rate of data duplication rises during data integration. Data reuse (multiplexing) technology is an effective means of saving storage space and improving data utilization in data management. This thesis proposes a data entity association mapping reuse method based on similarity analysis of scientific and technological data, supported by the projects "Research and Development of Standardized Processing and Application System of Scientific and Technological Big Data in Hebei Province" (172110113D) and "Public Platform of Scientific and Technological Innovation Big Data in Hebei Province". First, the data dimensions of the different data sets in the scientific and technological data are analyzed, and similar dimension group pairs are established. On this basis, the similarity of the entities' dimension values is calculated, and association mappings between the data entities are established. Finally, the results are standardized and stored so that the scientific and technological data can be reused. The main research work of the thesis is as follows:

(1) Optimization analysis of similar dimension group pairs of scientific and technological data based on Deep Learning. A dimension weight quantification mechanism is established to analyze the weights of the data dimensions and to screen the reusable dimensions. Similarity matching analysis is carried out on the dimensions of the different scientific and technological data sets, and similar dimension group pairs between the data sets are established. A Deep Learning algorithm is then trained to obtain the optimal similar dimension group pairs.

(2) Construction and standardization of the entity association mapping for scientific and technological data. Based on the similar dimension group pairs formed between the data sets, entity association mapping analysis is performed across the data sets. Unlike traditional data reuse, whose reuse object is intermediate query results, this thesis starts from the association mappings established between data entities: the similarity of the entities' values on the similar dimension group pairs is calculated, and a similarity threshold is set. The calculation results determine whether two data entities are similar and reusable. Association mappings are then established between the reusable data entities and standardized.

(3) Columnar storage of scientific and technological data reuse results. The results of the data reuse analysis are stored in columnar storage. In contrast to traditional row storage, columnar storage stores data in the form of key-value pairs, i.e. <key-value>. During data reuse the number of dimensions is often not fixed, which traditional row storage handles poorly, whereas the <key-value> structure of columnar storage exactly matches the form "standard dimension-standard dimension value". The columnar storage structure also helps control redundancy, which can improve the efficiency of data reuse.
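The threshold-based entity matching and the "standard dimension-standard dimension value" storage described in (2) and (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the dimension pairs, the sample field names, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, and the deep-learning step that selects the dimension pairs is not modeled.

```python
from difflib import SequenceMatcher

# Hypothetical similar dimension group pairs:
# (dimension in data set A, dimension in data set B, standard dimension name).
DIMENSION_PAIRS = [
    ("title", "project_name", "name"),
    ("org", "institution", "organization"),
]


def value_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for the thesis's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def entity_similarity(entity_a: dict, entity_b: dict) -> float:
    """Mean similarity of the values on each paired dimension."""
    scores = [
        value_similarity(str(entity_a.get(dim_a, "")), str(entity_b.get(dim_b, "")))
        for dim_a, dim_b, _ in DIMENSION_PAIRS
    ]
    return sum(scores) / len(scores)


def to_key_value(entity_a: dict) -> dict:
    """Standardize an entity as {standard dimension: standard dimension value}."""
    return {std: entity_a[dim_a] for dim_a, _, std in DIMENSION_PAIRS if dim_a in entity_a}


def map_entities(set_a: list, set_b: list, threshold: float = 0.8) -> list:
    """Return (index in A, index in B, key-value record) for each reusable pair."""
    mappings = []
    for i, a in enumerate(set_a):
        for j, b in enumerate(set_b):
            # Entities at or above the threshold are treated as similar and reusable.
            if entity_similarity(a, b) >= threshold:
                mappings.append((i, j, to_key_value(a)))
    return mappings
```

Because each standardized record is a dictionary, entities with different numbers of dimensions can be stored without a fixed schema, which is the property the abstract attributes to the <key-value> structure.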
Keywords/Search Tags:Scientific and technological data, Data multiplexing technology, Similarity degree, Deep Learning, Data association