Design And Implementation Of DOA Unstructured Data Sharing Platform

Posted on: 2020-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: K C Li
GTID: 2428330578964973
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, the importance of data to national stability and social development is increasingly apparent. Big data makes it possible to understand the current state and development of the social economy, to support decision-making by governments and enterprises, and to analyze and predict future trends. To extract more value from data and make it easier to obtain, various data sharing platforms have emerged. However, traditional data sharing platforms adopt a centralized design that aggregates all data into a database, and they mainly target structured data. Unstructured data is now growing rapidly and has great commercial value, yet there is no good sharing scheme for it. Unstructured data is also heterogeneous, multi-source, massive, dynamic, and changing in real time, and traditional centralized sharing schemes struggle to cope with these characteristics. Data also raises ownership problems. The state currently pays great attention to protecting intellectual property rights and has introduced relevant laws, but data ownership has not received enough attention and legal provisions for it are lacking. How to connect people with data and how to establish data ownership are urgent problems that traditional data sharing solutions do not consider.

Blockchain, a cutting-edge technology, follows a decentralized design, but its application fields and directions are limited and it is not suitable for all data. Handle technology also provides a scheme for the unified management and sharing of unstructured data, but it too has limitations and shortcomings in application. DOA offers a good solution to the problem of unstructured data sharing in the big data era. DOA is a data-oriented architecture built on the idea of being "data-oriented and data-centered". It is mainly divided into three parts: the data registry (DRC), the data authority center (DAC), and the data exception center (DEC). The data registry is the core module: it uses a unified data registration standard to build a metadata registry for data, and through that registry it achieves unified management of data and provides external data services. The data authority center is the key module: through the mechanism of "natural encryption and authorized use" of data, it guarantees the ownership and security of data and establishes the ownership relationship between people and data. The data exception center is an important module that works with the data authority center to ensure data security and to enable the tracking and tracing of data. DRC is the foundation and key for DOA to achieve unified management of massive, multi-source, heterogeneous, dynamic, real-time unstructured data, to realize data sharing, and to break down "data islands" and "data chimneys".

The research content of this thesis is as follows:
(1) Starting from the DOA data-oriented architecture and building on existing research, the metadata registration specification for unstructured data in the DRC data registry is studied.
(2) Methods for automatically extracting relevant metadata are studied. Combined with manually entered metadata, a single-item metadata registration method and an automatic real-time registration method are studied, and a visualization is implemented.
(3) Based on the original SimHash algorithm and the TF-IDF feature extraction algorithm, combined with IK Chinese word segmentation and the SHA-256 hash algorithm, similarity judgment for text-type unstructured data is studied.
(4) The advantages and disadvantages of two popular file storage systems, FastDFS and HDFS, are analyzed, and an optimized data backup system combining the advantages of both is studied.
(5) Based on MyBatis and the Spring framework, the implementation of the DOA unstructured data sharing platform is studied. It is compared with the traditional "centralized" solution, and a "no centralization" solution is added.

The main research results and innovations of this thesis are as follows:
(1) With reference to Dublin Core metadata, a metadata registration specification for DRC unstructured data is proposed. Two data registration solutions are proposed for DOA data registration, and a data registration visualization tool that does not depend on a specific software environment is implemented.
(2) An improved SimHash algorithm is proposed. The IK Chinese word segmenter is added to the original algorithm so that SimHash supports Chinese documents. The feature extraction method is also improved: the TF-IDF algorithm replaces the original word frequency statistics to make feature extraction more accurate. The SHA-256 hash algorithm is used as the internal hash function of SimHash, which ultimately improves the accuracy of the algorithm.
(3) Based on the DOA data-oriented approach, an unstructured data sharing solution is proposed on top of the DRC data registry and the data backup center. On the basis of the traditional "centralization" approach, the "no centralization" design concept is realized by implementing the DRC data transmission applet.
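The improved SimHash described in the abstract (TF-IDF term weights in place of raw word frequencies, with SHA-256 as the internal hash) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names `tokenize`, `idf_table`, `simhash`, and `hamming` are hypothetical, the 64-bit fingerprint length and the smoothed IDF formula are assumptions, and a whitespace split stands in for the IK Chinese word segmenter, which is a Java component not reproduced here.

```python
import hashlib
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace split as a stand-in for IK Chinese word segmentation.
    return text.lower().split()


def idf_table(corpus_tokens):
    # Smoothed inverse document frequency over a list of token lists (assumed formula).
    n = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def simhash(tokens, idf, bits=64):
    # Weight each term by TF-IDF, hash it with SHA-256, and let each hash bit
    # vote +weight or -weight on the corresponding fingerprint bit.
    tf = Counter(tokens)
    total = sum(tf.values())
    votes = [0.0] * bits
    for term, count in tf.items():
        # Terms missing from the IDF table fall back to 1.0 (an assumption).
        weight = (count / total) * idf.get(term, 1.0)
        digest = hashlib.sha256(term.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")
```

Two documents would then be compared by building the IDF table over the corpus, fingerprinting each document with `simhash`, and treating a small `hamming` distance as a sign of similarity. The distance threshold is a tuning choice that this sketch does not fix.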
Keywords/Search Tags:DRC, Data registration, SimHash, Data sharing