Research On High-end Manufacturing Unstructured Data Management

Posted on: 2019-05-29 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Li | Full Text: PDF
GTID: 2428330545986957 | Subject: Computer software and theory
Abstract/Summary:
Germany's Industry 4.0 initiative has launched a new round of industrial revolution, pushing the manufacturing industry into an era of big data characterized by intelligence. Unstructured data, an important component of manufacturing big data, holds rich application value and provides important support for intelligent manufacturing. How to manage manufacturing unstructured data efficiently has therefore become a research hotspot. Manufacturing unstructured data come in many formats, have complex relationships, and include a large number of small files, among other characteristics. Because unstructured data cannot be analyzed directly, the existing unstructured data management system UDMS extracts semantic features from the data as metadata; the unstructured data are stored in HDFS, and the metadata are managed by the object deputy database TOTEM. However, this system still has the following problems in managing manufacturing unstructured data: 1) It lacks a data model that reflects the characteristics of manufacturing, which makes it difficult to organize and manage metadata effectively. 2) Storing a large number of small files puts pressure on HDFS. 3) Query efficiency in TOTEM is low.

To better manage manufacturing unstructured data, this thesis studies the following aspects. First, based on the characteristics of the manufacturing industry, we propose a metadata modeling method that comprises feature modeling and relationship modeling. The method uses the object deputy model to describe the features of the data and the complex relationships among them. The features are divided into basic features, physical features and semantic features. The relationships include the inclusion relationship between documents and entities, the composition relationships and constraints among entities, and the sequence relationships among documents.

Then, we propose a relevance-based small-file merging method to solve the small-file storage problem in HDFS. Its core is to cluster small files by jointly considering file relevance and storage space utilization; each cluster of small files is merged into one large file, which is then stored in HDFS. In addition, we design the small-file operations (addition, deletion, modification and query) needed to support small-file merging.

Next, because cross-class queries in TOTEM are inefficient, we propose an associative query index method to speed up the computation of path expressions, which is the core of cross-class queries. This method uses an inverted index to store the deputy relationships between files and supports the computation of path expressions with predicates. In addition, a batch maintenance method is proposed to maintain the index.

Finally, the above modeling and optimization methods are all implemented in UDMS, and functional and performance tests are carried out. The test results show that the relevance-based small-file merging method effectively improves the storage utilization and query efficiency of HDFS, and that the associative query index method greatly reduces the time needed to compute path expressions, thereby improving the query efficiency of TOTEM.
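The idea behind relevance-based small-file merging can be illustrated with a minimal Python sketch. The abstract does not specify the relevance measure or the clustering algorithm, so everything below is an assumption made for illustration: relevance is approximated by Jaccard similarity over hypothetical descriptive tags, clustering is a simple greedy placement, and the capacity of a merged file defaults to 128 MB (a common HDFS block size). The names `SmallFile`, `relevance`, `merge_plan` and `build_index` are hypothetical and are not taken from UDMS.

```python
from dataclasses import dataclass

# Assumed capacity of one merged file: 128 MB, a common HDFS block size.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass(frozen=True)
class SmallFile:
    name: str
    size: int            # size in bytes
    tags: frozenset      # hypothetical relevance features (e.g. part, process)


def relevance(a: SmallFile, b: SmallFile) -> float:
    """Jaccard similarity of tags; a stand-in for the thesis's relevance measure."""
    union = a.tags | b.tags
    return len(a.tags & b.tags) / len(union) if union else 0.0


def merge_plan(files, capacity=BLOCK_SIZE, min_relevance=0.5):
    """Greedily group files into clusters that will each become one merged file.

    Each file joins the cluster with the highest average relevance among the
    clusters that still have room; otherwise it starts a new cluster.
    """
    clusters = []
    # Placing large files first tends to leave less unused space per cluster.
    for f in sorted(files, key=lambda x: x.size, reverse=True):
        best, best_score = None, min_relevance
        for cluster in clusters:
            used = sum(m.size for m in cluster)
            if used + f.size > capacity:
                continue  # storage constraint: merged file must fit the capacity
            score = sum(relevance(f, m) for m in cluster) / len(cluster)
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([f])
        else:
            best.append(f)
    return clusters


def build_index(cluster):
    """Map each file name to its (offset, length) inside the merged file."""
    index, offset = {}, 0
    for f in cluster:
        index[f.name] = (offset, f.size)
        offset += f.size
    return index
```

With such an index, reading one small file becomes a ranged read of `length` bytes at `offset` in the merged file. Deletion and modification would additionally need to mark or rewrite index entries, which this sketch does not cover.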
Keywords/Search Tags:Manufacturing Unstructured Data, Metadata Modeling, Small-files Merging, Association Query, Path Expression