Research On High-end Manufacturing Unstructured Data Management

Posted on:2019-05-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Li

Full Text:PDF

GTID:2428330545986957

Subject:Computer software and theory

Abstract/Summary:


Germany's Industry 4.0 initiative has launched a new round of industrial revolution, pushing manufacturing into an era of big data characterized by intelligence. Unstructured data, an important component of manufacturing big data, holds rich application value and provides important support for intelligent manufacturing. How to manage manufacturing unstructured data efficiently has therefore become a research hotspot. Manufacturing unstructured data come in many formats, have complex relationships, and include a large number of small files. Because unstructured data cannot be analyzed directly, the existing unstructured data management system UDMS extracts semantic features from the data as metadata; the unstructured data are stored in HDFS and the metadata are managed by the object deputy database TOTEM. However, this system still has the following problems in managing manufacturing unstructured data: 1) It lacks a data model that reflects the characteristics of manufacturing, which makes it difficult to organize and manage metadata effectively. 2) Storing a large number of small files puts pressure on HDFS. 3) Query efficiency in TOTEM is low.

To better manage manufacturing unstructured data, this thesis studies the following aspects. First, based on the characteristics of the manufacturing industry, we propose a metadata modeling method that includes feature modeling and relationship modeling. The method uses the object deputy model to describe the features of the data and the complex relationships among them. The features are divided into basic features, physical features, and semantic features. The relationships include the inclusion relationships between documents and entities, the composition relationships and constraints among entities, and the sequence relationships among documents.

Then, we propose a relevance-based small-file merging method to solve the problem of small-file storage on HDFS. Its core is to cluster small files by jointly considering file relevance and storage space utilization; each cluster of small files is merged into one large file, which is then stored in HDFS. In addition, the small-file operations needed to support merging, namely addition, deletion, modification, and query, are designed.

Next, because cross-class queries in TOTEM are inefficient, we propose an associative query index method that speeds up the computation of path expressions, which is the core of cross-class queries. This method uses an inverted index to store the deputy relationships between files and supports the computation of path expressions with predicates. In addition, a batch maintenance method is proposed to maintain the index.

Finally, the above modeling and optimization methods are all implemented in UDMS, and functional and performance tests are carried out. The test results show that the relevance-based small-file merging method effectively improves the storage utilization and query efficiency of HDFS, and that the associative query index method greatly reduces the time needed to compute path expressions, thereby improving the query efficiency of TOTEM.
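The relevance-based merging idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the thesis's clustering algorithm, so everything here is an assumption made for illustration: relevance is approximated by the Jaccard similarity of hypothetical tag sets, space utilization by a fixed `block_size` capacity, and the names `SmallFile`, `relevance`, and `merge_small_files` are invented. No HDFS calls are made; the sketch only computes which files would share a merged file and where each would sit inside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallFile:
    name: str
    size: int          # size in bytes
    tags: frozenset    # hypothetical semantic tags used as a relevance signal


def relevance(a: SmallFile, b: SmallFile) -> float:
    """Jaccard similarity of tag sets (an assumed relevance measure)."""
    union = a.tags | b.tags
    return len(a.tags & b.tags) / len(union) if union else 0.0


def merge_small_files(files, block_size, min_relevance=0.5):
    """Greedily group related files into clusters that fit in block_size.

    Returns (clusters, index); index maps a file name to
    (cluster_id, offset, length) inside the merged file.
    """
    clusters = []
    # Place larger files first so that clusters fill up more evenly.
    for f in sorted(files, key=lambda x: x.size, reverse=True):
        best, best_score = None, 0.0
        for cid, members in enumerate(clusters):
            used = sum(m.size for m in members)
            if used + f.size > block_size:
                continue  # would overflow the merged file
            score = sum(relevance(f, m) for m in members) / len(members)
            if score >= min_relevance and (best is None or score > best_score):
                best, best_score = cid, score
        if best is None:
            clusters.append([f])  # no related cluster has room: start a new one
        else:
            clusters[best].append(f)

    # Record where each small file sits inside its merged file.
    index = {}
    for cid, members in enumerate(clusters):
        offset = 0
        for m in members:
            index[m.name] = (cid, offset, m.size)
            offset += m.size
    return clusters, index


files = [
    SmallFile("a.dwg", 40, frozenset({"engine", "cad"})),
    SmallFile("b.dwg", 30, frozenset({"engine", "cad"})),
    SmallFile("c.pdf", 50, frozenset({"invoice"})),
    SmallFile("d.dwg", 40, frozenset({"engine", "cad"})),
]
clusters, index = merge_small_files(files, block_size=100)
print(index["d.dwg"])
```

In this sketch a lookup of a small file becomes an index lookup followed by a ranged read of the merged file, which is the general reason merging reduces the number of objects the file system has to track. A real design would also need the add, delete, and modify operations the abstract mentions, which are not shown here.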

Keywords/Search Tags:

Manufacturing Unstructured Data, Metadata Modeling, Small-files Merging, Association Query, Path Expression


Related items

1	Research And Implementation Of Non Structured Data Management In Discrete Manufacturing Industry Based On Hadoop
2	Research And Implementation Of Small File Storage Model Based On HDFS
3	Research And Design Of Massive Small Files Merging Based On Hadoop
4	Design And Implementation Of Metadata Management In Distributed Small Object Storage System
5	Research And Implementation For XML Query Optimization Technology Based On Regular Path Expression
6	The Research Of HDFS Optimization Towards Lots Of Small Files Accessing And Storage
7	Key Technology Research And System Implementation Of Distributed File System Adapted To Massive Small Files
8	A Strategy To Deal With Massive Small Files In Hadoop Distributed File Systems
9	Scalable Design Of The Metadata-based Database
10	Design And Implementation Of A Distributed Storage Of Small Files Performance Optimization Strategies