Research and Implementation of Non-Structured Data Management in Discrete Manufacturing Industry Based on Hadoop

Posted on: 2016-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhao
GTID: 2308330479499198
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, discrete manufacturing enterprises have attached great importance to informatization, and after several years of development these efforts have begun to pay off. Over years of business operation, enterprises have accumulated a large amount of internal information, including office documents, business orders, data reports, images, audio, and video. Enterprises commonly extract the useful information and store it in a database, while the original files are saved to disk. To a certain extent, this approach has solved the problem of enterprise data management. However, with the rapid growth of data volumes, how to manage and use unstructured data reasonably and effectively has become a major issue, and it affects whether an enterprise can strengthen its international competitiveness.

This thesis designs and implements an unstructured data management system based on the popular Hadoop distributed framework. First, because the unstructured data files in discrete manufacturing enterprises are mostly small, and these small files previously had to be merged manually at regular intervals, the thesis uses the counter of HBase, a distributed database, to propose a strategy for archiving small files automatically. Second, to overcome the problem that original documents saved on disk cannot be found by their content, the system uses Lucene, a full-text search engine toolkit, and implements a full-text content retrieval strategy on Hadoop. Finally, this unstructured data management strategy is applied in a discrete manufacturing enterprise to address how a large number of uploaded attachments can be stored securely, backed up reliably, and searched efficiently.

The system designed and implemented in this thesis takes as its case study the large volume of office documents that a discrete manufacturing enterprise has stored over many years. On the one hand, by merging small files automatically, the system solves the problem that storing many small files degrades system performance; on the other hand, the Lucene full-text search engine lets users find files by searching document content, which improves office efficiency. Finally, through integration with the enterprise's existing systems, it resolves the problems those systems had in managing attachments.
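The abstract names only the ingredients of the small-file strategy (an HBase counter plus automatic merging), so the Python sketch below shows one plausible reading of it under stated assumptions: every upload adds the file's size to a counter, and once the running total crosses a threshold the buffered files are concatenated into a single archive with an offset index so each file stays addressable. `Counter`, `SmallFileArchiver`, the 64 MB default threshold, and the concatenation format are all hypothetical. In a real deployment the counter role would plausibly be played by an HBase atomic increment (for example `Table.incrementColumnValue` in the Java client), and the merged file would live in HDFS in a container such as a SequenceFile or a Hadoop Archive (HAR); the abstract does not say which.

```python
class Counter:
    """In-memory stand-in for an HBase counter cell (hypothetical).

    HBase increments are atomic on the server; this single-threaded
    sketch does not model concurrency.
    """

    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount
        return self.value

    def reset(self):
        self.value = 0


class SmallFileArchiver:
    """Buffers small files and merges them once a size threshold is reached.

    The threshold and the byte-concatenation "archive" format are
    illustrative assumptions, not the thesis's actual design.
    Assumes file names are unique.
    """

    def __init__(self, threshold_bytes=64 * 1024 * 1024):
        self.threshold_bytes = threshold_bytes
        self.counter = Counter()
        self.pending = {}    # file name -> bytes not yet archived
        self.archives = []   # list of (merged blob, {name: (offset, length)})

    def upload(self, name, data):
        self.pending[name] = data
        # Track the accumulated size; archive automatically at the threshold.
        if self.counter.increment(len(data)) >= self.threshold_bytes:
            self._archive()

    def _archive(self):
        # Concatenate pending files and record where each one starts.
        blob = bytearray()
        index = {}
        for name, data in self.pending.items():
            index[name] = (len(blob), len(data))
            blob.extend(data)
        self.archives.append((bytes(blob), index))
        self.pending.clear()
        self.counter.reset()

    def read(self, name):
        # Serve from the pending buffer first, then from merged archives.
        if name in self.pending:
            return self.pending[name]
        for blob, index in self.archives:
            if name in index:
                offset, length = index[name]
                return blob[offset:offset + length]
        raise KeyError(name)
```

For example, with `threshold_bytes=10`, uploading 5 bytes and then 6 bytes pushes the counter to 11 and triggers one merge; both files remain readable through the offset index, and the counter starts again from zero for later uploads.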
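Lucene's full-text retrieval rests on an inverted index that maps each term to the documents containing it. The toy Python version below illustrates that idea for the content-retrieval strategy, assuming text has already been extracted from each file and that each document id maps to a storage location (for instance a path or an archive offset). `ContentIndex`, `tokenize`, and the sample locations are hypothetical. Unlike Lucene, it has no relevance scoring, stemming, or language-specific analysis; Chinese documents in particular would need a segmenting analyzer rather than the ASCII-only tokenizer used here. In Lucene's Java API the corresponding steps are adding `Document`s through an `IndexWriter` and querying with an `IndexSearcher`.

```python
import re
from collections import defaultdict

# Toy analyzer: ASCII letters and digits only (assumption, not Lucene's).
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return TOKEN_RE.findall(text.lower())


class ContentIndex:
    """Minimal inverted index from terms to document storage locations."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.locations = {}               # document id -> storage location

    def add_document(self, doc_id, text, location):
        self.locations[doc_id] = location
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted locations of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty postings for unknown terms.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(self.locations[d] for d in hits)


if __name__ == "__main__":
    idx = ContentIndex()
    idx.add_document(1, "Quarterly production order report", "docs/order_report.docx")
    idx.add_document(2, "Production line maintenance schedule", "docs/maintenance.xlsx")
    print(idx.search("production"))
    print(idx.search("order report"))
```

A query returns only the locations of documents that contain every query term (AND semantics), which is how a user could locate an original file by its content rather than by its name.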
Keywords/Search Tags: unstructured data, Hadoop, HBase, Lucene, small files