
Unstructured Document Management And Analysis Based On Distributed Storage

Posted on: 2018-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Peng
GTID: 2428330569975079
Subject: Information and Communication Engineering

Abstract/Summary:
As the informatization of scientific research management at Chinese universities continues, unstructured research documents of all kinds are growing and accumulating rapidly. Given the scale and variety of these documents, providing a centralized, secure, and reliable storage environment, together with an intelligent management platform that makes document management more efficient, has become an urgent problem. Moreover, research documents crystallize knowledge that researchers have accumulated over long periods, so effectively realizing the potential value hidden in them has become a central requirement of current research management. This thesis therefore designs a safe, reliable, and extensible document management and analysis platform that is built on a distributed storage system and draws on big data theory and techniques from natural language processing and machine learning.

First, the thesis surveys the current state of unstructured data management and analysis in China and abroad, analyzes the shortcomings of existing approaches, and derives the platform's requirements from the characteristics of real scientific research document data. Next, building on the distributed storage system and on Web development frameworks such as SpringMVC and ExtJS, it presents the detailed design and implementation of the platform's functions, including large-scale document storage, document management, and unstructured document analysis. Finally, it tests and analyzes the platform comprehensively to verify the accuracy and reliability of each module.

The resulting scientific research document management and analysis platform is highly maintainable and extensible. It safeguards document storage by building a highly available distributed cluster storage environment, greatly improves the efficiency of research management through a simple and intuitive interface, and gives researchers more intelligent and personalized data services by extracting implicit, previously unknown, and potentially valuable information from large collections of unstructured documents. It supports decision-making and has practical value in building information systems for scientific research management.
Keywords/Search Tags: Unstructured, Distributed storage, Document management, Document analysis
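The abstract does not say which natural language processing or machine learning techniques the platform uses to extract information from unstructured documents. As one common illustration only, the sketch below ranks each document's terms by TF-IDF to surface candidate keywords. TF-IDF, the smoothed IDF formula, and the names `tokenize` and `tfidf_keywords` are assumptions for this example, not the author's method. The tokenizer handles only ASCII words; Chinese research documents would need a real word segmenter.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Illustrative lowercase ASCII word tokenizer (an assumption); Chinese text
    # would need a proper word segmenter instead of this regex.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(docs, top_k=3):
    """Return the top_k terms of each document, ranked by TF-IDF score."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents in which each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores = {
            # Relative term frequency times smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:top_k])
    return keywords


if __name__ == "__main__":
    sample = [
        "distributed storage cluster storage",
        "document analysis keyword extraction",
        "distributed document management",
    ]
    for doc, terms in zip(sample, tfidf_keywords(sample)):
        print(doc, "->", terms)
```

For the first sample document, "storage" ranks highest because it occurs twice and appears in no other document, while "distributed" is down-weighted because it also appears in the third document. A production system would more likely run this kind of scoring as a batch job over the stored document corpus than recompute it per request.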