
Massive Unstructured Knowledge Management System Based On Text Clustering Technology Under Distributed Computing Environment

Posted on: 2013-03-21; Degree: Master; Type: Thesis
Country: China; Candidate: R W Feng; Full Text: PDF
GTID: 2298330422979905; Subject: Computer Science and Technology
Abstract/Summary:
As enterprises build out their information systems, the volume of unstructured knowledge keeps growing, and the retrieval capacity of current knowledge management systems declines sharply. Managing and retrieving massive unstructured knowledge efficiently has therefore become an urgent task. This thesis studies index construction and distributed retrieval based on text clustering, and implements text clustering of massive knowledge in a distributed computing environment. On this basis, the index is constructed in parallel. An index selection algorithm based on the query space is then combined with a traditional index selection algorithm to provide highly efficient knowledge retrieval.

The main work of this thesis includes:
(1) After analyzing the current state and problems of research on unstructured knowledge management, a framework for an unstructured knowledge management system based on text clustering is constructed, and the system's work procedure is designed.
(2) The key techniques of the system are the construction of a distributed index based on text clustering and knowledge retrieval based on that distributed index. During index construction, text clustering is used to partition the index. A parallel clustering algorithm for the distributed computing environment is designed, and on that basis the distributed index is constructed in parallel with Lucene. For retrieval, an index selection algorithm based on the query space is discussed and combined with a traditional index selection algorithm to form a mixed index selection algorithm. Finally, an advanced Lucene global retrieval algorithm is designed.
(3) The key techniques are applied to the fluid piping knowledge management project of a helicopter institute. The design and development of the system are completed, and highly efficient management of unstructured knowledge is realized.

The experimental and application results show that the proposed approach to massive unstructured knowledge management, based on text clustering in a distributed environment, can handle large-scale unstructured knowledge efficiently and, in particular, dramatically improves knowledge retrieval capacity.
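
The general idea of a cluster-partitioned index with index selection can be illustrated with a small single-machine sketch. This is not the thesis's algorithm: the abstract does not give the details of its parallel clustering, its query-space or mixed index selection, or its Lucene integration. The sketch assumes plain k-means over term-frequency vectors, one in-memory inverted index per cluster, and selection of the clusters whose centroids are most similar to the query. All function names and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def vectorize(text):
    """Term-frequency vector as a Counter of lowercase whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean of a non-empty list of term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def kmeans(vectors, k, iterations=10):
    """Plain k-means with cosine similarity (an assumption, not the thesis's
    parallel algorithm). Seeds are the first k vectors, so runs are deterministic."""
    centroids = [Counter(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids

def build_partitioned_index(docs, assignment):
    """One inverted index (term -> set of doc ids) per cluster."""
    indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, (text, cluster) in enumerate(zip(docs, assignment)):
        for term in vectorize(text):
            indexes[cluster][term].add(doc_id)
    return indexes

def search(query, indexes, centroids, top_clusters=1):
    """Select the most query-like clusters, then search only their indexes."""
    q = vectorize(query)
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine(q, centroids[c]), reverse=True)
    hits = set()
    for c in ranked[:top_clusters]:
        for term in q:
            hits |= indexes[c].get(term, set())
    return sorted(hits)

# Hypothetical sample documents, for illustration only.
docs = [
    "pipe valve pressure fluid",
    "rotor blade helicopter lift",
    "fluid pipe leak pressure",
    "helicopter rotor vibration blade",
]
assignment, centroids = kmeans([vectorize(d) for d in docs], k=2)
indexes = build_partitioned_index(docs, assignment)
print(search("fluid leak", indexes, centroids))
```

With these sample documents, the query is routed to the cluster holding the fluid-related texts, so only that cluster's index is consulted. In a distributed setting, each per-cluster index would presumably live on its own node, and the selection step would decide which nodes receive the query.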
Keywords/Search Tags:Massive unstructured knowledge, Distributed computing environment, Distributed index, Text clustering, Distributed retrieval, Lucene