Research And Application Of Subject-oriented Document Resource Clustering

Posted on: 2012-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Cui
Full Text: PDF
GTID: 2178330335968250
Subject: Education Technology
Abstract/Summary:
With the explosive growth of textual information, text clustering has become an important technique in text information processing and is widely used in knowledge discovery, information retrieval, bioinformatics and other fields. Text clustering uses unsupervised machine learning to identify text categories automatically. This helps users select the kinds of knowledge they need and groups similar and related knowledge for the subsequent step of knowledge integration.

Taking Educational Technology as an example, we construct an ontology library as the data source of a literature clustering system, and optimize the Lingo clustering algorithm to obtain better clustering results. The main work includes:
(1) The thesis analyzes the theory of text clustering, describes the current state of research on text clustering technology, and introduces the main clustering algorithms and classical clustering systems.
(2) The thesis introduces a method for constructing a subject domain ontology library. The ontology library consists of a concept table and a relationship table built from eight core textbooks and recent academic journals in educational technology. The concepts are drawn from the domain terms, and the relationships between concepts (synonymy, superordinate-subordinate, and part-whole) are recorded.
(3) A resource-oriented clustering system is designed and implemented. It consists of three modules: text pre-processing, the text clustering algorithm, and visualization of the clustering results. The system is then compared experimentally with traditional clustering algorithms.
(4) The thesis introduces the application of the clustering system to information retrieval and knowledge fusion.

The distinctive contributions of this thesis are:
(1) It describes a method for constructing an educational technology ontology library.
(2) It optimizes the Lingo clustering algorithm. The optimized algorithm merges synonyms according to the concepts in the ontology library, which reduces the dimensionality of the term-document matrix. It also uses subject keywords to penalize the extracted cluster labels so that the labels are more standardized.
(3) For highly similar documents within the same category, identical or similar knowledge elements are automatically discovered and merged based on topic maps, which achieves knowledge fusion between documents.
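The synonym-merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the synonym table, the sample documents, and the function names `canonical` and `term_document_matrix` are all hypothetical, and raw term counts stand in for whatever term weighting the system actually uses. The sketch only shows why mapping synonyms to one concept shrinks the term-document matrix before Lingo applies singular value decomposition to it.

```python
from collections import Counter

# Hypothetical synonym table mapping a surface term to its canonical concept.
# In the thesis this information comes from the ontology library; these
# entries are invented purely for illustration.
SYNONYMS = {
    "e-learning": "online learning",
    "web-based learning": "online learning",
}

# Hypothetical documents, already tokenized into domain terms.
DOCS = [
    ["e-learning", "ontology", "clustering"],
    ["online learning", "clustering"],
    ["web-based learning", "ontology"],
]


def canonical(term, synonyms):
    """Return the canonical concept for a term, or the term itself."""
    return synonyms.get(term, term)


def term_document_matrix(docs, synonyms=None):
    """Build a count-based term-document matrix.

    Returns (vocab, matrix), where vocab is a sorted list of terms and
    matrix[i][j] is the count of vocab[i] in docs[j]. If a synonym table
    is given, synonymous terms are collapsed into one row.
    """
    synonyms = synonyms or {}
    counts = [Counter(canonical(t, synonyms) for t in doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


if __name__ == "__main__":
    raw_vocab, _ = term_document_matrix(DOCS)
    merged_vocab, merged = term_document_matrix(DOCS, SYNONYMS)
    print("rows without merging:", len(raw_vocab))
    print("rows with merging:   ", len(merged_vocab))
    for term, row in zip(merged_vocab, merged):
        print(f"{term:16s} {row}")
```

In this toy example the three surface forms collapse into a single row, so documents that used different wording for the same concept now share a nonzero entry, which is what makes them comparable in the later decomposition step.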
Keywords/Search Tags:text clustering, singular value decomposition, field ontology, Education Technology, knowledge consolidation