
Research of an Improved Chinese Text Clustering Algorithm Based on Concept

Posted on: 2007-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: S F Zhuang
GTID: 2178360182973330
Subject: Computer applications
Abstract/Summary:
Traditional text retrieval techniques face major challenges in the emerging field of large-scale text data processing. In this era of information explosion, web search engines routinely return tens of thousands of relevant records. Because most of this information is textual, users need tools that rank these texts by importance and relevance, or that uncover the patterns and trends within them. With the flood of information on the Web, Web text mining has therefore become a new research topic that draws strong interest from many communities, and it is a promising direction within data mining. Text clustering, a key technique of Web text mining, has been applied in information management and search engines. Concept-based text clustering operates over a specified range of concepts: it incorporates concept mapping and a word sense disambiguation strategy into the traditional theoretical core of text clustering. HowNet is an emerging Chinese knowledge base with a relatively complete body of knowledge and a concept description system. This paper discusses the principles of Web text mining and focuses on concept-based Chinese text clustering. Its contents are as follows:
1. It discusses in detail the construction of a Web text mining system and its primary implementation, and describes the functions of the system.
2. It applies the HowNet knowledge system to the study of concept-based Chinese text clustering. Using HowNet's knowledge system to construct a Concept Dictionary and a Concept Hierarchy, we implement a concept-based Chinese text clustering algorithm. Experiments show that the algorithm performs well.
3. Because HowNet has relatively poor extensibility and the clustering analysis process has shortcomings, this paper proposes and implements a dynamic concept-based Chinese text clustering algorithm. It introduces the class labels of training texts and rough set theory, then reconstructs part of the concept set to rebuild the concept-based vector space. Experiments show that this improves the accuracy of text clustering.
4. It applies formal concept analysis to the resulting text clusters to build their concept lattice, which improves the readability of the clusters.
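A minimal Python sketch of the concept-based vector space idea from item 2. Everything here is an assumption for illustration: `CONCEPT_DICT` is a hypothetical, hand-made word-to-concept table standing in for a dictionary built from HowNet (it uses no HowNet data or API), documents are assumed to be already word-segmented, the disambiguation rule is a naive context vote, and single-pass threshold clustering is a swapped-in technique, since the abstract does not say which clustering algorithm the thesis uses.

```python
import math
from collections import Counter

# Hypothetical toy concept dictionary (word -> candidate concepts).
# Invented for illustration; a real system would derive it from HowNet.
CONCEPT_DICT = {
    "苹果": ["fruit", "company"],  # ambiguous: apple (fruit) or Apple (company)
    "香蕉": ["fruit"],
    "橘子": ["fruit"],
    "手机": ["device"],
    "电脑": ["device"],
    "公司": ["company"],
}


def to_concept_vector(tokens):
    """Map a word-segmented document to a concept-frequency vector.

    Naive disambiguation (an assumption): an ambiguous word takes the
    candidate concept most often produced by the document's unambiguous words.
    """
    context = Counter(
        CONCEPT_DICT[t][0]
        for t in tokens
        if t in CONCEPT_DICT and len(CONCEPT_DICT[t]) == 1
    )
    vec = Counter()
    for t in tokens:
        senses = CONCEPT_DICT.get(t)
        if not senses:
            continue  # words outside the dictionary are ignored in this sketch
        vec[max(senses, key=lambda c: context[c])] += 1  # ties -> first listed sense
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse Counter vectors."""
    dot = sum(u[k] * v[k] for k in u)  # missing keys count as 0
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass_cluster(vectors, threshold=0.5):
    """Single-pass clustering: join the most similar cluster whose similarity
    reaches `threshold`, otherwise open a new cluster."""
    clusters = []  # each entry: (summed concept vector, member indices)
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(v, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append((Counter(v), [i]))
        else:
            best[0].update(v)  # summing is enough: cosine ignores vector length
            best[1].append(i)
    return [members for _, members in clusters]


# Hypothetical example documents (already segmented).
DOCS = [
    ["苹果", "香蕉"],          # fruit context
    ["手机", "电脑"],          # devices
    ["苹果", "公司", "手机"],  # company context
    ["橘子"],                  # fruit, no word shared with DOCS[0]
]

if __name__ == "__main__":
    vectors = [to_concept_vector(d) for d in DOCS]
    print(single_pass_cluster(vectors))  # [[0, 3], [1], [2]]
```

Documents 0 and 3 share no words, yet they fall into the same cluster because both map to the `fruit` concept, while 苹果 is mapped to `company` in document 2 because of its context. Document 2 stays on its own at threshold 0.5 (its similarity to the device cluster is about 0.45).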
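Item 3 combines the class labels of training texts with rough set theory to reconstruct part of the concept set. The abstract does not give the procedure, so this sketch only illustrates one standard rough-set tool that fits that description: greedy attribute reduction (QuickReduct) over a decision table whose condition attributes are concepts and whose decision attribute is the training text's class label. The table, the labels, and the idea of keeping only the reduct's concepts are assumptions for illustration, not the thesis's method.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Indiscernibility classes: group row indices that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma(attrs -> class label): the share of rows
    in the positive region, i.e. in indiscernibility classes with a single label."""
    positive = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return positive / len(rows)


def quick_reduct(rows, labels, attrs):
    """Greedy QuickReduct: repeatedly add the concept that raises gamma the most,
    until the subset classifies the training texts as well as the full concept set."""
    target = dependency(rows, labels, attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct


# Hypothetical training texts: 1 = concept present, 0 = absent.
ROWS = [
    {"company": 0, "device": 0, "fruit": 1},
    {"company": 1, "device": 0, "fruit": 1},
    {"company": 0, "device": 1, "fruit": 0},
    {"company": 1, "device": 1, "fruit": 0},
]
LABELS = ["food", "food", "tech", "tech"]  # hypothetical class labels
ATTRS = ["company", "device", "fruit"]

if __name__ == "__main__":
    print(quick_reduct(ROWS, LABELS, ATTRS))  # ['device']
```

Here `device` alone already separates the two classes, so a concept space rebuilt from this reduct would drop `company`. Reducts are not unique (`fruit` would work equally well); the greedy search simply takes the first best attribute in `ATTRS` order.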
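Item 4 builds a concept lattice over the text clusters with formal concept analysis. A minimal sketch, assuming each cluster has already been described by a set of its key concepts (the `CONTEXT` below is a hypothetical example, not data from the thesis). It enumerates all formal concepts by closing the object intents under intersection, which is fine for small contexts but exponential in the worst case, and then derives the lattice's cover (Hasse diagram) edges.

```python
def formal_concepts(context):
    """All formal concepts (extent, intent) of a context given as
    object -> set of attributes. Intents are exactly the intersections of
    object intents (the empty intersection being the full attribute set)."""
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # Comprehension is built before the update, so the set is not mutated while iterating.
        intents |= {frozenset(attrs) & b for b in intents}
    concepts = [
        (frozenset(g for g, a in context.items() if b <= a), b) for b in intents
    ]
    # Sort bottom-up by extent size so index 0 is the bottom concept.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


def lattice_edges(concepts):
    """Cover relation: (i, j) means concept i lies directly below concept j."""
    edges = []
    for i, (ei, _) in enumerate(concepts):
        for j, (ej, _) in enumerate(concepts):
            if ei < ej and not any(ei < ek < ej for ek, _ in concepts):
                edges.append((i, j))
    return edges


# Hypothetical clusters described by their key concepts.
CONTEXT = {
    "C1": {"fruit", "food"},
    "C2": {"fruit", "company"},
    "C3": {"device", "company"},
}

if __name__ == "__main__":
    concepts = formal_concepts(CONTEXT)
    for i, (extent, intent) in enumerate(concepts):
        print(i, sorted(extent), sorted(intent))
    print(lattice_edges(concepts))
```

Each lattice node pairs a group of clusters with the concepts they all share, for example clusters C1 and C2 under the common concept `fruit`; labels of this kind are what make a set of clusters easier to read.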
Keywords/Search Tags: Web Text Mining, Text Clustering, Concept Space, Rough Set, Formal Concept Analysis