Research On A Chinese Text Clustering Method

Posted on: 2004-05-22  Degree: Master  Type: Thesis
Country: China  Candidate: L P Liu
GTID: 2168360122470204  Subject: Computer application technology
Abstract/Summary:
In recent years, tremendous volumes of text documents have become available through the Internet, digital libraries, news sources, and company-wide intranets. This has led to increased interest in methods that help users effectively navigate, summarize, and organize this information. Fast, high-quality document clustering algorithms play an important role in this effort: they have been shown both to provide a navigation and browsing mechanism, by organizing large amounts of information into a small number of meaningful clusters, and to greatly improve retrieval performance through cluster-driven dimensionality reduction or term weighting. Text clustering is now one of the most important topics in data mining. Research on Chinese text clustering, however, is still at an early stage, and many open problems remain, which this paper studies.

First, we present the background and current state of text clustering and discuss the motivation, content, and goals of our research.

Second, we design a Chinese text clustering model, CTCM, and study its main aspects: feature representation, feature extraction, feature-vector adjustment, and the clustering algorithm.

Third, we focus on text clustering algorithms. Based on a careful analysis of existing clustering algorithms, we propose two text clustering algorithms, EK (Exact K-means) and DBTC (Density-Based Text Clustering), and discuss the results of clustering experiments.

Finally, we introduce an application of Chinese text clustering: the design of an Email Classifying and Filtering System (ECFS).

The main results of this paper are a Chinese text clustering model; a Chinese text clustering algorithm, EK, that selects better initial points; and a density-based clustering algorithm, DBTC, that can identify clusters of arbitrary shape.
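The abstract lists feature representation and feature extraction as parts of CTCM but does not say which weighting scheme the thesis uses. As an illustration only, the sketch below assumes the common vector space model with TF-IDF weights, a document-frequency threshold (`min_df`) as a very simple form of feature extraction, and input that has already been segmented into Chinese words. The helper names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs, min_df=1):
    """Build L2-normalised TF-IDF vectors for pre-segmented documents.

    docs: list of token lists (Chinese word segmentation is assumed done).
    min_df: keep only terms occurring in at least this many documents
            (a simple, assumed feature-extraction step).
    Returns (vocabulary, vectors), each vector aligned with the vocabulary.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    vocab = sorted(t for t in df if df[t] >= min_df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Raw term frequency times a smoothed inverse document frequency.
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in vocab]
        norm = math.sqrt(sum(w * w for w in vec))
        # Normalise so documents of different lengths are comparable.
        vectors.append([w / norm for w in vec] if norm > 0 else vec)
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

For example, with documents `["数据", "挖掘", "聚类"]`, `["文本", "聚类", "算法"]`, and `["邮件", "过滤", "系统"]`, the first two share the term 聚类 and so have a positive cosine similarity, while the third shares no terms with the first and scores 0.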
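The abstract says EK selects better initial points than plain k-means but does not describe how. The sketch below is not the EK algorithm. It substitutes a well-known heuristic, max-min (farthest-first) seeding, to show why the choice of initial centres matters: seeds that are spread apart make it less likely that two centres start inside the same natural group. Function names are hypothetical. Points are assumed to be dense numeric vectors. For L2-normalised TF-IDF vectors, the squared Euclidean distance used here equals 2 - 2cos(u, v), so it ranks neighbours exactly as cosine similarity does.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def maxmin_init(points, k):
    """Choose k initial centres with the max-min (farthest-first) heuristic.

    The first centre is the point farthest from the data mean. Each further
    centre is the point whose distance to its nearest chosen centre is largest.
    """
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    chosen = [max(range(len(points)), key=lambda i: sq_dist(points[i], mean))]
    while len(chosen) < k:
        nxt = max(range(len(points)),
                  key=lambda i: min(sq_dist(points[i], points[c]) for c in chosen))
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means from max-min seeds; returns (labels, centres)."""
    centres = maxmin_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

On two well-separated groups such as `[[0, 0], [0, 1], [1, 0]]` and `[[10, 10], [10, 11], [11, 10]]`, the seeds land in different groups, and `kmeans(points, 2)` separates them on the first assignment.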
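DBTC is described only as density-based and able to find clusters of arbitrary shape. The sketch below uses the classic DBSCAN procedure to show how such methods work in general, with cosine distance for text vectors. It is an assumed stand-in, not the thesis's DBTC. The names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a core point) follow DBSCAN convention. The helper names are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def region_query(points, i, eps, dist):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

def density_cluster(points, eps, min_pts, dist):
    """DBSCAN-style clustering; returns one label per point, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps, dist)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps, dist)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep growing
        cluster_id += 1
    return labels
```

Unlike k-means, this procedure needs no k. It labels isolated points as noise and grows a cluster along any chain of dense neighbourhoods, which is why an L-shaped run of points such as `[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]]` forms a single cluster with `eps=1.0`. The trade-off is that `eps` and `min_pts` must be tuned, and this naive `region_query` makes the whole pass O(n²).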
Keywords/Search Tags: Data Mining, Chinese text clustering, feature extraction, email classification, email filter