Chinese Text Clustering Based On Text Similarity

Posted on: 2010-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Li
Full Text: PDF
GTID: 2178360275450332
Subject: Computer application technology
Abstract/Summary:
Text clustering is one of the central research topics in text mining and information retrieval. Chinese word segmentation, feature selection, and similarity calculation are key techniques in Chinese text clustering. This thesis summarizes related techniques and methods of text clustering and studies the identification of Chinese out-of-vocabulary (OOV) words and the resolution of segmentation ambiguity. Several commonly used feature selection and feature extraction methods are compared in order to choose the best one, and TF-IDF is also improved. Clustering methods and their evaluation methods are summarized and analyzed, and text clustering accuracy is improved by using a text similarity matrix, especially when the text set is small. Finally, the key techniques of multi-document automatic summarization are studied and analyzed in depth, as well as their application in text clustering. The study of Chinese text clustering in this thesis is useful for specific applications such as text mining and information retrieval.
Keywords/Search Tags: text clustering, OOV word identification, feature selection, text similarity, multi-document automatic summarization
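
The abstract mentions TF-IDF weighting and a text similarity matrix without giving their form. As a rough illustration only, the sketch below builds a pairwise cosine similarity matrix from textbook TF-IDF vectors. It is not the improved TF-IDF or the clustering procedure of the thesis, which the abstract does not describe. The function names (`tfidf_vectors`, `cosine`, `similarity_matrix`) are hypothetical, and the input is assumed to be documents that have already been segmented into word lists.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    docs is a list of token lists; word segmentation is assumed to have
    been done beforehand. This is the textbook tf * log(N / df) weighting,
    not the improved variant mentioned in the abstract.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Pairwise cosine similarity matrix over TF-IDF vectors of docs."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]
```

A matrix like this could then be handed to a clustering step (for example, hierarchical clustering on 1 minus the similarity); which algorithm the thesis actually uses is not stated in the abstract.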