Design And Implementation Of Enterprise Registration Text Clustering Software System

Posted on: 2008-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2178360242967305
Subject: Software engineering
Abstract/Summary:
With the flood of information on the web, web mining has become a research topic that draws great interest from many communities. There is not yet an agreed definition of web mining, and more discussion among researchers is needed to define it precisely. Text clustering, as a major method of web mining, has also become a focus of attention. The goal of text clustering is to divide a document set into several groups so that the content similarity of documents within the same group is as large as possible and the content similarity of documents in different groups is as small as possible.

In this thesis, we briefly review the development and current status of automatic text clustering, and discuss the requirements of document clustering techniques and the related fields. To address the key technologies needed for an automatic text clustering system over large-scale document sets, we introduce the vector space model (VSM) for representing documents, use this model to implement automatic text clustering, and give a detailed analysis and exposition of how the vector space model is built. We then implement the automatic document clustering system and test it; the results are discussed and analyzed at the end.

In this thesis, we adopt the k-means algorithm as the text clustering algorithm. K-means is a flat partitioning clustering algorithm suitable for large document sets. Tests on text sets show that the clustering effect of the algorithm is satisfactory.
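The abstract does not give implementation details, so the following is only a minimal sketch of the general approach it names: documents represented in a vector space model and grouped with k-means. Everything specific here is an assumption made for illustration, including the function names (`tfidf_vectors`, `sq_dist`, `kmeans`), the smoothed TF-IDF weighting, the squared Euclidean distance, the explicit choice of initial centroids, and the sample documents. It also assumes the documents are already segmented into tokens.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into unit-length sparse TF-IDF vectors (dicts)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed weighting, not taken from the thesis).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeans(vectors, init_indices, max_iter=20):
    """Plain k-means (Lloyd's iteration); initial centroids are the chosen documents."""
    k = len(init_indices)
    centroids = [dict(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old centroid for an empty cluster
            total = Counter()
            for v in members:
                for t, w in v.items():
                    total[t] += w
            centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


# Hypothetical, already-segmented registration texts.
docs = [
    "trading company import export".split(),
    "import export trading firm".split(),
    "software technology development".split(),
    "software development services".split(),
]
labels = kmeans(tfidf_vectors(docs), init_indices=(0, 2))
print(labels)
```

Passing the initial centroids explicitly keeps the sketch deterministic; a practical system would more likely use random or k-means++ style initialization and a real word segmenter in front of the vectorizer.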
Keywords/Search Tags: Automatic Segmentation, Text Clustering, VSM, Text Feature Extractor