| Text clustering is an important branch of data mining. It can group text on the web effectively, which both helps users find useful information in the vast amount of online content and improves the quality of network services. This article studies the clustering of Chinese web text, so that similar texts on the web can be grouped together. Chinese text is written as a continuous sequence of characters and words and, unlike English text, does not use blank spaces as word boundaries. Before Chinese text can be clustered, each sentence therefore needs to be segmented into individual words. In addition, the parts of the text that are not keywords need to be removed, retaining the important parts that represent the text content. However, clustering algorithms cannot operate directly on the original Chinese text, because it is written in human natural language: it is unstructured, and its semantics are hard for a computer to process. Structuring the text means translating it into a model that the computer can handle, and an appropriate text representation model is selected according to the characteristics of the text and the processing requirements. In this article we use the vector space model (VSM), because it represents a text as a vector of features and their weights, so that clustering operations become vector operations in a vector space. 
There are currently many ways to convert text into vectors. Here we choose the classic feature weighting method, term frequency-inverse document frequency (TF-IDF), to structure the Chinese text, because TF-IDF reflects how important each feature is across the whole text collection. Although the vector transformation makes the text usable for computer processing, the texts in the collection are composed of a large number of features, so the vectors often have a high dimension, which degrades the clustering result. The text vectors may also lie in different vector spaces, making it difficult to calculate similarity. We therefore need a mapping from the original feature space to a lower-dimensional feature space, and the features should be optimized at this stage. The singular value decomposition (SVD) used in latent semantic analysis (LSA) not only maps the non-orthogonal, high-dimensional feature vector space to a latent semantic space with few dimensions, but also preserves the basic semantic features of the original space, thereby both reducing the dimensionality of the feature space and removing noise. After SVD, the texts can be clustered with a clustering algorithm. Current clustering algorithms can be divided into four families: partitioning methods, hierarchical methods, density-based methods, and grid-based methods. Among these, this paper chooses the density-based Ordering Points To Identify the Clustering Structure (OPTICS) algorithm, because, compared with other clustering methods, it can find text clusters of different shapes, it can filter out outliers, and it gives better clustering results on web text. Finally, the clustering results are processed with the single-parameter exponential smoothing method, which makes them more accurate. 
Experiments show that this method is suitable for clustering analysis of web text. |
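
The TF-IDF weighting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the texts have already been segmented into words and stripped of non-keyword terms, it uses one common TF-IDF variant (relative term frequency multiplied by log(N/df)), and the function names `tf_idf` and `cosine` as well as the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of token lists (assumed already segmented and filtered).
    Weight = (count / document length) * log(N / document frequency).
    This is one common variant; the paper's exact formula is not stated.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-segmented sample documents.
docs = [
    ["文本", "聚类", "算法"],
    ["文本", "聚类", "密度"],
    ["网络", "服务", "质量"],
]
vectors = tf_idf(docs)
```

With these weights, a term that occurs in every document receives weight zero, and documents that share rarer terms have a positive cosine similarity. The later steps named in the abstract (SVD-based dimensionality reduction, OPTICS clustering, and exponential smoothing) would operate on vectors of this kind and are not shown here.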