The Research On Methods Of Text Mining Based On AiNet

Posted on: 2009-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: J N Xu
GTID: 2178360242992793
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and its worldwide adoption, the volume of information resources stored on the network is large and still increasing. Text data carried by Web pages in particular has become a primary information source, and its volume is growing explosively. How to discover knowledge quickly and effectively in such a huge body of text is a problem that urgently needs solving. In recent years text mining has become a key research topic, and text clustering within it has attracted wide attention. This paper first introduces the research background and current state of text mining and discusses the techniques relevant to text clustering algorithms. It describes and analyses the key technologies closely related to text clustering, including the text representation model, feature extraction, dimensionality reduction of feature vectors, and the calculation of text similarity, and then develops its research on text clustering algorithms on that basis. Achieving dynamic adaptability in text clustering algorithms is one of the important research directions.
Based on a study of the artificial immune network (aiNet) model and the characteristics of text clustering, this paper improves the aiNet algorithm and implements an aiNet-based text clustering algorithm, which provides a new approach to dynamic text clustering. To overcome the performance drop of the aiNet-based text clustering algorithm on high-dimensional data, this paper studies immune genetic mechanisms and the k-means clustering algorithm, and introduces the immune genetic mechanism into k-means to optimize the cluster centers. The result is a text clustering algorithm based on the Immune Genetic Algorithm and K-means, called IGAK for short. This algorithm effectively avoids the weakness of classic k-means, which is unduly influenced by poorly chosen initial cluster centers and converges prematurely to a local optimum. On the basis of IGAK, this paper designs a text representation model based on cluster centers with a virtual coordinate mapping mechanism, as a technique for reducing the dimensionality of text vectors. Within the virtual coordinate model, concepts such as antibody, antigen, affinity, and similarity are set out, and a two-stage text clustering algorithm based on the Immune Genetic Algorithm, K-means, and aiNet, called IGAK-aiNet for short, is proposed. Finally, a simple text clustering model based on the new algorithm is designed and implemented, including the main modules, the necessary data structures, and part of the code. Comparative experiments were carried out on relevant text data, and the results show that the new algorithm has strong dynamic adaptability and improves the quality of the clustering results.
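The abstract does not spell out how the virtual coordinate mapping works. One plausible reading, offered here only as a minimal sketch, is that each high-dimensional text vector is re-expressed by its similarity to each of the k cluster centers, so a document with thousands of term weights becomes a k-dimensional point. The function names `cosine` and `to_virtual_coordinates`, the choice of cosine similarity, and the toy vectors are all assumptions made for illustration, not details taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def to_virtual_coordinates(doc_vectors, centers):
    """Map each document vector to one coordinate per cluster center.

    Assumption: the coordinate is the cosine similarity to that center.
    The output dimension equals len(centers), whatever the vocabulary size.
    """
    return [[cosine(doc, center) for center in centers] for doc in doc_vectors]


# Toy data: three documents over a six-term vocabulary (hypothetical weights).
docs = [
    [2.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0, 1.0],
]
# Two cluster centers, e.g. as a first-stage k-means style step might produce.
centers = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
]

coords = to_virtual_coordinates(docs, centers)
for row in coords:
    print([round(x, 3) for x in row])
```

Under this reading, the second-stage algorithm would operate on the short rows of `coords` rather than on the original term vectors, which is where the saving on high-dimensional data would come from.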
Keywords/Search Tags: Text Clustering, Vector Space Model, K-means, Artificial Immune Network (aiNet)