
A Document Clustering Algorithm Based On Immune Networks And Its Application

Posted on: 2010-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2178360275468187
Subject: Computer application technology
Abstract/Summary:
With the development and maturity of information technology, especially Internet-related technology, a large amount of text now exists in computer-readable form, and its volume continues to grow. The Internet has become a huge source of information. In this age of information abundance, learning a user's interests by analysing the web text the user views, so as to proactively provide personalized services, has become increasingly meaningful. Document clustering is one of the core technologies in web text mining. Building on a survey of document clustering research in China and abroad, this thesis presents a document clustering algorithm based on immune networks and studies the following aspects:
● Clustering algorithm. We study a clustering algorithm based on the immune network. Experiments show that the algorithm converges quickly and does not depend on the choice of initial prototypes. However, it has many parameters, which are traditionally set by experience and experiment. This creates a heavy workload, makes the optimal parameter combination hard to find, and degrades the algorithm's performance. We therefore propose a parameter optimization method based on genetic algorithms and chaos theory. Simulation results show that this method effectively optimizes the parameters of the immune-network clustering algorithm.
● User interest mining. We present a new user interest mining model based on the KHN model. It can not only find a user's interests but also detect the user's real-time interests.
● Document feature representation. We introduce a keyword extraction method based on Small World Networks and a semantic similarity calculation based on "HowNet". Together they reduce the dimension of the text vector space, give documents category characteristics, and increase the similarity between documents in the same category.
● Document clustering algorithm. Drawing on the theory of artificial immune systems, we present a document clustering algorithm based on the immune network, which includes antibody-antibody and antibody-antigen affinity calculations. In this algorithm, the antigens (web texts) continually stimulate the immune network. Through a series of operations such as clonal selection and mutation, the algorithm eventually yields antibodies that carry group characteristics, with the antigens attached to them, which achieves the clustering.
To verify the effectiveness of the algorithm, it is applied to mining web users' interests, and a prototype system is developed. The experiments show that the document representation method achieves a higher data compression rate and makes the texts more distinctive. The proposed document clustering algorithm has better clustering quality and adaptive ability and is independent of the initial distribution of documents. When used to mine users' interests, it can ultimately discover them.
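The immune-network clustering loop summarized in the abstract (antigens stimulating the network, clonal selection, mutation, and attachment of antigens to antibodies) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract gives no formulas, so everything here is an assumption made for illustration. That includes cosine similarity as the affinity measure (standing in for the HowNet-based semantic similarity), the suppression threshold, the mutation rule, and the hypothetical names `cosine` and `immune_network_cluster` with their parameters.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity, used here as the affinity measure (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def immune_network_cluster(antigens, generations=10, n_clones=5,
                           suppress_at=0.95, seed=0):
    """Toy immune-network clustering; all parameters are illustrative."""
    rng = random.Random(seed)
    # Seed the network with a few randomly chosen antigens (document vectors).
    antibodies = [list(a) for a in rng.sample(antigens, min(3, len(antigens)))]
    for _ in range(generations):
        for ag in antigens:
            # Clonal selection: pick the antibody with the highest
            # antibody-antigen affinity.
            best = max(antibodies, key=lambda ab: cosine(ab, ag))
            aff = cosine(best, ag)
            # Cloning and mutation: each clone moves towards the antigen,
            # by a larger step when the current affinity is low.
            clones = []
            for _ in range(n_clones):
                rate = rng.uniform(0.5, 1.0) * (1.0 - aff)
                clones.append([b + rate * (a - b) for a, b in zip(ag, best)])
            champion = max(clones, key=lambda c: cosine(c, ag))
            # Keep the best clone only if it improves on its parent.
            if cosine(champion, ag) > aff:
                antibodies.append(champion)
        # Network suppression: drop antibodies whose antibody-antibody
        # affinity to an already kept antibody reaches the threshold.
        kept = []
        for ab in antibodies:
            if all(cosine(ab, k) < suppress_at for k in kept):
                kept.append(ab)
        antibodies = kept
    # Attach every antigen to its highest-affinity antibody (cluster label).
    labels = [max(range(len(antibodies)),
                  key=lambda i: cosine(antibodies[i], ag))
              for ag in antigens]
    return antibodies, labels
```

Calling `immune_network_cluster(vectors)` on a list of equal-length document vectors returns the surviving antibodies and one cluster label per document. In this sketch the number of clusters is not fixed in advance; it emerges from the suppression threshold. Older antibodies take priority during suppression, which is a simplification.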
Keywords/Search Tags: Document Clustering, Immune Network, Small World Network, Semantic Calculation, Interests Mining