
Web Text Classification Based on Artificial Immune Mechanisms

Posted on: 2008-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Du
GTID: 2208360212999954
Subject: Information security
Abstract/Summary:
With the rapid development of the Internet, web search has become an important way for people to obtain useful information. However, because web information is diverse, takes many different forms, and is growing exponentially, how to extract valid knowledge and information from the web quickly has become an emerging research field, and it is the research direction of this thesis. Automatic webpage classification is not only an effective way to obtain useful information from websites but also the basis and precondition for further content analysis and information filtering.

As an important branch of artificial intelligence, the artificial immune system (AIS) has attracted as much research attention as neural networks and genetic algorithms. AIS theories such as clonal selection and hypermutation are dynamic, self-adaptive, and self-learning, which makes them well suited to the classifier training process in automatic text classification. Applying AIS theory to an automatic text classification model is therefore one innovation of this thesis. A second innovation is the design of preprocessing (pretreatment) and feature extraction modules specifically for HTML webpages.

The research focuses on the properties and algorithms of AIS and on the main technologies of automatic webpage classification. The main work is as follows:

Firstly, the thesis summarizes and analyzes the concepts, basic immune methods, immune algorithms, and immune models for machine learning in AIS, describes the RLAIS immune model and its key algorithm in detail, and analyzes its advantages and disadvantages. It also summarizes and analyzes the key technologies in webpage classification, focusing on feature extraction algorithms and summarizing the advantages, disadvantages, time complexity, and space complexity of each.

Secondly, it proposes and implements an approach for locating the main content of an HTML webpage, based on the relaxed tag matching and tag mixing that HTML rules permit.

Thirdly, it modifies the TFIDF method by introducing a word position factor into the feature extraction module when computing feature word weights.

Finally, it designs a CSC (Clonal Selection Classifier) within the CSC system model. The classifier is based on the AIRS model and represents documents as vectors using the Vector Space Model (VSM). Its performance is analyzed through experiments.
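The abstract does not state how the word position factor enters TFIDF. A minimal sketch, assuming each occurrence of term t contributes a position factor λ(pos) to the term frequency instead of 1, so that w(t, d) = (Σ λ(pos)) · log(N / n_t), followed by cosine normalization. The factor values in `POSITION_FACTOR` and the `title`/`heading`/`body` position names are hypothetical, not the thesis's parameters.

```python
import math
from collections import Counter

# Hypothetical position factors: the thesis does not publish its values.
POSITION_FACTOR = {"title": 3.0, "heading": 2.0, "body": 1.0}


def positional_tf(doc):
    """doc maps a position name to a list of tokens; returns position-weighted term counts."""
    tf = Counter()
    for position, tokens in doc.items():
        factor = POSITION_FACTOR.get(position, 1.0)
        for token in tokens:
            tf[token] += factor  # each occurrence counts with its position factor
    return tf


def positional_tfidf(corpus):
    """Return one {term: weight} dict per document, cosine-normalized."""
    n_docs = len(corpus)
    tfs = [positional_tf(doc) for doc in corpus]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # document frequency n_t
    weights = []
    for tf in tfs:
        w = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        # Normalize so long pages do not dominate; guard against all-zero vectors.
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        weights.append({t: v / norm for t, v in w.items()})
    return weights


corpus = [
    {"title": ["immune"], "body": ["clonal", "selection", "immune"]},
    {"title": ["web"], "body": ["classification", "web", "page"]},
]
for weights in positional_tfidf(corpus):
    print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

In this sketch a word that appears in the title ends up weighted above an equally rare body-only word, which is the effect a position factor is meant to have.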
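The main-content step depends on tolerating sloppy HTML. The sketch below is a hypothetical heuristic, not the thesis's algorithm: it uses Python's standard `html.parser`, treats an end tag as closing the nearest open tag with the same name (implicitly closing anything opened after it and ignoring stray end tags), and picks the block-level container with the most non-link text. The tag sets and the scoring rule are assumptions.

```python
from html.parser import HTMLParser

# Assumed tag sets for this illustration.
BLOCK_TAGS = {"body", "div", "td", "article", "section"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}
SKIP_TAGS = {"script", "style"}


class BlockTextParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, block index or None) for every open tag
        self.blocks = []  # per block: collected text runs and link character count

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so never push them
        block = None
        if tag in BLOCK_TAGS:
            self.blocks.append({"text": [], "link_chars": 0})
            block = len(self.blocks) - 1
        self.stack.append((tag, block))

    def handle_endtag(self, tag):
        # Relaxed matching: close the nearest open tag with this name and
        # everything opened after it; an unmatched end tag is ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        text = data.strip()
        tags = [t for t, _ in self.stack]
        if not text or any(t in SKIP_TAGS for t in tags):
            return
        # Attribute the text to the innermost open block container.
        for _, block in reversed(self.stack):
            if block is not None:
                self.blocks[block]["text"].append(text)
                if "a" in tags:
                    self.blocks[block]["link_chars"] += len(text)
                return


def main_content(html):
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ""

    def non_link_chars(block):
        return sum(len(t) for t in block["text"]) - block["link_chars"]

    return " ".join(max(parser.blocks, key=non_link_chars)["text"])


page = """<html><body>
<div id="nav"><a href="/">Home</a> | <a href="/news">News</a>
<div id="main"><p>Artificial immune systems borrow clonal selection
<p>and hypermutation from biology.</div>
<td>Copyright 2008
</body></html>"""
print(main_content(page))
```

The sample page leaves the navigation `div`, both `p` tags, and the `td` unclosed; the relaxed end-tag rule still lets the navigation links and footer be separated from the article text.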
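To make the clonal selection idea behind the CSC classifier concrete, here is a heavily simplified sketch in the spirit of AIRS, not the thesis's CSC design. Assumptions: affinity is cosine similarity between VSM vectors; the best-matching memory cell of the antigen's class is cloned in proportion to its affinity and hypermutated with Gaussian noise that grows as affinity falls; the best clone becomes a memory cell, replacing the old cell when the two are nearly identical. AIRS's resource competition among recognition balls is omitted. The parameter names `clonal_rate`, `mutation_scale`, `replace_threshold`, and `k` are hypothetical.

```python
import math
import random
from collections import Counter


def affinity(a, b):
    """Cosine similarity between two non-negative VSM vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ClonalSelectionClassifier:
    def __init__(self, clonal_rate=10, mutation_scale=0.5,
                 replace_threshold=0.95, k=3, seed=0):
        self.clonal_rate = clonal_rate
        self.mutation_scale = mutation_scale
        self.replace_threshold = replace_threshold
        self.k = k
        self.rng = random.Random(seed)
        self.memory = []  # memory cells as (vector, label)

    def _hypermutate(self, vec, aff):
        # Lower affinity -> larger mutation; clip because term weights are non-negative.
        sigma = self.mutation_scale * (1.0 - aff)
        return [max(0.0, x + self.rng.gauss(0.0, sigma)) for x in vec]

    def fit(self, vectors, labels):
        for antigen, label in zip(vectors, labels):
            same = [i for i, (_, lab) in enumerate(self.memory) if lab == label]
            if not same:
                self.memory.append((list(antigen), label))  # seed the class
                continue
            best = max(same, key=lambda i: affinity(self.memory[i][0], antigen))
            match_vec = self.memory[best][0]
            match_aff = affinity(match_vec, antigen)
            # Clone in proportion to stimulation, then hypermutate each clone.
            n_clones = max(1, int(self.clonal_rate * match_aff))
            clones = [self._hypermutate(match_vec, match_aff) for _ in range(n_clones)]
            candidate = max(clones, key=lambda c: affinity(c, antigen))
            if affinity(candidate, antigen) > match_aff:
                if affinity(candidate, match_vec) > self.replace_threshold:
                    self.memory[best] = (candidate, label)  # old cell is redundant
                else:
                    self.memory.append((candidate, label))
        return self

    def predict(self, vector):
        # k-nearest memory cells vote; ties go to the closest cell because
        # Counter.most_common keeps first-seen order for equal counts.
        ranked = sorted(self.memory, key=lambda c: affinity(c[0], vector), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


X = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.1, 1.0, 0.0], [0.0, 0.9, 0.2]]
y = ["sports", "sports", "tech", "tech"]
clf = ClonalSelectionClassifier(k=1, seed=42).fit(X, y)
print(clf.predict([0.95, 0.05, 0.0]), clf.predict([0.05, 1.0, 0.1]))
```

Cosine similarity is used here because it is the usual affinity for length-varying VSM text vectors; an AIRS implementation may instead use a distance measure.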
Keywords/Search Tags: Webpage pretreatment, Clonal mutation, Resource competition, Clonal Selection Classifier