Web Text Classification Based On Artificial Immune Mechanisms

Posted on:2008-05-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y Du

Full Text:PDF

GTID:2208360212999954

Subject:Information security

Abstract/Summary:

With the rapid development of the Internet, web search has become an important way for people to obtain useful information. However, web information is diverse, heterogeneous in form, and growing exponentially, so how to extract valid knowledge and information from the web quickly has become an emerging research field. This is also the research direction of this thesis. Automatic webpage classification is not only an effective way to obtain useful information from websites, but also the basis and precondition for further content analysis and information filtering.

As an important branch of artificial intelligence, the artificial immune system (AIS) has attracted as much scholarly attention as neural networks and genetic algorithms. AIS theories such as clonal selection and hypermutation are dynamic, self-adaptive, and self-learning, which makes them suitable for the classifier training process in automatic text classification. One innovation of this thesis is therefore the application of AIS theory to an automatic text classification model. Another is the design of a preprocessing module and a feature extraction module specifically for HTML webpages.

The research focuses on the properties and algorithms of AIS and on the main technologies of automatic webpage classification. The main work is summarized as follows.

First, the thesis summarizes and analyzes the specific concepts, basic immune methods, immune algorithms, and immune models for machine learning in AIS. It describes the RLAIS immune model and its key algorithm in detail and analyzes their advantages and disadvantages. It also summarizes and analyzes the key technologies in webpage classification, with emphasis on feature extraction algorithms, comparing the advantages, disadvantages, time complexity, and space complexity of each.

Second, it proposes and implements an approach for finding the main content of an HTML webpage, based on the relaxed tag matching and tag mixing found in HTML rules.

Third, it modifies the TFIDF method for calculating feature word weights by introducing a word position factor into the feature extraction module.

Finally, it designs the CSC (Clonal Selection Classifier) within the CSC system model. The classifier is based on the AIRS model and uses the Vector Space Model to represent documents as vectors. Its performance is analyzed through experiments.
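The abstract does not give the formula for the position-aware TFIDF weight, so the following is only a minimal sketch of one way a word position factor could enter the calculation. The position names, the weight values in `POSITION_WEIGHTS`, the smoothed IDF form, and the function names are all assumptions made for illustration, not the thesis's actual method.

```python
import math
from collections import Counter

# Hypothetical position factors: the abstract does not state the real values.
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def weighted_term_counts(doc):
    """Sum position factors per term; `doc` maps a position name to a token list."""
    counts = Counter()
    for position, tokens in doc.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for token in tokens:
            counts[token] += weight
    return counts


def position_tfidf(docs):
    """Return one L2-normalised {term: weight} vector per document."""
    counts_per_doc = [weighted_term_counts(doc) for doc in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())

    vectors = []
    for counts in counts_per_doc:
        total = sum(counts.values()) or 1.0
        raw = {}
        for term, count in counts.items():
            # Smoothed IDF (an assumption) so terms in every document keep weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            raw[term] = (count / total) * idf
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({term: w / norm for term, w in raw.items()})
    return vectors


pages = [
    {"title": ["immune"], "body": ["web", "page"]},
    {"title": ["web"], "body": ["immune", "page"]},
]
vectors = position_tfidf(pages)
```

In this sketch a word that appears in the title of a page receives a larger weight than the same word appearing only in the body, which is the general effect a position factor is meant to have. The resulting vectors could then serve as Vector Space Model inputs to a classifier.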

Keywords/Search Tags:

Webpage pretreatment, Clonal mutation, Resource competition, Clonal Selection Classifier

Related items

1	The Application Research Of Immune Clonal Algorithm Oriented Colour Dynamic Matching
2	Research Of Clonal Selection Algorithm Focused On Customer Relationship Mining
3	Selecting Fuzzy Rules Using Clonal Algorithms And Its Application Study
4	Research On Clonal Selection Algorithm And Its Applications For Classification In Data Mining
5	Band Selection Algorithm Of Hyperspectral Image Based On Clonal Selection Algorithm
6	Quantum Clonal Evolutionary Algorithms
7	Research Of Detection Algorithm Based On Clonal Selection And Detectors Distribution
8	Network Intrusion Detection Based On Immune Clonal Selection Weighted Naive Bayesian Classifier
9	Study On Detector Optimization In Clonal Selection Algorithm
10	Immune Clonal Strategy Algorithms And Their Application