Research And Implementation Of Key Technologies On Web Text Classification

Posted on:2013-12-06

Degree:Master

Type:Thesis

Country:China

Candidate:H Liu

GTID:2248330395455641

Subject:Computer application technology

Abstract/Summary:

Today the world is awash in information, and Web text, which exists in electronic form, has gradually become people's most important source of information. However, Web text is unorganized and dynamic, and a web page is far more complex than a plain text document. How to obtain required and useful information from the Internet efficiently and rapidly has therefore become a major research topic. In response to this need, a technique called Web text mining has emerged. It covers four areas: web text classification, web text clustering, information extraction, and information retrieval. This thesis focuses on web text classification.

The support vector machine (SVM) has been widely used in web text classification. The SVM is a machine learning method based on statistical learning theory and the structural risk minimization principle. Compared with conventional machine learning methods, it generalizes well and yields a globally optimal solution. It also avoids problems such as overfitting, the curse of dimensionality, and local extrema. Because of these advantages, it has become a research hotspot. However, as a relatively new theory, the SVM still needs further research and improvement. In particular, classifying massive data sets and reclassifying after the data set is updated have become the key difficulties of the research.

This thesis first surveys web text mining and analyzes its key techniques. Second, it discusses the basic concepts and related theory of statistical learning theory and the SVM. Because the SVM has several shortcomings when classifying massive data sets, such as high memory usage, slow convergence, and disregard of previous learning results, an improved algorithm is proposed for the multi-class problem. This algorithm combines the SVM with incremental learning. After the data set is updated, it retains the result of previous learning and processes only the new data, forming a continuous learning process. Finally, the improved algorithm is applied in a Web text mining system, where it achieves better classification results.
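
The incremental idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose details the abstract does not give. The sketch assumes a one-vs-rest scheme for the multi-class problem, uses a plain hinge-loss linear classifier trained by subgradient steps as a lightweight stand-in for a real SVM solver, and treats the samples that triggered an update as the "support vectors" to retain. All names (fit_hinge, IncrementalClassifier) and the toy term-count features are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (feature vector, class label)


def fit_hinge(xs: Sequence[Sequence[float]], ys: Sequence[int],
              step: float = 1.0, max_epochs: int = 1000):
    """Binary linear classifier trained by subgradient steps on the hinge loss.

    Stand-in for an SVM solver (no regularization term). Returns the weights
    (bias last) and the indices of samples that triggered at least one update;
    those samples play the role of support vectors in this sketch.
    """
    w = [0.0] * (len(xs[0]) + 1)
    active = set()
    for _ in range(max_epochs):
        updated = False
        for i, (x, y) in enumerate(zip(xs, ys)):
            xa = list(x) + [1.0]  # append a constant bias feature
            if y * sum(a * b for a, b in zip(w, xa)) < 1:  # inside the margin
                w = [a + step * y * b for a, b in zip(w, xa)]
                active.add(i)
                updated = True
        if not updated:  # every sample now has margin >= 1
            break
    return w, active


class IncrementalClassifier:
    """One-vs-rest classifier that keeps only margin-defining samples."""

    def __init__(self) -> None:
        self.retained: List[Sample] = []
        self.weights: Dict[str, List[float]] = {}

    def update(self, batch: Sequence[Sample]) -> None:
        # Assumption: the "previous learning result" is the retained sample set.
        data = self.retained + list(batch)
        xs = [x for x, _ in data]
        keep = set()
        self.weights = {}
        for label in sorted({y for _, y in data}):
            ys = [1 if y == label else -1 for _, y in data]
            self.weights[label], active = fit_hinge(xs, ys)
            keep |= active
        # Samples that never influenced any boundary are discarded.
        self.retained = [data[i] for i in sorted(keep)]

    def predict(self, x: Sequence[float]) -> str:
        xa = list(x) + [1.0]
        return max(self.weights,
                   key=lambda c: sum(a * b for a, b in zip(self.weights[c], xa)))


# Toy features: counts of the terms ("goal", "cpu", "stock") in a page.
batch1 = [([3, 0, 0], "sports"), ([2, 1, 0], "sports"),
          ([0, 3, 0], "tech"), ([1, 2, 0], "tech")]
batch2 = [([0, 0, 3], "finance"), ([0, 1, 2], "finance"),
          ([3, 0, 1], "sports"), ([0, 2, 1], "tech")]

model = IncrementalClassifier()
model.update(batch1)   # initial training
model.update(batch2)   # update: retained samples + new data only
print(model.predict([0, 0, 3]), len(model.retained))
```

Discarding the samples that never affected a boundary is what saves memory, but it is also an approximation: an old sample that was dropped is not guaranteed to stay correctly classified after a later update. A production system would more likely rely on an established SVM library than on this hand-written trainer.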

Keywords/Search Tags:

Web text mining, Support vector machines, Multi-class problem, Incremental learning

Related items

1	Study On Text Classification Based On Multi-class Support Vector Machines
2	Study On The Incremental Learning Algorithms For Support Vector Machines
3	Triplet Support Vector Machines For Pattern Classification
4	Study On Multi-class Text Classification Based On Support Vector Machines
5	Incremental Learning Algorithm For Multi-class Support Vector Machine And Application In Cognitive Radio System
6	The Influence Of The Data Distribution Over Support Vector Machines
7	The Optimal Algorithm Studies Of Support Vector Machines And Classification Questions
8	Research On Incremental Learning Algorithm Of Support Vector Machine
9	Research On Text Classification Based On Multi-class Soft Interval Support Vector Machines
10	Anomaly Detection Research For Imbalanced Classes