Research Of Chinese Text Classification Based On Mixed Feature

Posted on:2013-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:Z F Lin

Full Text:PDF

GTID:2298330467478732

Subject:Control engineering

Abstract/Summary:

With the rapid development of information technology and the arrival of the we-media era, more and more information exists on the Internet as electronic text. Extracting accurate and valuable knowledge from this massive volume of Web text has become a major goal of information processing. As a research hotspot in information processing, automatic text classification can handle massive amounts of text and effectively address the problem of disordered information. As the technical basis for information retrieval, information filtering, and search engines, it also has broad application prospects.

The application background of this paper is topic information retrieval in the field of vertical search, and efficient topic classification is the main task of the system. Because vertical search places stricter demands on the directedness of Web content, we developed a Chinese text classification method based on mixed features to address the weak directedness of traditional Web text classification results. The research focuses on Web text extraction, mixed feature modeling, and classification strategies.

The Web text is extracted by an extractor. Ads, images, and hyperlinks in Web pages cause considerable trouble for Web text classification. The Web text extractor cleans each Web page so that only its text content remains.

The vector space model is built from the mixed feature, which consists of term features and Web features. Term features are selected through natural language processing and feature dimension reduction, and are weighted with an improved term weight algorithm. The classification performance of the improved term weight algorithm is verified by corresponding experiments. The Web feature set consists of the pages' linguistic characteristics and network characteristics. We model the Web features through statistics and normalization.

Machine learning is used to train the classifiers. We study the support vector machine and optimize its parameters to improve the recognition performance of the topic classifier and the Web filter. A Chinese text classification system is proposed and implemented in this paper. The system cascades a topic classifier and a Web filter. It first fetches Web resources from the Internet and extracts the text information, then establishes the mixed feature set and builds the system on these features. Finally, experiments verify that the system achieves higher classification accuracy and strong Web page filtering capability.
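
The mixed feature idea can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the improved term weight algorithm, so standard TF-IDF is used as a stand-in, and the two Web statistics (hyperlink count and text length), the min-max normalization, and all function names are assumptions made for illustration. English tokens stand in for already-segmented Chinese words.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard TF-IDF weights (a stand-in, not the thesis's improved scheme)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def min_max_normalize(rows):
    """Scale each column of raw Web statistics into [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def mixed_features(docs, web_stats):
    """Concatenate term weights and normalized Web features for each page."""
    vocab, term_vecs = tf_idf(docs)
    web_vecs = min_max_normalize(web_stats)
    return vocab, [t + w for t, w in zip(term_vecs, web_vecs)]


# Toy pages, assumed to be word-segmented already.
docs = [["stock", "market", "rise"], ["football", "match", "market"]]
# Hypothetical raw Web statistics per page: [hyperlink count, text length].
web_stats = [[10, 300], [40, 100]]

vocab, features = mixed_features(docs, web_stats)
```

Each resulting row is a single vector that a classifier such as a support vector machine could be trained on. A term that appears in every page ("market" here) receives a weight of zero under plain TF-IDF, which is the kind of behavior a tuned weighting scheme might adjust.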

Keywords/Search Tags:

text classification, term weight algorithm, mixed feature, support vector machine

Related items

1	Term Weight-Based Chinese Text Classification Algorithm
2	Research On Text Classification Based-on Support Vector Machine
3	Designed And Implementation Of Chinese Text Categorization System Based On Support Vector Machine
4	Research On Text Classification Algorithm Based On Support Vector Machine And Neural Network
5	Research Of Automatic Text Classification Method Based On Machine Learning
6	The Research And Application In The Stock Market News Of Feature Selection And SVM Algorithm
7	Text Sentiment Analysis Based On Text Classification
8	The Design And Implementation Of Text Classification System Based On SVM-KNN
9	Support Vector Machine And Its Applications
10	Research On Text Classification System Based On Support Vector Machine