Research On Network Text Classification Technique

Posted on:2013-10-09

Degree:Master

Type:Thesis

Country:China

Candidate:L J Yi

GTID:2248330371994464

Subject:Signal and Information Processing

Abstract/Summary:

With the development of network technology, the internet has become the main source from which people obtain information. Because the network is open, however, it is flooded with information of every kind. To help people quickly find the information they are interested in, using network text classification to organize this cluttered information has become increasingly important. Text classification underlies information filtering, search engines, and related fields, which has gradually made network text classification a research hotspot.

This thesis introduces the theory relevant to network text acquisition and text classification, including HTML, Chinese word segmentation, similarity calculation, weight calculation, feature selection, and commonly used classification methods. It then designs and implements a network text classification system based on these methods.

The thesis carries out the following research. For web text extraction, it designs and implements a page text extraction method by analyzing the characteristics of HTML and the general structure of web pages. For text classification, it analyzes the KNN and naive Bayesian text classification algorithms in detail and uses them to classify texts by subject field. Building on the naive Bayesian method and targeting its independence assumption, it improves the classifier with a TAN (tree-augmented naive Bayes) model of a Bayesian network. This model takes the correlation between attributes into account and so relaxes the independence assumption to some degree.

The thesis also proposes a method for judging the attitude of a text: it extracts sentiment features from the text, analyzes the weights of sentiment words, and evaluates the attitude the text expresses. This method is used to perform a second-stage classification of the text. Finally, the network text classification method is tested. Experiments on a corpus demonstrate the accuracy of the system, and a classification experiment on text extracted from the web demonstrates its practicality.
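
As a rough illustration of the naive Bayesian text classification mentioned in the abstract, the sketch below trains a multinomial naive Bayes model with add-one smoothing. It is not the thesis's implementation: the function names `train_naive_bayes` and `classify`, the smoothing choice, and the assumption that documents arrive already segmented into word tokens are all illustrative.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and words per class.

    docs: list of (tokens, label) pairs, where tokens is a list of
    already-segmented words (segmentation is assumed to happen earlier).
    """
    class_docs = Counter()               # number of training documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()                        # all words seen in training
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify(tokens, class_docs, word_counts, vocab):
    """Return the class with the highest log posterior for the tokens."""
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_docs.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the word.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Each word contributes to the score independently of the others, which is the independence assumption that the TAN model described above relaxes by allowing dependencies between attributes.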

Keywords/Search Tags:

Web Text Extraction, Chinese Word Segmentation, Feature Selection, Text Classification

Related items

1	Research And Implementation Of Chinese Automatic Text Classification System Based On SVM
2	Research And Application Of Internet Chinese Text Classification
3	Research On Word Segmentation And Feature Selection Of Chinese Text Classification
4	Research On Core Technology Of The Chinese Text Classification
5	Research And Implementation Of Chinese Text Categorization
6	Design And Implementation Of Web Automatic Text Categorization
7	Chinese Text Data Classification
8	The Research And Application Of Chinese Web Text Classification
9	The Application Of Text Classification In Spam Interception System
10	Research And Implement On The Related Algorithms Of Chinese Text Classification