Research On Network Text Classification Technique

Posted on: 2013-10-09; Degree: Master; Type: Thesis
Country: China; Candidate: L J Yi; Full Text: PDF
GTID: 2248330371994464; Subject: Signal and Information Processing
Abstract/Summary:
Nowadays, with the development of network technology, the Internet has become the main source from which people obtain information. Because of its open nature, however, the network is flooded with information of all kinds. To let people quickly find the information they are interested in, it has become increasingly important to use network text classification techniques to bring order to this cluttered information. Text classification underlies information filtering, search engines, and related fields, which has gradually made network text classification a research hotspot.
This paper introduces the theory relevant to network text acquisition and text classification, including the HTML language, Chinese word segmentation, similarity calculation, weight calculation, feature selection, and common classification methods. It then designs and implements a network text classification system based on these methods.
The paper carried out the following research. For web text extraction, it designed and implemented a page text extraction method by analyzing the characteristics of HTML and the general structure of web pages. For text classification, it analyzed the KNN and naive Bayesian text classification algorithms in detail and used them to classify texts by field. Building on the study of the naive Bayesian method, and targeting its independence assumption, it improved naive Bayesian classification by using the TAN (tree-augmented naive Bayes) model of Bayesian networks. This method takes the correlation between pairs of attributes into account and thereby relaxes the independence assumption to a certain degree.
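As an illustration of the naive Bayesian text classification discussed above, the following is a minimal sketch, not the system described in this paper. It assumes that each text has already been segmented into a list of tokens, and it uses a multinomial model with add-one (Laplace) smoothing, which is a common choice but is not stated in the abstract. The function names `train_naive_bayes` and `classify` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Collect counts from (tokens, label) pairs.

    Assumption: tokens are already segmented words; the segmentation
    step itself is outside this sketch.
    """
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, tokens):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration only.
training = [
    (["football", "match", "goal"], "sports"),
    (["team", "goal", "win"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "loan", "rate"], "finance"),
]
model = train_naive_bayes(training)
print(classify(model, ["goal", "team"]))  # sports
```

The product of per-word probabilities in `classify` is exactly where the independence assumption enters: each word contributes to the score independently of the others given the class. A TAN model would relax this by letting each attribute depend on one other attribute in addition to the class.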
The paper also proposed a method for judging text attitude: it extracts the sentiment features of a text, analyzes the weights of sentiment words, and evaluates the attitude the text expresses. This method is used to perform a second classification of the text. Finally, the network text classification method was tested. Experiments on a corpus demonstrated the accuracy of the system, and a classification experiment on text extracted from the network demonstrated its practicability.
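The attitude judgment step can be pictured with a small sketch. The abstract does not give the weighting scheme, so the version below simply assumes a hypothetical lexicon that maps sentiment words to signed weights and sums the weights of the words found in a text. The names `attitude_score`, `judge_attitude`, the `threshold` parameter, and the example lexicon are all illustrative assumptions.

```python
def attitude_score(tokens, lexicon):
    """Sum the weights of the sentiment words that occur in the text.

    Assumption: `lexicon` maps a sentiment word to a signed weight
    (positive words > 0, negative words < 0).
    """
    return sum(lexicon[tok] for tok in tokens if tok in lexicon)


def judge_attitude(tokens, lexicon, threshold=0.0):
    """Map the summed score to a coarse attitude label."""
    score = attitude_score(tokens, lexicon)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Hypothetical lexicon and texts, invented for illustration only.
lexicon = {"excellent": 1.0, "good": 0.5, "slow": -0.5, "terrible": -1.0}
print(judge_attitude(["service", "excellent", "but", "slow"], lexicon))  # positive
print(judge_attitude(["terrible", "slow"], lexicon))                     # negative
```

In a two-stage setup like the one the abstract describes, a label of this kind would be assigned after the text has already been classified by field.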
Keywords/Search Tags: Web Text Extraction, Chinese Word Segmentation, Feature Selection, Text Classification