Key Technologies Research And Implementation Of Chinese Text Automatic Classification

Posted on: 2014-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhang
GTID: 2248330398495281
Subject: Computer application technology
Abstract/Summary:
With the rapid development and spread of the Internet, the volume of electronic text is growing explosively. Organizing and managing this massive, cluttered text efficiently is a major challenge for information science and technology. Automatic text classification, based on artificial intelligence, assigns each text to a predefined category according to its content, so that large collections can be managed efficiently and users can find the information they need quickly. As a foundational technology for information filtering, information retrieval, text databases, digital libraries, and similar applications, text classification improves the quality of information services and has broad practical value and research significance.

This thesis introduces the key technologies of automatic text classification, including word segmentation, feature dimension reduction, and classification, and focuses on feature selection. Feature selection is one of the most commonly used approaches to feature dimension reduction: by selecting the most effective features from the original feature space, it reduces the dimensionality of texts represented in the vector space model and removes redundant features, which can improve both classification efficiency and classification accuracy.

Traditional feature selection methods tend to favor words that are significant across multiple categories rather than selecting feature words category by category, and they ignore the role of word frequency in feature selection. If a word appears intensively in only a few categories and is uniformly distributed within those categories, it has a strong class discriminative degree and should be retained. Based on this idea, the concept of class discriminative degree is proposed and applied to feature selection. Several improvements to traditional feature selection algorithms and a new feature selection algorithm based on class discriminative degree are proposed.

To verify the proposed feature selection algorithms, a Chinese automatic text classification system is designed and developed. The traditional and new feature selection methods are compared experimentally in this system. The results show that the new methods achieve higher classification accuracy than the traditional ones, which verifies the effectiveness and feasibility of the proposed methods.
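
The abstract does not give the formula for class discriminative degree, so the following is only an illustrative sketch of the stated intuition (a word concentrated in few categories and spread evenly within them should score high). The scoring rule, the function names `class_discrimination_scores` and `select_features`, and the toy corpus are all assumptions made for this example and are not the thesis's algorithm. Documents are assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


def class_discrimination_scores(docs):
    """docs: list of (tokens, label) pairs. Returns {term: score in [0, 1]}.

    Hypothetical score, NOT the thesis's definition:
    score = concentration across classes * best within-class coverage.
    """
    class_sizes = Counter(label for _, label in docs)
    # doc_freq[term][label] = number of documents of that class containing term
    doc_freq = defaultdict(Counter)
    for tokens, label in docs:
        for term in set(tokens):
            doc_freq[term][label] += 1

    n_classes = len(class_sizes)
    scores = {}
    for term, per_class in doc_freq.items():
        # Coverage: share of each class's documents that contain the term.
        coverage = {c: per_class[c] / class_sizes[c] for c in class_sizes}
        total = sum(coverage.values())
        probs = [v / total for v in coverage.values() if v > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        # Concentration: 1 if the term occurs in one class only,
        # 0 if it is spread evenly over all classes.
        concentration = 1.0 - entropy / math.log(n_classes) if n_classes > 1 else 0.0
        scores[term] = concentration * max(coverage.values())
    return scores


def select_features(docs, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = class_discrimination_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy, pre-segmented corpus (made up for illustration).
docs = [
    (["football", "match", "team"], "sports"),
    (["team", "goal", "match"], "sports"),
    (["stock", "market", "team"], "finance"),
    (["stock", "bank", "market"], "finance"),
]
scores = class_discrimination_scores(docs)
print(select_features(docs, 3))  # ['market', 'match', 'stock']
```

In this toy corpus, "team" appears in both categories and receives a low score, while "match" and "stock" each appear in every document of exactly one category and receive the maximum score.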
Keywords/Search Tags: Chinese Text Automatic Classification, Feature Selection, Class Discriminative Degree, Class Discriminating Words