Research And Improvement Of Automatic Classification Technology For Chinese Text

Posted on: 2015-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: H An
GTID: 2298330434950164
Subject: Information security
Abstract/Summary:
With the rapid development of information technology, and especially the spread of network applications, information of all kinds in the form of electronic documents now fills people's lives, and its volume is growing rapidly. Organizing and managing this information by hand is time-consuming and costly, and becomes impractical at large scale, so the systematic organization and management of massive amounts of information has become an important problem. Automatic text classification, a key research topic in this field, has great application value in information retrieval, data mining and related areas, and has received wide attention and development in recent years. Because Chinese is the main language in which our information is recorded, automatic classification of Chinese text is an urgent problem. With advances in statistical learning theory and natural language processing, text classification has accumulated many research and practical results. Its development shows four trends: the emergence of new classification methods, the improvement of traditional methods, the appearance of new application modes and domains, and the translation of theoretical results into practice.

This paper mainly covers the following:

(1) It summarizes the background, current state, and research trends of automatic Chinese text classification, and gives a detailed introduction to and comparison of its key technologies, such as text preprocessing, feature selection, text representation, and classification algorithms. It also introduces the category system used for text classification and the system for evaluating classification results. The preprocessing part covers parsing and extracting web content, Chinese word segmentation, and feature selection.
The classification-algorithm part introduces the SVM, KNN, and NB (Naive Bayes) algorithms, analyzes the differences among them, and describes the application scenarios each one suits.

(2) Based on a systematic analysis of the relevant theory and key technologies of Chinese text classification, the paper designs a complete scheme for automatic Chinese text classification based on a B/S (browser/server) architecture, and carries out the requirements analysis, functional analysis, overall design, and detailed module design of the system.

(3) It implements a Chinese text categorization system based on this scheme. The system consists of a corpus acquisition module, a corpus processing module, a training and classification module, and a user interaction module. The system also maintains a database of classified web page URLs so that users can look up a page's category directly, which reduces their waiting time.

(4) It carries out comparative experiments with different feature selection methods, dictionary sizes, classification algorithms, and numbers of training samples, and analyzes the results to improve classification performance.

Future work will focus on optimizing the performance of the text classification workflow and methods.
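The abstract lists Chinese word segmentation and feature selection among the preprocessing steps but does not say which algorithms were used. As a minimal sketch, assuming a dictionary-based forward maximum matching (FMM) segmenter and the chi-square (CHI) statistic for feature selection (both common choices, not confirmed as the thesis's own), the Python below shows how raw text becomes tokens and how a reduced feature set is chosen. The function names, the `max_len=4` window, and the toy dictionary and corpus are hypothetical.

```python
# Sketch only: FMM and CHI are assumed choices; the abstract does not name its
# segmentation algorithm or which feature selection methods it compared.


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary
    word of at most max_len characters, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def chi_square(docs, labels, term, category):
    """CHI statistic between a term and a category.

    docs is a list of token lists; labels holds one category per document.
    """
    n = len(docs)
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document outside category
        elif in_category:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document outside category
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms whose maximum CHI score over all categories is highest."""
    vocab = {t for tokens in docs for t in tokens}
    categories = set(labels)
    scores = {t: max(chi_square(docs, labels, t, c) for c in categories) for t in vocab}
    # Sort by descending score, breaking ties alphabetically for determinism
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    dictionary = {"中文", "文本", "分类", "技术"}
    print(fmm_segment("中文文本分类技术", dictionary))  # ['中文', '文本', '分类', '技术']
    docs = [["足球", "比赛"], ["篮球", "比赛"], ["股票", "市场"], ["股票", "基金"]]
    labels = ["sports", "sports", "finance", "finance"]
    print(select_features(docs, labels, 2))  # ['比赛', '股票']
```

FMM is greedy, so it can mis-split ambiguous character sequences; practical segmenters usually add a larger lexicon or a statistical model. Taking each term's maximum CHI value over categories is one common way to combine per-category scores; averaging is another.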
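The classification part of the work compares SVM, KNN, and NB. The sketch below covers only the two that fit in a few lines: a multinomial Naive Bayes classifier with Laplace smoothing, and a KNN classifier that votes among the most cosine-similar training documents. It assumes documents are already segmented into token lists; the class and function names, the use of raw term frequencies rather than TF-IDF weights, and the toy data are illustrative assumptions, not the system described in the thesis. SVM is left out because a faithful sketch needs an optimization solver.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for tokens in docs for t in tokens}
        # log P(c): fraction of training documents labeled c
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = defaultdict(Counter)  # per-class term counts
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        v = len(self.vocab)

        def log_posterior(c):
            # log P(c) + sum of log P(t|c); terms never seen in training are skipped
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens
                if t in self.vocab
            )

        return max(self.classes, key=log_posterior)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency Counters."""
    dot = sum(count * v[t] for t, count in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def knn_predict(docs, labels, tokens, k=3):
    """Majority vote among the k training documents most similar to tokens."""
    query = Counter(tokens)
    ranked = sorted(
        zip(docs, labels),
        key=lambda pair: cosine(query, Counter(pair[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    docs = [["足球", "比赛"], ["篮球", "比赛"], ["股票", "市场"], ["股票", "基金"]]
    labels = ["sports", "sports", "finance", "finance"]
    nb = NaiveBayes().fit(docs, labels)
    print(nb.predict(["足球", "比赛", "比赛"]))  # sports
    print(knn_predict(docs, labels, ["股票", "基金"], k=3))  # finance
```

The contrast in cost is one reason the two suit different scenarios: NB condenses the training set into per-class counts and predicts cheaply, while KNN needs no training but compares every query against all stored documents.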
Keywords/Search Tags: text classification, Chinese segmentation, feature selection, classification algorithm, classification scheme