Research And Application Of Automatic Categorization Of The Chinese Documents

Posted on: 2013-05-09, Degree: Master, Type: Thesis
Country: China, Candidate: J J Fan, Full Text: PDF
GTID: 2248330395486417, Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer networks and information technology, the volume of information has grown and accumulated substantially, and a huge amount of it, whether on personal computers or on the Internet, is stored as text. How to manage, store, and access this text data efficiently, and how to extract the required information from it, is an important problem for modern society, one that bears on the efficiency of people's work and daily life, and a difficult one for research in computing, artificial intelligence, and information processing. Automatic text classification, a basic tool for dealing with this problem, has received unprecedented attention and development in recent years. To date, automatic text categorization has been the subject of rich and in-depth research, and it has also drawn interest from other active areas such as information extraction and search engines, both in China and abroad. Industry and research institutes have produced many notable achievements and developed many useful tools and software systems. This thesis studies the key technologies of automatic Chinese text classification and designs and implements a prototype system. First, the thesis introduces the research background and theoretical basis of automatic text classification. Second, it discusses Chinese text classification techniques in detail and analyzes the technical advantages and features of the vector space model and of automatic Chinese word segmentation. It then studies in detail the key technologies of text categorization, including term weighting, feature selection, and the key algorithms. On this foundation, it designs an automatic Chinese text classification system and describes one of its key technologies in detail. Finally, it analyzes the experimental results of the system and evaluates its efficiency.
Keywords/Search Tags: Chinese word segmentation, Vector space model, Feature selection, The standard deviation of the weights, KNN
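
The abstract names the vector space model, term weighting, and KNN but gives no implementation details. The following is a minimal sketch of how such a pipeline is commonly put together, not the thesis's own system. It assumes the documents have already been segmented into word tokens, uses a smoothed TF-IDF weight and cosine similarity as stand-in choices, and omits feature selection and the standard-deviation weighting mentioned in the keywords. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def vectorize(tokens, idf):
    """Map a token list to a sparse TF-IDF vector (dict: term -> weight).

    Terms not seen in the training set are dropped.
    """
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def tfidf_vectors(docs):
    """Build IDF from pre-segmented documents and vectorize each one.

    Assumption: idf = ln(N / df) + 1, one common smoothed variant.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [vectorize(tokens, idf) for tokens in docs], idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query_tokens, train_vectors, train_labels, idf, k=3):
    """Label a query by majority vote among its k most similar training docs."""
    q = vectorize(query_tokens, idf)
    scored = sorted(
        ((cosine(q, v), label) for v, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, already-segmented training documents (illustrative only).
train_docs = [
    ["足球", "比赛", "球队"],
    ["篮球", "比赛", "球员"],
    ["股票", "市场", "上涨"],
    ["银行", "利率", "市场"],
]
labels = ["sports", "sports", "finance", "finance"]

vectors, idf = tfidf_vectors(train_docs)
print(knn_classify(["球队", "比赛", "胜利"], vectors, labels, idf, k=3))
```

In a fuller pipeline, a word segmenter would run before `tfidf_vectors`, and a feature selection step would prune the vocabulary before vectors are built. Both are left out here because the abstract does not say which methods the thesis uses.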