Research And Design On Livelihood Information Classification System

Posted on: 2012-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: S Yang
Full Text: PDF
GTID: 2218330368488430
Subject: Computer technology
Abstract/Summary:
The rapid development of network technology has led to explosive growth in online text information resources, and the network has gradually become an important means for people to obtain everyday information. Online livelihood-information classified ads emerged in this context and are gradually replacing newspapers, magazines, and other paper media as an important way for people to find such ads. Newer websites carry fewer livelihood-information classified ads and sometimes need to capture information from the network to enrich their content. However, classifying a vast amount of text by hand is time-consuming. This paper therefore constructs an automatic text classification system for livelihood-information classified ads and studies some of its key technologies.

The paper studies automatic text classification techniques and reports experiments on them. It discusses text preprocessing, feature selection, classification algorithms, and performance evaluation in detail, and then builds an automatic text classification system. The experimental data are livelihood-information classified ads collected from the network by a web crawler. The goal of the paper is to classify these data automatically, and all experiments in the paper are based on them. The main work of the paper is as follows.

(1) The chi-square statistic algorithm is improved, and experiments show that the improvement is effective. A suggestion is given for how many features should be retained.
(2) SVM is chosen as the classification algorithm. The principle of SVM is discussed in detail, a method of parameter optimization is introduced, and the best type of kernel function for the classification task is determined by experiment.
(3) The multi-class classification methods of SVM are analyzed, and a hierarchical classification method is selected.
Keywords/Search Tags: automatic text classification, text preprocessing, feature selection, chi-square statistic, support vector machine
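
To make the feature selection step named in the abstract more concrete, here is a minimal sketch of the standard chi-square statistic for ranking terms against one category. It is not the improved variant the thesis proposes, which the abstract does not specify. The function names `chi_square` and `top_terms`, the toy documents, and the category labels are all assumptions made for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square score from a 2x2 term/category table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no information about the term.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_terms(docs, labels, category, k):
    """Return the k terms with the highest chi-square score for `category`.

    docs is a list of token lists; labels is the matching list of categories.
    Ties are broken alphabetically so the result is deterministic.
    """
    n_docs = len(docs)
    n_in = sum(1 for y in labels if y == category)
    df_all = Counter()  # document frequency over all documents
    df_in = Counter()   # document frequency inside the category
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            df_all[t] += 1
            if y == category:
                df_in[t] += 1
    scores = {}
    for t, df in df_all.items():
        a = df_in[t]
        b = df - a
        c = n_in - a
        d = n_docs - n_in - b
        scores[t] = chi_square(a, b, c, d)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus of tokenized classified ads.
docs = [["rent", "apartment"], ["rent", "room"],
        ["job", "hiring"], ["job", "salary"]]
labels = ["housing", "housing", "jobs", "jobs"]
selected = top_terms(docs, labels, "housing", 2)
```

Note that the plain statistic scores a term highly whether it is associated with the category or with its absence, so on the toy corpus both "rent" and "job" rank at the top for the housing category. The value of `k` corresponds to the question of how many features to retain, which the thesis settles by experiment.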