Research And Application Of Enterprise Information Automatic Classification System Based On Text Mining

Posted on:2017-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Wu

Full Text:PDF

GTID:2308330485469647

Subject:Software engineering

Abstract/Summary:


With the rapid development of the Internet, enterprises face the problem of collecting and processing large amounts of unstructured information. Information classification is an important management tool, but traditional manual classification is labor-intensive and inefficient. This thesis proposes an automatic classification method based on text mining to make enterprise information classification more efficient.

After a study of a variety of text categorization methods, the Support Vector Machine (SVM) is chosen as the main algorithm for information classification. Information gathered from the network tends to have an imbalanced sample distribution, and SVM classifiers perform poorly near the hyperplane. The SVM is therefore supplemented with the KNN algorithm, which classifies a sample using K neighboring samples rather than a single one, to improve the overall classification results.

First, the unstructured enterprise information is preprocessed through steps such as word segmentation and stop-word removal, and statistics such as term frequency and document frequency are computed from the results. Because enterprise information gathered from the network is imbalanced, Information Gain is adopted as the feature selection method, and dispersion and concentration parameters, which reflect how strongly a term represents a category, are introduced to reduce the dimensionality of the feature list. The feature vectors of enterprise information are then constructed from the feature words that benefit classification. With the default penalty factor C and kernel function parameter, experiments on four commonly used kernel functions confirm the choice of the RBF kernel. The optimal kernel parameter G is then found through grid search with five-fold cross-validation.
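The Information Gain feature selection step mentioned above can be illustrated with a minimal sketch. It assumes each document has already been segmented and stripped of stop words, and is represented as a set of terms. The function names (`entropy`, `information_gain`, `select_features`) and the toy data are hypothetical, and the thesis's dispersion and concentration parameters are not modeled here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) * H(C | t present) - P(not t) * H(C | t absent)."""
    classes = sorted(set(labels))
    # Class counts among documents that contain / do not contain the term.
    with_term = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    without_term = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_all = entropy([labels.count(c) for c in classes])
    h_with = entropy([with_term[c] for c in classes])
    h_without = entropy([without_term[c] for c in classes])
    return h_all - (n_with / n) * h_with - (n_without / n) * h_without


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]


# Toy corpus (hypothetical): each document is a set of terms after preprocessing.
docs = [{"loan", "bank"}, {"bank", "rate"}, {"match", "goal"}, {"goal", "team"}]
labels = ["finance", "finance", "sports", "sports"]
print(select_features(docs, labels, 2))
```

Terms that appear in only one category split the corpus cleanly and score highest, while terms that appear in a single document score lower, which is why ranking by this score shrinks the feature list.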
On this basis, the SVM information classifier is obtained after training. The support vectors of the SVM classifier serve as the training samples of a KNN classifier. Because the collected enterprise information may be imbalanced, the KNN classifier introduces a weighting factor to adjust the weights between categories, and its K value is determined experimentally. For the combined SVM and KNN classifier, the threshold θ is also determined by experiment. When a sample lies close to the SVM hyperplane, the SVM-KNN classification model uses the weighted KNN classifier built on the support vectors; when the sample lies far from the hyperplane, the SVM classifier's result is used directly.

The information classification experiments in this thesis were conducted in a large enterprise in a specific industry. The results, based on a large number of enterprises, verify the efficiency of the SVM-KNN classification model. The model adapts better to imbalanced enterprise information samples, which makes enterprise information classification more accurate.
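The SVM-KNN decision rule described above can be sketched as follows. This is only an illustration under stated assumptions: it is binary, it uses a linear decision function with hand-supplied weights in place of the trained RBF-kernel SVM, and it treats the absolute decision value as a proxy for distance from the hyperplane. All names (`decision_value`, `weighted_knn`, `svm_knn_predict`) and the `class_weights` scheme are hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict


def decision_value(w, b, x):
    """Signed score f(x) = w . x + b of a linear decision function."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def weighted_knn(x, support_vectors, k, class_weights):
    """Vote among the k nearest support vectors, scaling each vote by a class weight."""
    nearest = sorted(support_vectors, key=lambda sv: math.dist(x, sv[0]))[:k]
    votes = defaultdict(float)
    for _, label in nearest:
        votes[label] += class_weights.get(label, 1.0)
    # Iterating over sorted labels makes ties resolve to the smallest label.
    return max(sorted(votes), key=votes.get)


def svm_knn_predict(x, w, b, support_vectors, k, theta, class_weights):
    """Use the SVM result far from the hyperplane, weighted KNN near it."""
    score = decision_value(w, b, x)
    if abs(score) >= theta:
        return 1 if score > 0 else -1
    return weighted_knn(x, support_vectors, k, class_weights)
```

With a larger weight on the minority class, a sample inside the threshold band can be assigned to that class even when most of its nearest support vectors belong to the majority class, which is the effect the weighting factor is meant to have on imbalanced data.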

Keywords/Search Tags:

enterprise information, text categorization, Support vector machine (SVM)


Related items

1	Study On Text Categorization Method Based On Support Vector Machine
2	The Research On Text Categorization Algorithm Based On Support Vector Machine
3	The Research And Implementation Of Chinese Text Categorization
4	Support Vector Machine Application In Text Categorization
5	Application For Web Text Categorization Based On Support Vector Machine
6	The Application Research Of Support Vector Machine Theory In Text Categorization
7	Research On Clustering And Text Categorization Based On Support Vector Machine
8	Research On Support Vector Machines Classification Algorithm In Text Categorization
9	A Study On Text Categorization Based On Machine Learning
10	Research On Chinese Text Categorization Based On Support Vector Machine