Research On Automatic Classification Methods Of Enterprise Business Scope

Posted on: 2011-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: M M Fan
GTID: 2178330338979979
Subject: Computer Science and Technology
Abstract/Summary:
This thesis studies methods for automatically classifying enterprise business scopes, an important component of the Chinese-English platform of the National Administration for Code Allocation to Organizations.
With the advent of the information age, the volume of information on the Internet has grown explosively, and automatic classification by computer has become a key technology for coping with it. Chinese word segmentation is the most important step in Chinese information processing, and the segmentation algorithm directly determines the performance of a Chinese word segmentation system.
An enterprise's business scope describes the businesses by which the enterprise is assigned to an economic industry. To support the research on automatic classification, we analyzed enterprise business scopes comprehensively. First, in terms of structure and composition, a business scope is usually very short (from a few words up to dozens of characters). Although its grammatical structure is simple and the range of structure types is narrow, it contains a large number of term entities that general-purpose segmentation algorithms handle poorly. Existing segmentation algorithms are mainly applied to lengthy documents such as news reports and scientific literature, where the results are satisfactory; for short documents or short domain-specific documents, such as short emails, chat records, and business scopes, the results are quite unsatisfactory.
Second, in terms of data quality, the business scope data set contains a lot of noisy data. The data set is huge (about 2 million), and the mapping from business scopes to economic industries was done by the staff responsible for each sub-classification, so individual differences introduce a certain degree of inconsistency and category errors. Finally, the large number of categories, 95 economic industries in total, makes classification difficult. To solve these problems and build an efficient system for automatically classifying enterprise business scopes, the following algorithms are proposed: a self-learning segmentation algorithm that automatically segments business scopes and extracts feature items; an SVM-based data correction method that eliminates the noisy data in the business scope data set; and a boosting algorithm based on Naive Bayes that classifies enterprise business scopes.
In summary, this thesis proposes a method for building a practical and efficient automatic classification system for enterprise business scopes. It then presents the overall framework of the classification system and the framework of each module. The experimental results and their analysis are given at the end of the thesis. The experimental results have been approved by the National Administration for Code Allocation to Organizations, and the automatic classification system can be used in practice.
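The abstract names boosting over Naive Bayes as the final classification step but does not give the algorithm. The following is a minimal sketch of one common way to combine the two, not the thesis's actual implementation: it assumes the SAMME variant of AdaBoost, a multinomial Naive Bayes base learner with add-one smoothing, and business scopes that have already been segmented into tokens. The names `WeightedNaiveBayes`, `boost`, and `predict`, the English tokens, and the three category labels are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class WeightedNaiveBayes:
    """Multinomial Naive Bayes over tokens, trained on weighted examples."""

    def fit(self, docs, labels, weights):
        # Rescale weights to sum to len(docs) so add-one smoothing keeps its usual strength.
        scale = len(docs) / sum(weights)
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.class_mass = {c: 0.0 for c in self.classes}
        self.token_mass = {c: defaultdict(float) for c in self.classes}
        for doc, label, w in zip(docs, labels, weights):
            self.class_mass[label] += w * scale
            for tok in doc:
                self.token_mass[label][tok] += w * scale
        return self

    def predict(self, doc):
        total = sum(self.class_mass.values())
        scores = {}
        for c in self.classes:
            # Smoothed log prior plus smoothed log likelihood of each known token.
            score = math.log((self.class_mass[c] + 1.0) / (total + len(self.classes)))
            denom = sum(self.token_mass[c].values()) + len(self.vocab)
            for tok in doc:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.token_mass[c].get(tok, 0.0) + 1.0) / denom)
            scores[c] = score
        return max(self.classes, key=lambda c: scores[c])


def boost(docs, labels, rounds=5):
    """SAMME-style boosting (assumed variant); expects at least two classes."""
    n, k = len(docs), len(set(labels))
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = WeightedNaiveBayes().fit(docs, labels, weights)
        wrong = [model.predict(d) != y for d, y in zip(docs, labels)]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        if err >= 1.0 - 1.0 / k:  # no better than random guessing
            if not ensemble:
                ensemble.append((1.0, model))
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect learner
        alpha = math.log((1.0 - err) / err) + math.log(k - 1.0)
        ensemble.append((alpha, model))
        if not any(wrong):  # training set already fit; further rounds add nothing
            break
        # Raise the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(alpha) if bad else w for w, bad in zip(weights, wrong)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return ensemble


def predict(ensemble, doc):
    """Weighted vote of the base learners."""
    votes = defaultdict(float)
    for alpha, model in ensemble:
        votes[model.predict(doc)] += alpha
    return max(votes, key=votes.get)


# Hypothetical, already-segmented business scopes with illustrative categories.
docs = [
    ["wholesale", "retail", "clothing"],
    ["retail", "food", "beverages"],
    ["software", "development", "consulting"],
    ["computer", "software", "sales"],
    ["building", "construction", "engineering"],
    ["construction", "materials", "installation"],
]
labels = ["retail", "retail", "software", "software", "construction", "construction"]

ensemble = boost(docs, labels)
print(predict(ensemble, ["software", "development"]))
```

On this small, cleanly separable toy set the first base learner already fits the training data, so the loop stops after one round. With noisy labels and 95 categories, as described in the abstract, later rounds would concentrate weight on the examples that earlier learners misclassify.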
Keywords/Search Tags: business scope, self-learning word segmentation, short-text classification, data correction, Naive Bayesian