
Economic Sector Classification Method Based On Machine Learning

Posted on: 2012-06-25  Degree: Master  Type: Thesis
Country: China  Candidate: Y Y Wang  Full Text: PDF
GTID: 2218330362451680  Subject: Computer technology
Abstract/Summary:
As the market economy prospers, the national economy is maturing in various fields and cross-industry enterprises are constantly emerging. However, under the manual division of industries, each business institution is assigned to a single industry, and possible cross-industry activity is not considered. It is therefore urgent to find a way to compile statistics on the development of economic sectors and the three industries efficiently, while accounting for the cross-industry phenomenon scientifically and accurately. The business scope of an institution describes the operating and social activities in which enterprises, public institutions, governmental organisations and private institutions participate. A business institution can be classified into the corresponding industry based on the description of its business scope. Using business scope descriptions and text classification techniques based on machine learning, this paper builds an efficient automatic classification system for economic sectors and the three industries, and studies in depth the overlap between sectors and industries.
The details are as follows. First, we study how to obtain high-quality data that satisfies the training requirements when the class labels of the training set are inaccurate and the training noise is severe, using Support Vector Machines and information retrieval techniques. Second, we explore text classification in a specific domain and compare the chi-square test with word frequency for feature selection and optimization. Third, because TF-IDF cannot meet the requirements of multi-class classification, we propose a variant of TF-IDF; in the multi-class setting, this method substantially increases the discrimination between texts and thus greatly improves classification performance. Fourth, when a text classification method based on machine learning alone cannot meet the requirements of multi-class classification, we rerank the classifier's results using other information that can discriminate between texts, which improves the performance of the system. In this paper, we designed a large-scale multi-class text categorization system for the sectors (95 classes) and the industries (3 classes). Under multi-class and noisy conditions, the results show that the accuracy of sector classification (top 5) is 91.34% and the accuracy of industry classification (top 1) is 94.26%, which meets the practical requirements of large-scale text categorization.
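As a rough illustration of two notions the abstract relies on, the sketch below computes a chi-square score for a term against a class and a top-k accuracy of the kind reported above. It is a minimal example under stated assumptions, not the thesis's implementation: the function names `chi_square` and `top_k_accuracy`, the toy business-scope documents and the labels are all hypothetical, and the thesis's TF-IDF variant and reranking step are not reproduced because the abstract does not specify them.

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic of the 2x2 table (term present?) x (label == target).

    docs is a list of token sets, labels the matching class labels.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1          # term present, document in target class
        elif has_term:
            b += 1          # term present, document in another class
        elif in_class:
            c += 1          # term absent, document in target class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0          # degenerate table: term or class never varies
    return n * (a * d - b * c) ** 2 / denom


def top_k_accuracy(ranked_predictions, gold, k):
    """Fraction of items whose gold label is among the first k ranked labels."""
    hits = sum(1 for ranked, g in zip(ranked_predictions, gold) if g in ranked[:k])
    return hits / len(gold)


# Hypothetical toy data: tokenized business-scope descriptions and sector labels.
docs = [
    {"retail", "clothing"},
    {"retail", "food"},
    {"software", "development"},
    {"software", "consulting"},
]
labels = ["retail", "retail", "it", "it"]

score_retail = chi_square(docs, labels, "retail", "retail")
score_clothing = chi_square(docs, labels, "clothing", "retail")

# Hypothetical ranked classifier outputs for two documents whose gold label is "retail".
ranked = [["it", "retail"], ["retail", "it"]]
gold = ["retail", "retail"]
acc_top1 = top_k_accuracy(ranked, gold, 1)
acc_top2 = top_k_accuracy(ranked, gold, 2)
```

In this toy data the term "retail" separates the classes perfectly and so scores higher than "clothing", which appears in only one document; terms with higher scores would be kept as features. Top-k accuracy counts a prediction as correct when the gold label appears anywhere in the first k candidates, which is why a top-5 figure over 95 classes and a top-1 figure over 3 classes are not directly comparable.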
Keywords/Search Tags:text classification, economic sector, multiple categories, chi-square test, rerank