Application And Research Of Text Classification In E-government Platform

Posted on: 2014-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Xiang
Full Text: PDF
GTID: 2268330401971805
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the web and information technologies, extracting and retrieving information from an enormous volume of data has become a pressing problem. Text classification processes and organizes such data effectively and helps users locate what they need quickly and accurately. This thesis analyzes text classification and its related technologies and studies how to apply them better to decision support. The work covers three main aspects.

First, the thesis explains the text classification process and its related technologies, including text preprocessing, Chinese word segmentation, the vector space model, feature selection, and feature weight computation.

Second, it proposes a text classification model based on information gain. The traditional TF-IDF algorithm considers only term frequency and inverse document frequency and ignores the influence of feature distribution on text classification. Building on classical TF-IDF, the thesis takes the distribution of feature terms among categories into account: it improves the IDF component and then corrects the TF-IDF weight with factors based on information gain and class information entropy, which account for uneven feature distribution, in order to improve the accuracy of the text representation. It also presents a classification algorithm based on information gain. The main idea is that, for each feature word that appears in a text, the contribution of its information gain value is calculated, and the text is assigned to the category with the largest contribution.

Third, it studies the application of an insertable (plug-in) text classification system in an e-government platform. Built on the text classification work above, the insertable system keeps the classification system separate from the application system and can operate automatically or semi-automatically. It only needs to be deployed to the application system as a plug-in, which makes it highly reusable, convenient, and simple to operate.
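The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the two ideas in the second aspect, under stated assumptions. It uses the standard presence-based definition of information gain, multiplies a smoothed TF-IDF weight by that gain as one possible correction, and scores each category by summing IG(t) * P(c | t) over the terms in a text. The function names, the smoothing, the scoring rule, and the toy corpus are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical): each document is a list of already-segmented tokens.
DOCS = [
    ["tax", "policy", "budget"],
    ["tax", "refund"],
    ["road", "traffic", "repair"],
    ["traffic", "bus", "route"],
]
LABELS = ["finance", "finance", "transport", "transport"]


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], using term presence."""
    n = len(docs)
    with_t = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    without_t = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    conditional = 0.0
    for part in (with_t, without_t):
        size = sum(part.values())
        if size:
            conditional += (size / n) * entropy(list(part.values()))
    return entropy(list(Counter(labels).values())) - conditional


def tf_idf_ig(doc, docs, labels):
    """Assumed correction: weight = tf * smoothed idf * IG for each term in doc."""
    n = len(docs)
    weights = {}
    for term, count in Counter(doc).items():
        df = sum(1 for d in docs if term in d)
        idf = math.log((n + 1) / (df + 1)) + 1  # smoothing is an assumption
        weights[term] = (count / len(doc)) * idf * information_gain(term, docs, labels)
    return weights


def classify(tokens, docs, labels):
    """Assign the category with the largest summed contribution IG(t) * P(c | t)."""
    scores = defaultdict(float)
    for term in set(tokens):
        containing = [lab for doc, lab in zip(docs, labels) if term in doc]
        if not containing:
            continue  # terms unseen in training contribute nothing
        ig = information_gain(term, docs, labels)
        for cls, cnt in Counter(containing).items():
            scores[cls] += ig * cnt / len(containing)
    return max(scores, key=scores.get) if scores else None
```

In this toy corpus, "tax" occurs only in finance documents, so it carries more information gain than "policy", which occurs in just one of them. A call such as `classify(["tax", "bus"], DOCS, LABELS)` therefore favors the category whose terms separate the classes most cleanly.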
Keywords/Search Tags: Text classification, TF-IEDF algorithm, Information gain, Feature distribution