Research On Short Text Classification Based On Its Own Features

Posted on: 2017-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Yang
Full Text: PDF
GTID: 2308330485462194
Subject: Software engineering
Abstract/Summary:
Short texts such as online reviews, Internet chat messages, search snippets, and microblog posts contain a great deal of potentially valuable information. However, short texts are sparse and describe concepts only weakly, which poses a great challenge to traditional text classification techniques. This dissertation studies short text classification methods that address this sparseness and weak concept description, a topic of both theoretical and practical value. The main work of this dissertation is as follows:
(1) The research field related to short text is analyzed in detail, and several key technologies of text classification are briefly introduced.
(2) Most existing feature selection methods do not sufficiently consider the correlation between features and classes, so a short text classification method based on category-distinguishing features is proposed. The weight of a feature is computed from its distribution between classes and within each class. The method iteratively selects the important features in the local environment and performs local classification with these features. Experimental results show that the method has a clear advantage in both accuracy and running time.
(3) To address the imbalance and sparseness of short texts, a short text classification method based on extension with the text's own features is proposed. First, the method selects highly indicative features at the same ratio for each category and merges all non-redundant features. Second, it builds the feature space from these features, vectorizes the training and test sets, and then classifies the short texts using these vectors. Experimental results show that the proposed method effectively improves short text classification performance.
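The selection and vectorization steps described in (3) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the dissertation's implementation: the abstract does not give the scoring formula, so the score used here (in-class document rate minus out-of-class document rate) is a stand-in, and the function names `select_features_per_class` and `vectorize`, the `ratio` parameter, and the toy data are all hypothetical. The classifier itself is left out.

```python
import math
from collections import Counter


def select_features_per_class(docs, labels, ratio=0.5):
    """Pick the top `ratio` share of each class's terms, then merge them
    into one vocabulary without duplicates.

    Assumption: the score below is a placeholder, not the thesis's weight.
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)

    # Document frequency of every term inside each class.
    df = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))

    merged = []
    for c in classes:
        n_in = n_docs[c]
        n_out = len(docs) - n_in
        scores = {}
        for term, count in df[c].items():
            out_count = sum(df[o][term] for o in classes if o != c)
            out_rate = out_count / n_out if n_out else 0.0
            # Assumed score: in-class rate minus out-of-class rate.
            scores[term] = count / n_in - out_rate

        # Rank by score (descending), breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        # Same ratio for every class, but at least one feature each.
        k = max(1, math.ceil(ratio * len(ranked)))
        for term in ranked[:k]:
            if term not in merged:  # keep only non-redundant features
                merged.append(term)
    return merged


def vectorize(tokens, vocabulary):
    """Binary bag-of-words vector over the merged feature space."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


# Toy, already-tokenized short texts (hypothetical data).
docs = [
    ["great", "phone", "battery"],
    ["great", "screen"],
    ["slow", "delivery"],
]
labels = ["product", "product", "shipping"]

vocab = select_features_per_class(docs, labels, ratio=0.5)
vector = vectorize(["great", "delivery", "late"], vocab)
```

Applying the same ratio to every category, rather than one global cutoff, means a small category still contributes features to the merged space, which is presumably how the method counters class imbalance.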
Keywords/Search Tags: category distinguishing feature, weak concept, sparsity, short text classification, unbalance