
Association Relationship Extension To Chinese Short-text Classification

Posted on: 2013-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Cao
GTID: 2248330371467666
Subject: Computer Science and Technology
Abstract/Summary:
SMS, microblogging, instant messaging, and Internet Relay Chat are growing rapidly, and classifying these short texts has become an important requirement for information processing. Although text classification has been studied extensively, existing work mainly targets texts of a certain length. Can long-text classification techniques simply be applied to short texts? If not, dedicated short-text classification techniques are needed. The results of this thesis show that, because short texts contain too few features, directly applying long-text classification techniques to short texts is not feasible: the original classifiers cannot maintain good performance on short text.

This thesis studies short-text extension, which enriches a short text with additional features to avoid this problem. The main work is as follows:
1. Association rule mining is applied to obtain a collection of association rules based on word co-occurrence. The rules are then filtered according to how their words are distributed across categories, yielding a high-quality rule collection.
2. When the high-quality rules are used to extend a short text, the method considers not only the relationships between words but also the relationship between a word and the entire test document.

Experimental results show that classification performance improves significantly on short texts extended with this method.
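The abstract does not state the rule thresholds, the exact category-distribution criterion, or how the word-to-document relationship is measured. The following is therefore only a minimal sketch of the two-step pipeline under stated assumptions. Pairwise rules a -> b are mined with standard support and confidence. A rule is kept only if both words have the same dominant category with a high share; this filter is a hypothetical stand-in. A candidate word is added to a short text only when the rules pointing to it are supported by the text as a whole; this score is also hypothetical. The function names, thresholds, and toy data are illustrative and are not taken from the thesis.

```python
from collections import Counter, defaultdict
from itertools import permutations


def mine_rules(docs, min_support=0.3, min_conf=0.5):
    """Mine word-pair rules a -> b from co-occurrence in training short texts.

    docs: list of token lists. Returns {(a, b): confidence}.
    """
    n = len(docs)
    word_count = Counter()  # number of texts containing each word
    pair_count = Counter()  # number of texts containing both words of an ordered pair
    for tokens in docs:
        words = set(tokens)
        word_count.update(words)
        pair_count.update(permutations(words, 2))
    rules = {}
    for (a, b), both in pair_count.items():
        support = both / n
        confidence = both / word_count[a]
        if support >= min_support and confidence >= min_conf:
            rules[(a, b)] = confidence
    return rules


def dominant_category(labeled_docs):
    """Map each word to (its most frequent category, that category's share of the word's texts)."""
    per_word = defaultdict(Counter)
    for tokens, label in labeled_docs:
        for w in set(tokens):
            per_word[w][label] += 1
    dom = {}
    for w, counts in per_word.items():
        label, freq = counts.most_common(1)[0]
        dom[w] = (label, freq / sum(counts.values()))
    return dom


def select_rules(rules, dom, min_share=0.8):
    """Keep a -> b only if both words lean strongly toward the same category.

    Assumed criterion, standing in for the thesis's category-distribution selection.
    """
    kept = {}
    for (a, b), conf in rules.items():
        cat_a, share_a = dom[a]
        cat_b, share_b = dom[b]
        if cat_a == cat_b and share_a >= min_share and share_b >= min_share:
            kept[(a, b)] = conf
    return kept


def extend(tokens, rules, min_doc_score=0.5):
    """Append rule consequents that the short text supports as a whole.

    Hypothetical word-to-document score: summed confidence of rules w -> b over
    the distinct words w of the text, divided by the number of distinct words.
    """
    words = set(tokens)
    scores = defaultdict(float)
    for (a, b), conf in rules.items():
        if a in words and b not in words:
            scores[b] += conf
    added = sorted(b for b, s in scores.items() if s / len(words) >= min_doc_score)
    return list(tokens) + added


# Toy labeled short texts (illustrative only)
train = [
    (["goal", "match", "team", "report"], "sports"),
    (["goal", "team", "coach"], "sports"),
    (["match", "team", "coach"], "sports"),
    (["stock", "market", "price", "report"], "finance"),
    (["stock", "price", "bank", "report"], "finance"),
    (["market", "bank", "price"], "finance"),
]
rules = mine_rules([tokens for tokens, _ in train])
selected = select_rules(rules, dominant_category(train))
print(extend(["goal", "match"], selected))  # ['goal', 'match', 'team']
print(extend(["goal", "price"], selected))  # ['goal', 'price', 'team']
```

In this toy run, the mixed-category word "report" passes support and confidence, but the category filter drops every rule that involves it. For the text ["goal", "price"], "team" is added because one of the two words points to it with confidence 1.0. The finance words, which are supported only at confidence 2/3 by one word, fall below the document-level threshold and are not added.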
Keywords/Search Tags: short text, classification, association rules, feature extension, interdependency