
Research On Chinese Short Text Classification Algorithm Based On CRFs

Posted on: 2014-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: J N Ceng
Full Text: PDF
GTID: 2248330392461036
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid growth of the Internet, online activity has become an essential part of people's daily lives. Much of the information people use on the Internet takes the form of short text, that is, text that is very short (generally no longer than 140 characters). News headlines, micro-blog posts, short messages, emails, and shopping comments are all everyday examples of short text. In the face of the information explosion on the Internet, automatic classification technology helps people find, and spend more time on, what interests them. This paper focuses on short text classification techniques based on conditional random fields (CRFs) and presents optimizations of feature selection and text representation.

Compared with ordinary text, short text has some distinct features. This paper describes the features of short text and analyzes related research in the field of short text classification. Most short text classification methods are improvements on traditional text classification methods. This paper reviews the general process of text classification and lists its basic components: Chinese word segmentation, feature selection and feature weight calculation, text representation, and the text classifier.

A conditional random field is an undirected probabilistic graphical model that models the conditional probability of an output sequence given an input sequence; it is an improvement on traditional directed graphical models. Applying such probabilistic models is a new idea in the field of text classification. This paper systematically describes the theory of CRFs and how the sequence labeling method can be applied to short text classification. To address the sparsity of features in short text, this paper presents a method that uses CRFs to predict a label sequence for short text classification. It focuses on the sequence labeling method, the classification calculation, and the choice of feature templates.

Comparative experiments show that CRF-based short text classification is an effective method for news subject classification, subjective/objective classification of micro-blog posts, and classification of micro-blog emotional tendency.
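The idea of classifying a short text through a predicted label sequence can be illustrated with a minimal sketch. The abstract does not give the label set, the feature templates, or the classification calculation, so everything below is an assumption made for illustration: the text is already segmented into tokens, each token receives a category label, the scores are hand-set stand-ins for trained CRF weights, and the text's class is taken by majority vote over the decoded labels. The names `EMISSION`, `SAME_LABEL_BONUS`, `viterbi`, and `classify` are hypothetical.

```python
from collections import Counter

LABELS = ["sports", "finance"]  # hypothetical category labels

# Hand-set scores standing in for trained CRF feature weights (assumption).
EMISSION = {
    ("match", "sports"): 2.0,
    ("team", "sports"): 1.5,
    ("stock", "finance"): 2.0,
    ("market", "finance"): 1.5,
}
SAME_LABEL_BONUS = 1.0  # transition score when adjacent tokens share a label


def emission_score(token, label):
    """Score for assigning `label` to `token`; unseen pairs score 0."""
    return EMISSION.get((token, label), 0.0)


def transition_score(prev_label, label):
    """Score for moving from `prev_label` to `label`."""
    return SAME_LABEL_BONUS if prev_label == label else 0.0


def viterbi(tokens):
    """Return the highest-scoring label sequence for a linear chain."""
    if not tokens:
        return []
    # best[t][label] = (best path score ending in label at t, previous label)
    best = [{lab: (emission_score(tokens[0], lab), None) for lab in LABELS}]
    for t in range(1, len(tokens)):
        column = {}
        for lab in LABELS:
            prev = max(
                LABELS,
                key=lambda p: best[t - 1][p][0] + transition_score(p, lab),
            )
            score = (
                best[t - 1][prev][0]
                + transition_score(prev, lab)
                + emission_score(tokens[t], lab)
            )
            column[lab] = (score, prev)
        best.append(column)
    # Follow the back-pointers from the best final label.
    last = max(LABELS, key=lambda lab: best[-1][lab][0])
    path = [last]
    for t in range(len(tokens) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    path.reverse()
    return path


def classify(tokens):
    """Majority vote over the decoded labels (an assumed decision rule)."""
    labels = viterbi(tokens)
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    print(classify(["team", "wins", "match"]))
    print(classify(["stock", "market", "falls"]))
```

In this sketch the transition score lets informative tokens pull neighbouring uninformative tokens toward the same label, which is one way a sequence model can compensate for sparse features in a short text. A trained CRF would learn these weights from labeled data through feature templates rather than use fixed values.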
Keywords/Search Tags: short text, text classification, CRFs, SVM, micro-blog