Research And Realization Of Chinese Short Text Classification Approaches Applied On Mobile Phone Forensics

Posted on: 2013-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Chu
GTID: 2248330374457070
Subject: Computer application technology
Abstract/Summary:
Chinese short text classification is becoming a research hotspot with the development of the mobile internet and the popularity of smartphones. In computer and device forensics, extracting useful information quickly and accurately from the large volume of short texts recovered from computers or mobile devices has become a practical problem, and short text classification is an effective way to address it. Chinese short text classification also has broad application prospects in other areas, such as social networks, knowledge question answering systems, and information retrieval.
This thesis focuses on the Chinese short text classification problem in computer and mobile device forensics and compares different classification approaches. First, it reviews the key techniques used in text classification and points out that two types of approaches are currently used for short text classification: improved techniques based on long text classification, including approaches based on different term-weighting schemes, and feature extension approaches, which rely on an external library to add features to short texts. It then introduces six term-weighting based short text classification approaches, a Wikipedia based feature extension approach, and three improved term-weighting approaches for Chinese short text classification. Finally, it designs experiments to compare the classification results, describes the implementation of each module (term segmentation, feature selection, term weighting, and classification), and analyzes the experimental results.
The experimental results show that, among the nine term-weighting based approaches, the iqf*qf*icf based short text classification approach performs better than the other eight. Although the [...] based short text classification approach performs nearly the same as the feature extension approach in overall classification effect, it is more stable on both corpora. According to the classification results, the SVM based short text classification approach performs a little better than the Naive Bayes based approach.
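The four modules named in the abstract (term separating, feature selection, term weighting, and classification) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes the text is already split into terms by spaces, uses plain tf*idf as a stand-in for the term-weighting schemes the thesis compares (the iqf*qf*icf formula is not given here), and uses a small Naive Bayes classifier rather than SVM. All function names and the toy messages are hypothetical.

```python
import math
from collections import Counter, defaultdict

def segment(text):
    # Term separating step. Assumption: terms are already separated by spaces;
    # a real system would call a Chinese word segmenter here.
    return text.split()

def select_features(docs, min_df=1):
    # Feature selection by document frequency: keep terms seen in >= min_df docs.
    df = Counter(t for doc in docs for t in set(doc))
    vocab = {t for t, n in df.items() if n >= min_df}
    return vocab, df

def weight_terms(doc, vocab, df, n_docs):
    # Term weighting with smoothed tf*idf (a stand-in, not iqf*qf*icf).
    tf = Counter(t for t in doc if t in vocab)
    return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in tf.items()}

def train_nb(vectors, labels, vocab):
    # Multinomial Naive Bayes over weighted term mass, with add-one smoothing.
    class_counts = Counter(labels)
    mass = defaultdict(Counter)
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            mass[y][t] += w
    model = {}
    for y, n in class_counts.items():
        denom = sum(mass[y].values()) + len(vocab)
        log_prior = math.log(n / len(labels))
        log_lik = {t: math.log((mass[y][t] + 1) / denom) for t in vocab}
        model[y] = (log_prior, log_lik)
    return model

def predict_nb(model, vec):
    # Pick the class with the highest log posterior score.
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(w * log_lik[t] for t, w in vec.items())
    return max(model, key=score)

# Toy, pre-segmented messages invented for illustration only.
TRAIN = [
    ("明天 上午 开会", "work"),
    ("会议 时间 改 到 下午", "work"),
    ("银行 转账 已 到账", "finance"),
    ("请 尽快 还款 到 银行 账户", "finance"),
]

docs = [segment(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab, df = select_features(docs)
vectors = [weight_terms(d, vocab, df, len(docs)) for d in docs]
model = train_nb(vectors, labels, vocab)

def classify(text):
    # Run a new short text through the same segmentation and weighting steps.
    vec = weight_terms(segment(text), vocab, df, len(docs))
    return predict_nb(model, vec)

print(classify("下午 开会"))   # work
print(classify("银行 还款"))   # finance
```

Swapping `weight_terms` for another scheme, or `train_nb` and `predict_nb` for another classifier, leaves the rest of the pipeline unchanged, which is the kind of comparison the abstract describes.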
Keywords/Search Tags: Chinese short text classification, term weighting, feature extension, SVM, Naive Bayes