
The Research Of Micro-blog Information Classification Method Based On Mixed Characteristics

Posted on: 2014-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Gao
Full Text: PDF
GTID: 2268330392473742
Subject: Mathematics
Abstract/Summary:
In recent years, the rapid development of micro-blogs has given them great influence on the network. Micro-blog information classification can help users get the information they want quickly and accurately, and it can also filter out useless junk information, so it is of considerable significance. Based on the characteristics of micro-blogs, this paper proposes the following methods to improve the accuracy of micro-blog information classification. First, this paper uses some characteristics of a micro-blog besides its text as classification features. We combine these characteristics with the text of the micro-blog and call the result the mixed characteristics. They give the 3F method (author information + text + link) and the 4F method (author information + text + link + review). This work builds on the 8F (8 features) method, which was put forward by earlier students to solve tweet message classification problems. Next, we introduce frequency and other factors into the traditional chi-square (CHI) statistic to improve the traditional feature selection method, based on the hypothesis that features appearing many times in one category of micro-blogs should be highly correlated with that category. We also propose a tf*idf*improved chi-square statistic method, which improves the traditional weight calculation method in micro-blog classification. Finally, we propose the “1F-3F” method, which combines the text part with the mixed characteristics. We tested the proposed methods using the classical KNN or SVM algorithm. The experimental results show that the proposed methods are feasible for micro-blog classification.
Keywords/Search Tags: Micro-blog information classification, mixed feature, feature selection, weight calculation, CHI-square statistic
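
The abstract does not give the formulas behind the improved chi-square statistic or the tf*idf*CHI weight, so the following is only a minimal sketch of the general idea. It uses the standard chi-square statistic between a term and a category. The `frequency_factor` function (the share of a term's occurrences that fall inside a category) is a hypothetical stand-in for the "frequency and other factors" the thesis mentions, not the author's actual definition. All function names and the toy corpus are assumptions made for illustration.

```python
import math


def chi_square(docs, labels, term, category):
    """Standard chi-square statistic between a term and a category."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == category:
            a += 1  # term present, document in category
        elif present:
            b += 1  # term present, document in another category
        elif label == category:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def frequency_factor(docs, labels, term, category):
    """Hypothetical factor: share of the term's occurrences inside the category."""
    in_cat = sum(tokens.count(term)
                 for tokens, label in zip(docs, labels) if label == category)
    total = sum(tokens.count(term) for tokens in docs)
    return in_cat / total if total else 0.0


def improved_chi(docs, labels, term):
    """Assumed 'improved' score: chi-square scaled by the frequency factor,
    maximised over all categories."""
    return max(chi_square(docs, labels, term, cat)
               * frequency_factor(docs, labels, term, cat)
               for cat in set(labels))


def tf_idf_chi(docs, labels, doc_index, term):
    """Weight of a term in one document: tf * idf * improved chi-square."""
    tokens = docs[doc_index]
    if not tokens:
        return 0.0
    tf = tokens.count(term) / len(tokens)
    df = sum(1 for t in docs if term in t)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf * improved_chi(docs, labels, term)


# Toy corpus (invented for illustration): tokenised micro-blog texts and labels.
docs = [
    ["goal", "match", "team"],
    ["match", "score", "goal", "goal"],
    ["stock", "market", "price"],
    ["market", "bank", "price"],
]
labels = ["sports", "sports", "finance", "finance"]

# Rank the vocabulary by the assumed improved chi-square score.
vocabulary = sorted({t for tokens in docs for t in tokens})
ranked = sorted(vocabulary, key=lambda t: improved_chi(docs, labels, t), reverse=True)
```

In this sketch a term such as "goal", which occurs repeatedly and only in one category, scores higher than a term such as "team", which occurs once. That is the behaviour the stated hypothesis calls for. The resulting weights could then be fed to a KNN or SVM classifier, as the abstract describes.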