Study And Implementation Of Microblog Oriented Text Quality Evaluation And Classification

Posted on: 2016-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Chen
Full Text: PDF
GTID: 2348330536467732
Subject: Software engineering
Abstract/Summary:
Microblogs continuously produce an enormous amount of new content of mixed composition, but only a very small part of it meets the needs of users who are looking for information. Finding high-quality content in this ocean of microblog information is therefore essential. Sentences that express emotion are usually subjective, and subjective/objective classification is the basis of research such as text sentiment analysis and opinion-holder extraction. As a preprocessing step, subjective/objective classification is thus of considerable significance. This thesis studies techniques for microblog-oriented text quality evaluation and classification.

For text quality, non-English text, stop words and repeated text are first filtered from the Twitter data. The content behind each URL is then used to extend the information in the tweet. Next, 225 Twitter topics are selected to simulate the users of a recommendation system, and the similarity between each tweet and each topic is calculated. The importance of each tweet is then computed by a classifier trained on text features and user features. Finally, the quality evaluation of each tweet is obtained by combining its similarity and its importance.

For subjective/objective classification, this thesis improves the 2-POS mode and presents a 2-both-POS mode. The POS combination of each sentence is used as its part-of-speech feature, and classification experiments are conducted on it. To select the most appropriate statistic for ranking the part-of-speech features, the thesis studies a metric and uses it to compare two features. An adaptive threshold is then calculated from the previous result by a dynamic threshold algorithm. Finally, the system is implemented in order to validate its usability.
Keywords/Search Tags: Microblog, Twitter, Text Quality, Subjective and Objective Classification
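
The quality-evaluation pipeline summarized in the abstract (filter stop words, compare each tweet with a topic, then combine similarity with importance) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the thesis's method: the stop-word list, the use of cosine similarity over raw term counts, the weighted sum with a hypothetical `alpha` parameter, and the reading of a "2-POS" feature as adjacent part-of-speech tag pairs. The importance score is taken as a given number in [0, 1], standing in for the output of the trained classifier.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which list was used.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # An empty token list has zero norm; treat it as having no similarity.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def quality_score(tweet, topic, importance, alpha=0.5):
    """Combine topic similarity and importance as a weighted sum.

    The weighted sum and `alpha` are assumptions made for illustration;
    `importance` stands in for a classifier score in [0, 1].
    """
    similarity = cosine_similarity(tokenize(tweet), tokenize(topic))
    return alpha * similarity + (1 - alpha) * importance


def pos_bigrams(tags):
    """Adjacent part-of-speech tag pairs, one possible reading of a 2-POS feature."""
    return list(zip(tags, tags[1:]))
```

With these definitions, a tweet whose remaining tokens match the topic exactly and whose importance is 1.0 scores close to 1.0, while a tweet sharing no tokens with the topic scores only `(1 - alpha) * importance`.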