A non-heuristic machine learning approach for classifying Twitter content

Posted on: 2015-07-17
Degree: M.S.
Type: Thesis
University: Oklahoma State University
Candidate: Karuparthy, Srikanth
Full Text: PDF
GTID: 2478390017495544
Subject: Computer Science
Abstract/Summary:
In online social networks such as Twitter, users are often inundated with a continuous stream of short messages, or tweets. Classification can help manage this stream. Classification is a supervised data mining technique that assigns a label to each object in a set of unlabeled objects. A conventional approach to classifying text or tweets is to extract features from the linguistic content that users post. A recurring problem in classification is feature selection: deciding which set of features, among the effectively unlimited number of possible feature sets, is best for a particular classification decision. This process usually relies on heuristic approaches in which experts select features manually, which involves guesswork, prior information about the dataset, and a great deal of tweaking and experimental validation. To address this problem, we propose and employ a non-heuristic machine learning approach that automatically decides the feature set for a classification task. Our analysis shows that this automated feature selection process for Twitter content classification performs on par with current state-of-the-art approaches, which rely on painstaking, time-consuming human effort to select a feature set manually and heuristically. This approach will improve the timeliness and accessibility of mining social media data streams.
Keywords/Search Tags:Approach, Twitter, Feature