A non-heuristic machine learning approach for classifying Twitter content

Posted on:2015-07-17

Degree:M.S.

Type:Thesis

University:Oklahoma State University

Candidate:Karuparthy, Srikanth

GTID:2478390017495544

Subject:Computer Science

Abstract/Summary:

Users of online social networks such as Twitter are routinely inundated by a continuous stream of short messages, or tweets. Classification can help manage this stream. Classification is a supervised data mining technique that assigns a label to each object in a set of unlabeled objects. A conventional approach to classifying text or tweets is to extract features from the linguistic content that users post. A recurrent problem in classification is feature selection: deciding which of the effectively unlimited possible feature sets is best for a particular classification decision. This process usually relies on heuristic approaches in which experts select features manually, which involves guesswork, prior information about the dataset, and a great deal of tweaking and experimental validation. To address this problem, we propose and employ a non-heuristic machine learning approach that automatically decides the feature set for a classification task. Our analysis shows that our automated feature selection process for Twitter content classification performs on par with current state-of-the-art approaches, which depend on painstaking, time-consuming human effort to select a feature set manually and heuristically. This approach will improve the timeliness and accessibility of mining social media data streams.
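As a rough illustration of what data-driven feature selection for short-text classification can look like, the sketch below ranks tokens by their mutual information with the class label and trains a small naive Bayes classifier on the top-ranked tokens only. The abstract does not name the algorithm used in the thesis, so this is not the author's method: mutual-information ranking and naive Bayes are stand-ins chosen for brevity. The function names, the cutoff `k`, and the toy tweets are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def select_features(docs, labels, k):
    """Rank tokens by mutual information with the label; return the top k."""
    n = len(docs)
    class_count = Counter(labels)
    df = Counter()                   # token -> number of docs containing it
    df_class = defaultdict(Counter)  # token -> label -> docs containing it
    for text, label in zip(docs, labels):
        for tok in set(tokenize(text)):
            df[tok] += 1
            df_class[tok][label] += 1

    scores = {}
    for tok, n_t in df.items():
        mi = 0.0
        for c, n_c in class_count.items():
            n_tc = df_class[tok][c]
            # Two cells per class: token present in class c, token absent in class c.
            for joint, marginal in ((n_tc, n_t), (n_c - n_tc, n - n_t)):
                if joint > 0:
                    mi += (joint / n) * math.log(joint * n / (marginal * n_c))
        scores[tok] = mi
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def train_nb(docs, labels, vocab):
    """Count selected tokens per class for a multinomial naive Bayes model."""
    vocab = set(vocab)
    prior = Counter(labels)
    counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        counts[label].update(t for t in tokenize(text) if t in vocab)
    return prior, counts, vocab


def predict(model, text):
    """Return the most probable label, using Laplace (add-one) smoothing."""
    prior, counts, vocab = model
    total = sum(prior.values())
    best, best_score = None, -math.inf
    for c in sorted(prior):
        denom = sum(counts[c].values()) + len(vocab)
        score = math.log(prior[c] / total)
        for t in tokenize(text):
            if t in vocab:
                score += math.log((counts[c][t] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best


# Made-up toy data, for illustration only (not from the thesis).
docs = [
    "team wins the match tonight",
    "great goal in the match",
    "the team lost the game",
    "new phone released today",
    "the phone has a great camera",
    "software update for the phone",
]
labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

selected = select_features(docs, labels, k=3)
model = train_nb(docs, labels, selected)
print(selected)
print(predict(model, "the phone update"))
```

Note that the cutoff `k` is still chosen by hand here. A fully automated pipeline would also have to choose it from the data, for example by cross-validation.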

Keywords/Search Tags:

Approach, Twitter, Feature

Related items

1	Study On The Broadcasting Features Of The Twitter.com
2	Are There Perks to Being a Twitter Wallflower? Peripheral Participants in a Twitter-Enabled Learning Space in Public Relations and Higher Education
3	Research Of Twitter Retrieval Based On Semantic Similarity Computing And Twitter Storm Platform
4	Education all a'Twitter: Twitter's role in educational technology
5	Research On Organization Name Disambiguation On Twitter Data
6	Is Twitter a counter public?: Comparing individual and community forces that shaped local Twitter and newspaper coverage of the BP oil spill
7	Discovering Twitter Users' Off-line Community
8	The Twitter Management Of American Local Newspapers And The Revelation Of Chinese
9	Research On Specific Event Detection In Twitter
10	The Model Of The Development And Future Of Twitter In China