Clustering of Facebook Post

Posted on: 2015-09-27
Degree: M.S.
Type: Thesis
University: University of California, Davis
Candidate: Lin, Chang-Yung
Full Text: PDF
GTID: 2478390017997547
Subject: Computer Science
Abstract/Summary:
The purpose of this thesis is to extend current document clustering techniques to posts from Facebook public groups. Ultimately, the proposed method can distinguish the points of view held in different clusters through linguistic analysis. Posts, retrieved by keyword query, are submitted to text analysis software, and word count analysis is done using Linguistic Inquiry and Word Count (LIWC). The LIWC dictionaries yield a representative profile of each post's word usage, and these profiles are partitioned with two standard algorithms: K-Means partitional clustering and Ward agglomerative hierarchical clustering (Hierarchy-Ward). Existing clustering performance evaluations require ground truth labels, which the Facebook data set does not provide, so a new evaluation approach is needed to validate the method. This thesis therefore proposes a novel measure called significant different ratio of cluster average (SDRCA). To show that SDRCA is acceptable as a validation technique, simulations on a known newsgroup data set demonstrate that SDRCA follows the same trend as other clustering performance evaluations. Returning to the original problem, we use SDRCA to evaluate our applications of LIWC. Our findings indicate that, first, LIWC with Hierarchy-Ward clustering provides better results than LIWC with K-Means, and second, our method is helpful in distinguishing posts through their linguistic features.
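The pipeline described in the abstract (dictionary-based word counts followed by Ward clustering) can be illustrated with a minimal sketch. This is not the thesis's implementation: the two-category dictionary `CATEGORIES`, the helper names `featurize`, `ward_cost` and `ward_cluster`, and the sample posts are all hypothetical stand-ins, since the real LIWC dictionaries are proprietary and the abstract does not give the exact feature set. SDRCA is not shown because the abstract does not define it.

```python
import re
from itertools import combinations

# Hypothetical toy category dictionary (assumption for illustration only);
# the real LIWC dictionaries are proprietary and far larger.
CATEGORIES = {
    "posemo": {"good", "great", "love", "happy"},
    "negemo": {"bad", "hate", "awful", "angry"},
}


def featurize(post):
    """Return the share of a post's words that fall in each category."""
    words = re.findall(r"[a-z']+", post.lower())
    total = len(words) or 1  # avoid division by zero on empty posts
    return [sum(w in vocab for w in words) / total
            for vocab in CATEGORIES.values()]


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def ward_cluster(vectors, k):
    """Naive agglomerative clustering with Ward linkage.

    Repeatedly merges the pair of clusters with the smallest Ward cost
    until k clusters remain. Returns lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(
                [vectors[n] for n in clusters[p[0]]],
                [vectors[n] for n in clusters[p[1]]],
            ),
        )
        clusters[i] += clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Made-up posts, not taken from the thesis data set.
posts = [
    "I love this great plan",
    "Such a good and happy day",
    "I hate this awful idea",
    "Bad move, I am angry",
]
features = [featurize(p) for p in posts]
clusters = ward_cluster(features, k=2)
print(clusters)
```

On these toy posts the two positive-worded posts end up in one cluster and the two negative-worded posts in the other. A K-Means comparison would run on the same `features` vectors; the naive merge loop here is cubic in the number of posts, so a real data set would call for a library implementation.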
Keywords/Search Tags:Clustering, LIWC, Facebook, Posts, SDRCA