User Generated Content Quality Evaluation Based On Text Analysis

Posted on: 2018-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Cong
Full Text: PDF
GTID: 2348330518993370
Subject: Intelligent Science and Technology
Abstract/Summary:
With the rapid development of social networking services, users who enjoy presenting themselves online produce vast amounts of User Generated Content (UGC) every minute. UGC carries considerable economic, political, and cultural value. Given the resulting information explosion and the uneven quality of this content, evaluating UGC quality automatically has become a challenging research task.

Earlier work in our lab addressed this problem by considering factors such as user authority and propagation velocity. We believe, however, that the textual content of UGC is the key factor in its quality. In this thesis we therefore focus on quality evaluation and classification based on textual content, rather than on publication-related data such as the number of comments and forwards.

First, we extract a range of features from the text using natural language processing techniques, including word segmentation, keyword extraction, topic models, sentence parsing, and distributed word representations. Second, we build several base classifiers from different feature sets and different machine learning algorithms, each assigning UGC one of four quality labels. Experiments show that semantic features reflect UGC quality better than grammatical features, and that different classifiers behave differently on corpora of different quality. To combine the strengths of the different features and classifiers, we build a global meta-learning model on top of these base classifiers. We evaluate the model with 10-fold cross-validation on real data collected from Tianya Forum. The results show that the proposed meta-learning model performs much better than the base classifiers.
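The abstract does not say how the meta-learning model combines the base classifiers. One common way to do this is stacked generalization, sketched below in plain Python as an illustration only. Everything in it is an assumption rather than the thesis's method: the nearest-centroid base learners, the lookup-table meta-learner, the names `CentroidClassifier`, `VoteTableMeta`, `fit_stacked`, and `predict_stacked`, and the synthetic four-column data, where columns 0 and 1 stand in for "semantic" features and columns 2 and 3 for "grammatical" ones. The 10 folds are used here to produce out-of-fold base predictions for training the meta-learner.

```python
from collections import Counter, defaultdict


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, test
        start += size


class CentroidClassifier:
    """Hypothetical base learner: nearest centroid on a subset of columns."""

    def __init__(self, columns):
        self.columns = columns
        self.centroids = {}

    def _project(self, row):
        return [row[c] for c in self.columns]

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(self._project(row))
        self.centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                          for label, rows in groups.items()}
        return self

    def predict_one(self, row):
        point = self._project(row)

        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(point, self.centroids[label]))

        return min(self.centroids, key=sq_dist)


class VoteTableMeta:
    """Hypothetical meta-learner: maps each tuple of base predictions
    to the true label seen most often with that tuple."""

    def fit(self, meta_X, y):
        table = defaultdict(Counter)
        for preds, label in zip(meta_X, y):
            table[tuple(preds)][label] += 1
        self.table = {key: c.most_common(1)[0][0] for key, c in table.items()}
        self.default = Counter(y).most_common(1)[0][0]  # for unseen tuples
        return self

    def predict_one(self, preds):
        return self.table.get(tuple(preds), self.default)


def fit_stacked(X, y, base_factories, k=10):
    """Train the meta-learner on out-of-fold base predictions, then refit bases on all data."""
    meta_X = [[None] * len(base_factories) for _ in X]
    for train, test in k_fold_indices(len(X), k):
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                meta_X[i][j] = model.predict_one(X[i])
    meta = VoteTableMeta().fit(meta_X, y)
    bases = [make().fit(X, y) for make in base_factories]
    return bases, meta


def predict_stacked(bases, meta, row):
    return meta.predict_one([b.predict_one(row) for b in bases])


# Synthetic data (not from the thesis): 40 items, four quality labels 0..3.
GRAMMAR_BASE = [0.0, 1.0, 0.3, 1.3]
X, y = [], []
for i in range(40):
    label, block = i % 4, i // 4
    X.append([label + 0.1 * (block % 3 - 1), 2.0 * label,
              GRAMMAR_BASE[label] + 0.1 * (block % 5 - 2), 0.0])
    y.append(label)

bases, meta = fit_stacked(
    X, y,
    [lambda: CentroidClassifier((0, 1)),   # "semantic" columns (assumed)
     lambda: CentroidClassifier((2, 3))],  # "grammatical" columns (assumed)
    k=10)
print(predict_stacked(bases, meta, [0.0, 0.0, 0.2, 0.0]))
```

Training the meta-learner only on out-of-fold predictions keeps it from seeing base-classifier outputs on data those classifiers were fitted to, which is the usual reason for combining stacking with k-fold splitting. In practice the toy learners here would be replaced by real classifiers over real text features.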
Keywords/Search Tags: user generated content, text analysis, meta-learning, content quality evaluation