Blog Property Mining Model Design Based On Topic-Relevant Blogs

Posted on: 2012-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: C L Tan
Full Text: PDF
GTID: 2178330335460486
Subject: Signal and Information Processing
Abstract/Summary:
A topic-relevant blog is a blog that publishes articles around a specific topic in the blogosphere. As blogs have grown in popularity, the blogosphere has filled with varied content, and people are no longer satisfied with topic-relevant blogs alone. They would also like to search for blogs that meet specific properties, for example blogs with an emotional bias, blogs whose authors write with high confidence, or blogs with a certain style or genre of writing. In short, people want to retrieve high-quality blogs that are principally devoted to a certain subject. Based on this demand, this thesis focuses on topic-relevant blog distillation algorithms and blog property mining algorithms. The main innovations and results are as follows:
1) We design and implement a set of distillation experiments based on the mean article similarity. Using the recall-precision distribution curve of the blog posts, we calculate the minimum number of retrieved documents that yields the highest precision, which avoids wasting system resources and greatly improves computational efficiency and accuracy. The baseline experiments also perform very well, taking 1st place in the TREC 2009 Faceted Blog Distillation Task.
2) We design and implement three models for mining blog properties. The first uses classification based on the maximum entropy model to distinguish the opinionated and factual properties. The second uses the Stanford named entity tool to identify the named entities in the blog and then uses the occurrence of those entities to predict the personal and official properties. The third uses the length of the blog post, the average length, and features characterized by term frequency and query term frequency to build an L-Qtf coefficient that predicts the indepth and shallow properties. These models achieved outstanding results in the blog evaluation.
3) For the personal and official model, we treat the task as a binary classification problem and run a comparative experiment that uses an SVM classifier, chosen for its stability and robustness. The result shows that the task cannot be treated as a simple classification problem, because the distribution of blog properties is unbalanced. In addition, four comparative experiments are designed to show that the length of the blog post and the relevance features are equally important in the indepth and shallow model.
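The distillation idea in item 1), scoring a blog by the mean similarity of its retrieved posts and considering only a limited number of top-ranked posts, can be illustrated with a minimal sketch. The abstract does not give the similarity function or the rule for choosing the cutoff, so the function name `rank_blogs_by_mean_similarity`, the `cutoff` parameter, and the sample scores below are all hypothetical and serve only to show the aggregation step.

```python
from collections import defaultdict


def rank_blogs_by_mean_similarity(ranked_posts, cutoff):
    """Rank blogs by the mean score of their posts among the top `cutoff` posts.

    ranked_posts: list of (blog_id, score) pairs, sorted by descending score.
    cutoff: number of top-ranked posts to consider (an assumed parameter;
            the thesis derives it from a recall-precision curve).
    """
    scores_per_blog = defaultdict(list)
    # Only the top `cutoff` posts contribute to any blog's score.
    for blog_id, score in ranked_posts[:cutoff]:
        scores_per_blog[blog_id].append(score)
    # Blog score = arithmetic mean of its retained post scores.
    means = {b: sum(s) / len(s) for b, s in scores_per_blog.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical post-level similarity scores, already sorted by score.
posts = [
    ("blogA", 1.0),
    ("blogB", 0.625),
    ("blogA", 0.5),
    ("blogC", 0.25),
    ("blogB", 0.125),
]
ranking = rank_blogs_by_mean_similarity(posts, cutoff=4)
```

With a cutoff of 4 the low-scoring fifth post is ignored, so `blogB` is judged only on its stronger post. Raising the cutoff to 5 pulls that post in and lowers the mean for `blogB`, which shows why the number of retrieved posts affects both cost and ranking quality.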
Keywords/Search Tags: blog distillation, property mining, topics expansion, SVM classifier