Research on Blog Tag Verification and Supplement Method

Posted on: 2011-06-24 | Degree: Master | Type: Thesis
Country: China | Candidate: J L Cai | Full Text: PDF
GTID: 2248330395958276 | Subject: Computer software and theory
Abstract/Summary:
Blogs are a typical Web 2.0 application. With the rapid development of blogging, the number of blog posts has also increased exponentially, and blog search engines help people find the information they are interested in among vast numbers of posts. To make search results easier to browse, some blog search engines have integrated clustering techniques. Bloggers add tags to denote the subject of a blog post, so tags ought to be a good basis for clustering blog posts. However, for reasons particular to each blogger, a tag may be inconsistent with the subject of the post or may express it only partially, so blog tags cannot be used for clustering directly.
Against this background, this thesis proposes a method for verifying and supplementing blog tags using the subject information of the blog post, taking Sina blog posts as the object of study. After analyzing the attributes of Sina blog posts and building a model of them, the thesis presents the process and framework of the method and then elaborates its key algorithms: feature selection with an ontology, BP network classification combined with similarity, and blog tag verification and supplement, together with a method for creating the ontology from Wikipedia categories. The ontology-based feature selection algorithm aims to select the features that best represent blog posts, which are then used to train the BP network. It first reduces dimensionality with the DF (document frequency) method, then computes each term's value with CHI (the chi-square statistic) and adjusts that value depending on whether the term is contained in the ontology.
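The feature selection steps described above (DF filtering, CHI scoring, ontology-based adjustment) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the `min_df` cutoff, and in particular the rule of multiplying the CHI score by a fixed `boost` for terms found in the ontology are all hypothetical, since the abstract does not say how the adjustment is made.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """CHI statistic for one term/category 2x2 contingency table.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category that lack the term
    d: docs outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, ontology_terms, min_df=2, boost=1.5, top_k=3):
    """Return the top_k terms after DF filtering and ontology-adjusted CHI scoring."""
    doc_sets = [set(doc) for doc in docs]

    # Step 1: DF filter. Drop terms that occur in fewer than min_df documents.
    df = Counter(term for s in doc_sets for term in s)
    candidates = [term for term, count in df.items() if count >= min_df]

    scores = {}
    for term in candidates:
        # Step 2: CHI score, taking the maximum over all categories.
        best = 0.0
        for cat in set(labels):
            a = sum(1 for s, l in zip(doc_sets, labels) if term in s and l == cat)
            b = sum(1 for s, l in zip(doc_sets, labels) if term in s and l != cat)
            c = sum(1 for s, l in zip(doc_sets, labels) if term not in s and l == cat)
            d = sum(1 for s, l in zip(doc_sets, labels) if term not in s and l != cat)
            best = max(best, chi_square(a, b, c, d))
        # Step 3 (assumed rule): raise the score of terms present in the ontology.
        if term in ontology_terms:
            best *= boost
        scores[term] = best

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


docs = [
    ["goal", "match", "team"],
    ["goal", "team", "coach"],
    ["stock", "market", "team"],
    ["stock", "bank", "market"],
]
labels = ["sports", "sports", "finance", "finance"]
selected = select_features(docs, labels, ontology_terms={"goal"})
```

In this toy corpus, terms that appear in only one post are removed by the DF filter, and the ontology term `goal` ranks ahead of equally discriminative terms because of the assumed boost.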
The BP network classification algorithm combined with similarity takes advantage of the category information that the blog tag supplies for a post. To reduce time consumption, the BP network is used to classify the post only when that category information is inaccurate or missing. The blog tag verification and supplement algorithm is based on the ontology that belongs to the same category as the blog post. It first obtains the ontology nodes for the blog post and for the blog tag separately and computes the similarity between them, then verifies the tag according to the similarity value. Tag supplement is accomplished by adding to the blog tag those nodes that belong to the post's nodes, are not among the tag's nodes or their child nodes, and differ from those nodes to a certain degree.
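The verification and supplement step might look roughly like the sketch below. It rests on several assumptions that are not in the abstract: the ontology is represented as a dictionary mapping each node to its list of child nodes, terms are mapped to nodes by exact name match, Jaccard overlap stands in for the unspecified similarity measure, and the `threshold` value is arbitrary. The "difference degree" condition is left out because the abstract does not define it.

```python
def nodes_for(terms, ontology):
    """Map terms to ontology nodes by exact name match (an assumption)."""
    return {term for term in terms if term in ontology}


def children_of(nodes, ontology):
    """Collect the direct child nodes of the given nodes."""
    return {child for node in nodes for child in ontology.get(node, [])}


def similarity(a, b):
    """Jaccard overlap, used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def verify_and_supplement(post_terms, tags, ontology, threshold=0.3):
    """Return (tags_valid, similarity_score, supplement_nodes)."""
    post_nodes = nodes_for(post_terms, ontology)
    tag_nodes = nodes_for(tags, ontology)

    # Verification: the tags pass if they are similar enough to the post's nodes.
    score = similarity(post_nodes, tag_nodes)
    valid = score >= threshold

    # Supplement: post nodes not already covered by a tag node or its children.
    # The thesis also requires a "difference degree" check, omitted in this sketch.
    covered = tag_nodes | children_of(tag_nodes, ontology)
    supplement = sorted(post_nodes - covered)
    return valid, score, supplement


# Hypothetical toy ontology: every node is a key, leaves map to empty lists.
ontology = {
    "sports": ["football", "tennis"],
    "football": ["goalkeeper"],
    "tennis": [],
    "goalkeeper": [],
}
valid, score, supplement = verify_and_supplement(
    post_terms=["football", "tennis", "goalkeeper", "weather"],
    tags=["football", "travel"],
    ontology=ontology,
)
```

Here the tag `football` already covers its child `goalkeeper`, so only `tennis` is proposed as a supplement; `weather` and `travel` are ignored because they have no node in the toy ontology.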
Keywords/Search Tags: blog search, ontology, BP network, blog tag verification, blog tag supplement