Research On The Blog Community Detection And Its Theme Extraction Technology

Posted on: 2014-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Liu
GTID: 2248330398465189
Subject: Management Science and Engineering
Abstract/Summary:
In recent years, the rapid development of the Internet has promoted the rise and development of e-commerce. More and more enterprises have earned high profits through this emerging model, which has also profoundly changed people’s traditional way of life. However, developing effective online marketing programs that improve e-commerce operations remains a difficult problem for e-commerce businesses. Blogs, a typical Web 2.0 application, form a huge social network through interactions such as frequent links, comments, and replies. Blogs are also a manifestation of people’s thinking and behavior in the virtual communities of the network. Therefore, finding blog communities quickly and efficiently and extracting their themes accurately helps companies develop reasonable marketing programs and carry out precise network marketing in order to maximize profits, and thus has important practical value.

Many algorithms can now detect blog communities based on link analysis, but they all have some problems, and the blog communities they detect have no themes that reflect what each community is about. To solve these problems, my research mainly covers the following:

(1) Decide to combine link analysis and content analysis to find blog communities and their themes.

(2) Briefly introduce the basic concepts of blogs and blog communities, the data model of blog pages and links, and the mature algorithms for detecting blog communities. Based on a comparison of the advantages and disadvantages of these algorithms, choose the trawling algorithm as the basis of my study.

(3) Present “the algorithm of blog community detection based on FCA” (formal concept analysis). The algorithm uses the algebraic digestion of the concept lattice to divide and merge the community cores, which solves the problems of the trawling algorithm, such as an excessive number of blog communities, a high repetition rate between blog communities, and the poor content of blog communities that results from the strict definition of a blog community.

(4) Present “the method of blog communities’ theme extraction based on LSA” (latent semantic analysis). The method first uses TF-IDF to extract theme words and then reduces the dimensionality of the content with LSA. I also propose automatic selection of the k value used to truncate the singular value matrix and automatic segmentation of larger matrices to improve the accuracy and efficiency of the method. The experiment shows that, compared with the TF-IDF method, it better reduces the noise from the navigation menu and the web structure of the blog itself and better highlights the theme of the blog community.

(5) Design and implement a prototype system for blog community detection that combines an asynchronous web crawler with the blog community detection algorithm and the theme extraction technology.
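To make the FCA idea in item (3) more concrete, the following is a minimal sketch of how formal concepts can be enumerated from blog links. It is an illustration only, not the algorithm of the thesis: the divide-and-merge step on the concept lattice is not described in this abstract and is not reproduced here. The function names, the toy `links` data, and the brute-force enumeration are all assumptions chosen for readability. Each formal concept pairs a maximal set of blogs with the full set of link targets they all share, which is the kind of structure a community core can be built from.

```python
from itertools import combinations


def common_targets(blogs, links, all_targets):
    """Intent: the link targets shared by every blog in `blogs`."""
    result = set(all_targets)
    for blog in blogs:
        result &= links[blog]
    return result


def common_sources(targets, links):
    """Extent: the blogs whose outgoing links include every target."""
    return {blog for blog, out in links.items() if targets <= out}


def formal_concepts(links):
    """Enumerate all formal concepts (extent, intent) of a link context.

    Brute force over subsets of blogs, so only suitable for toy data.
    """
    all_targets = set().union(*links.values())
    blogs = sorted(links)
    concepts = set()
    for size in range(len(blogs) + 1):
        for subset in combinations(blogs, size):
            intent = common_targets(subset, links, all_targets)
            extent = common_sources(intent, links)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical toy data: each blog maps to the set of pages it links to.
links = {
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"y", "z"},
}
concepts = formal_concepts(links)

# Keep only concepts with at least two blogs and two shared targets,
# as one possible (assumed) threshold for a community core.
cores = [(e, i) for e, i in concepts if len(e) >= 2 and len(i) >= 2]
```

On this toy data the only concept that passes the assumed threshold is the one where blogs `a` and `b` both link to `x` and `y`. A practical implementation would replace the subset enumeration with a dedicated concept-lattice construction algorithm.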
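The first stage of item (4), extracting theme words with TF-IDF, can be sketched as follows. This is a hedged illustration under stated assumptions, not the method of the thesis: the function name `theme_words`, the whitespace tokenizer, the sample posts, and the choice to compute IDF over a reference corpus that contains posts from several communities are all hypothetical. The LSA stage and the automatic choice of k are not shown, since they require a singular value decomposition routine.

```python
import math
from collections import Counter


def theme_words(community_posts, corpus, top_n=3):
    """Rank a community's words by TF-IDF weight summed over its posts.

    IDF is computed over `corpus`, which is assumed to contain the
    community's own posts along with posts from other communities.
    """
    corpus_tokens = [post.lower().split() for post in corpus]
    n_docs = len(corpus_tokens)
    # Document frequency: number of corpus posts containing each word.
    df = Counter(word for tokens in corpus_tokens for word in set(tokens))

    scores = Counter()
    for post in community_posts:
        tokens = post.lower().split()
        if not tokens:
            continue
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[word])
            scores[word] += tf * idf

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


# Hypothetical posts; "home" and "archive" stand in for navigation text.
photo_posts = [
    "home archive camera lens",
    "home archive camera tripod",
    "home archive camera lens filter",
]
cooking_posts = [
    "home archive recipe oven",
    "home archive recipe pasta",
]
top_words = theme_words(photo_posts, photo_posts + cooking_posts)
```

In this sketch, words that occur in every post of the reference corpus receive an IDF of zero and therefore never rank above words with a positive score. Real blog pages are noisier than this toy data.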
Keywords/Search Tags: Community Detection, Theme Extraction, Precision Network Marketing, Formal Concept Analysis, Latent Semantic Analysis