The Design And Implementation Of Blog Search Engine Based On Comments

Posted on: 2017-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2348330509954206
Subject: Engineering
Abstract/Summary:
A blog, also known as a web log, is a way of publishing personal information. From a personal point of view, a blog is a means of expressing ideas and sharing valuable resources with others. As the number of blog users grows, more and more people use blogs to communicate with each other and share knowledge, and blogs have become an important channel through which people reach useful resources. With the development of the Internet, many applications allow users to comment, and these comments directly reflect the users' sentiment. Blogs are no exception: when we read a good blog, we tend to give it a favourable evaluation. A blog that attracts many favourable comments can therefore be assumed to be a good one. In this thesis, we optimize blog search results by analyzing blog comments, so that when users search for related blogs, the results are ranked by blog quality in addition to relevance. The main work is as follows:

First, the system needs to crawl blog comments accurately. Because some blog sites generate their comments dynamically through Ajax, a traditional web crawler cannot capture these dynamic pages, so we use PhantomJS to extend the traditional crawler. To extract the comment content from the captured pages, this thesis uses a maximum DOM tree algorithm, which extracts the comments on a web page accurately.

Second, once the comments have been crawled, their sentiment (emotional) orientation is analyzed. The purpose of this analysis is to generate an overall review score, which is then used to guide the ranking of blog search results. A text categorization method is used to analyze the sentiment orientation of the text, and classification accuracy is improved by constructing a sentiment dictionary and applying an improved feature extraction algorithm.

Finally, this thesis designs and implements a blog search engine system (CBlog) based on the open-source Nutch software. Nutch itself considers only keywords and link analysis when calculating a document's score; CBlog adds a sentiment analysis factor. By returning high-quality blogs first, CBlog improves the user's search experience.
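
The abstract does not give the scoring formula, so the following Python sketch only illustrates the general idea of adding a comment-based factor to a relevance score. Everything in it is an assumption made for illustration: the toy word lists, the names `comment_polarity`, `review_score` and `rank`, the linear blend, and the weight of 0.3. It also swaps in a simple lexicon count for the text categorization method used in the thesis, and it is written in plain Python rather than as a Nutch plugin.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy sentiment dictionary; the thesis builds its own lexicon.
POSITIVE = {"good", "great", "helpful", "clear", "excellent"}
NEGATIVE = {"bad", "wrong", "useless", "confusing", "poor"}


def comment_polarity(comment: str) -> int:
    """Return +1, -1 or 0 by counting lexicon hits (a stand-in for a classifier)."""
    words = [w.strip(".,!?;:").lower() for w in comment.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return 1
    if neg > pos:
        return -1
    return 0


def review_score(comments: List[str]) -> float:
    """Share of favourable comments among polarized ones, in [0, 1].

    Returns a neutral 0.5 when no comment carries any polarity.
    """
    polarities = [comment_polarity(c) for c in comments]
    pos = polarities.count(1)
    neg = polarities.count(-1)
    if pos + neg == 0:
        return 0.5
    return pos / (pos + neg)


@dataclass
class Blog:
    url: str
    relevance: float      # assumed to be a relevance score normalized to [0, 1]
    comments: List[str]


def rank(blogs: List[Blog], sentiment_weight: float = 0.3) -> List[Blog]:
    """Sort blogs by an assumed linear blend of relevance and review score."""
    def final_score(blog: Blog) -> float:
        return ((1 - sentiment_weight) * blog.relevance
                + sentiment_weight * review_score(blog.comments))
    return sorted(blogs, key=final_score, reverse=True)
```

With `sentiment_weight=0` the ranking reduces to relevance alone, which makes it easy to compare the order of results with and without the comment factor.
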
Keywords/Search Tags: Blog comments, Emotional orientation, Nutch, Search engine