The Design And Implementation Of Emotional Classification About The Users' Comments

Posted on: 2012-03-31  Degree: Master  Type: Thesis
Country: China  Candidate: J Jiang  Full Text: PDF
GTID: 2178330335463333  Subject: Software engineering
Abstract/Summary:
In recent years, the emergence of microblogging, social networking and similar services has accelerated the shift of users from information receivers to information creators. Users are now more willing to post comments about friends, businesses and products. As a result, the number of comments is growing rapidly, and vast quantities of new information appear on the Internet every day. It has therefore become harder for current users to mine the sentiment that previous users expressed in this mass of comments, and it is impossible to analyze that sentiment manually. Under these circumstances, the flood of comments has become a burden on Internet users. Because most comments are written in natural language, natural language processing can be used to help users summarize and classify the comments and to determine their emotional tendency. Emotional classification of user comments has become a hot topic, and it will become an important part of the Internet.

This paper first introduces the working principle of the Web crawler and analyzes current crawling strategies. It then describes in detail the main functions and working principles of the system's related technologies: the open-source crawler Heritrix, the web page parsing tool HtmlParser, the word segmentation system ICTCLAS4J, and the script parsing engine Rhino. It presents a prototype of a sentiment classification system and provides solutions for its key technologies.
First, the overall framework of Heritrix is described. According to the actual needs of the project, a dedicated extractor is customized for the target website, and, to achieve multi-threaded crawling, a hash algorithm replaces the original URL allocation strategy. The paper then analyzes the use of HtmlParser for parsing the body of web pages and of Rhino for parsing JavaScript, and puts forward a solution for parsing the pages of the target website. Finally, building on previous research, it proposes a sentiment classification algorithm in which the classification result is based on the values of phrase patterns. The paper also describes the implementation process in detail and supplies the relevant code, realizing a platform that integrates acquisition, processing and classification.

The system is divided into three modules: page collection, page analysis and sentiment classification. The page collection module supplies the original data for the other two modules. The page analysis module takes data from the collection module and produces the original text for the sentiment classification module. The sentiment classification module works through word segmentation, tagging and efficient phrase extraction. The system is tested on Dianping. Experimental results show that the system achieves relatively high precision and recall.
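
The idea of replacing the crawler's URL allocation strategy with a hash, mentioned above, can be illustrated with a minimal sketch. It is not the thesis's actual code (the system itself is built on the Java crawler Heritrix). The CRC32 hash, the function names `queue_for` and `partition`, and the example URLs are all assumptions chosen for illustration. The point is only that hashing the full URL, rather than grouping by host, spreads the pages of a single site across several queues so that multiple worker threads can crawl it in parallel.

```python
import zlib
from collections import defaultdict


def queue_for(url: str, num_queues: int) -> int:
    """Map a URL to a queue index with a stable hash.

    CRC32 is an assumption made for this sketch; the abstract does not
    say which hash function the thesis uses.
    """
    if num_queues <= 0:
        raise ValueError("num_queues must be positive")
    return zlib.crc32(url.encode("utf-8")) % num_queues


def partition(urls, num_queues):
    """Group URLs by the queue index they hash to."""
    queues = defaultdict(list)
    for url in urls:
        queues[queue_for(url, num_queues)].append(url)
    return dict(queues)


if __name__ == "__main__":
    # Hypothetical URLs from one host; host-based assignment would put
    # all of them into a single queue.
    sample = [f"http://example.com/shop/{i}" for i in range(10)]
    for index, members in sorted(partition(sample, 4).items()):
        print(index, len(members))
```

Because the hash is deterministic, the same URL always lands in the same queue, so a per-queue record of visited URLs remains valid across runs.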
Keywords/Search Tags:Web Crawler, Html Parser, Sentiment Analysis, Phrase Pattern