Research On Opinion Mining Of Cross-Domain Chinese Comments

Posted on: 2013-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zhang
GTID: 1318330482950219
Subject: Information Science
Abstract/Summary:
With the development of computer and network technology in recent years, and the growing need to uncover the implicit meaning in diverse data, opinion mining has drawn increasing attention and become an active research area. Its results can be applied across many Internet domains and play an important role in politics and the economy. Applications include harmful-information filtering, public opinion analysis, consumer purchase guidance, product improvement, and interest analysis. This broad application value makes research in the area meaningful.

However, current research on opinion mining, both in China and internationally, is usually confined to a single domain, most often products. Opinion mining in Chinese is likewise mostly restricted to product domains, and little work has been done on opinion expressions. Research on non-product domains such as news, travel, and medicine is also rare. As a result, the opinions hidden in data from these domains go unrevealed, which further limits the ability of ordinary users and government agencies alike to understand information and make sound decisions. Research on cross-domain opinion mining, in particular on the extraction of opinion expressions and of comment-target/opinion-expressions pairs, is therefore becoming increasingly essential.

This dissertation addresses the topic "opinion mining of cross-domain Chinese comments". Building on research results from product domains, it uses natural language processing techniques to explore the major aspects of opinion mining and to extract the characteristics hidden in cross-domain comments. The work has four parts: extraction of comment targets, extraction of opinion expressions, extraction of comment-target/opinion-expressions pairs, and semantic orientation of comment targets. The problems in each part are studied in depth together with the characteristics of cross-domain text. The major problem identified for each part is:
(1) improving the accuracy of comment target extraction;
(2) improving the accuracy of opinion expression extraction;
(3) improving the match rate of comment-target/opinion-expressions pairs;
(4) improving the classification accuracy of the semantic orientation of comment targets.

The major achievements are as follows.
(1) Extraction of comment targets. The two mainstream methods, rule matching and machine learning, each have their own advantages, but neither produces an ideal F value. We present a new method that combines the advantages of both and can process cross-domain Chinese comments. We extract core sentences from the original comment text using rules and feed the core sentences to a CRFs model. We also add several syntactic patterns to the original feature set, which is usually built from words and their parts of speech. Introducing syntactic patterns into the CRFs training feature set improves recall while still making good use of the high precision of machine learning.
(2) Extraction of opinion expressions. We use the strong sequence-labeling capability of CRFs to extract opinion expressions from Chinese cross-domain comments. Features including a sentiment dictionary and the comment targets are also applied to improve extraction accuracy.
(3) Extraction of comment-target/opinion-expressions pairs. We evaluate a more universal and general matching pattern built on the original nearest rule and on syntactic relations, and on that basis propose an enhanced nearest match algorithm. Accuracy also improves because the extracted comment-target/opinion-expressions pairs can in turn be used to calibrate the comment targets.
(4) Semantic orientation evaluation of comment targets. We use two methods to determine the semantic orientation of cross-domain comment targets: one combines a sentiment dictionary, a synonym dictionary, and a negation-word dictionary; the other is the machine learning model SVM.

The major innovations of this dissertation are:
(1) Core sentence extraction is proposed and used in conjunction with CRFs to identify comment targets. Syntactic patterns constructed from syntactic relations are added to the feature set.
(2) Annotation of Chinese opinion expressions is introduced on top of comment-target/opinion-expressions pairs.
(3) An enhanced nearest match algorithm is proposed for extracting comment-target/opinion-expressions pairs. This algorithm further improves the accuracy of comment target identification.
(4) Experimental results for semantic orientation evaluation are compared between two methods, the dictionary-based method and the machine learning model SVM. We also illustrate how cross-domain text affects the choice of algorithm.
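
As an illustration only, the plain nearest rule and the dictionary-based orientation method named in the abstract might be sketched as follows. The abstract does not describe the enhanced nearest match algorithm or the actual dictionaries, so this sketch shows just the baseline nearest rule; the lexicons, the negation window, and all function names are assumptions made for the example, and the input is assumed to be already segmented with target and opinion positions known.

```python
from typing import List, Tuple

# Toy lexicons (assumed for illustration; not the dissertation's dictionaries).
SENTIMENT = {"好": 1, "干净": 1, "差": -1, "贵": -1}
NEGATIONS = {"不", "没"}


def nearest_pairs(targets: List[int], opinions: List[int]) -> List[Tuple[int, int]]:
    """Baseline nearest rule: pair each opinion index with the closest target index.

    Ties are broken in favour of the earlier target.
    """
    pairs = []
    for o in opinions:
        if not targets:
            break
        t = min(targets, key=lambda i: (abs(i - o), i))
        pairs.append((t, o))
    return pairs


def orientation(tokens: List[str], opinion_idx: int, window: int = 2) -> int:
    """Dictionary lookup, flipped once per negation word in the preceding window."""
    score = SENTIMENT.get(tokens[opinion_idx], 0)
    start = max(0, opinion_idx - window)
    for tok in tokens[start:opinion_idx]:
        if tok in NEGATIONS:
            score = -score
    return score


def analyze(tokens: List[str], targets: List[int],
            opinions: List[int]) -> List[Tuple[str, str, int]]:
    """Return (target word, opinion word, polarity) triples for one comment."""
    return [
        (tokens[t], tokens[o], orientation(tokens, o))
        for t, o in nearest_pairs(targets, opinions)
    ]


if __name__ == "__main__":
    # "The room is clean, the price is not expensive", pre-segmented.
    tokens = ["房间", "干净", "，", "价格", "不", "贵"]
    print(analyze(tokens, targets=[0, 3], opinions=[1, 5]))
```

In this toy comment, "贵" (expensive) is negative in the lexicon, but the preceding "不" (not) flips it, so both targets receive a positive orientation. A synonym dictionary, which the abstract also mentions, would extend the lookup to words missing from the sentiment dictionary; it is left out here to keep the sketch short.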
Keywords/Search Tags: opinion mining, cross-domain, comment target, opinion expressions, comment-target/opinion-expressions pair, semantic orientation, Conditional Random Fields (CRFs), core sentences