
The Study On Domain-based Opinion Target Extraction

Posted on: 2016-10-07  Degree: Master  Type: Thesis
Country: China  Candidate: S Yang  Full Text: PDF
GTID: 2308330476454959  Subject: Computer technology
Abstract/Summary:
With the further development of the Internet in China, more and more people acquire information online. In the Web 2.0 era, users are not only consumers of Internet content but also its producers. Users generate a large amount of evaluation information on the Internet, which gives consumers a reference when choosing goods and gives producers feedback on their products, helping them understand the shortcomings and make improvements. However, processing so much information by hand is not realistic in the big data era, so how to use natural language processing to extract opinion targets is a hotspot of current research.

The task of opinion target extraction is to extract the objects that people comment on in an opinion sentence. Methods for opinion target extraction fall mainly into three categories: supervised, semi-supervised, and unsupervised. Although supervised methods often obtain higher precision, their need for manual annotation cannot be overlooked. The Chinese Opinion Analysis Evaluation has been held since 2008, and a task on opinion target extraction has been included every year. Considering the difficulty of obtaining high precision and recall when extracting opinion targets without regard to domain, this thesis carries out practical research on domain-specific opinion target extraction and uses domain knowledge to reach higher precision, recall, and F-measure.

This thesis studies opinion target extraction in the Chinese automotive domain. Its main contributions are as follows:

1) An automotive-domain knowledge base is constructed by crawling semi-structured data from http://www.autohome.com.cn and is stored in the form of tuples. Automotive-specific word vectors are then produced by training on a large amount of automotive-domain data.

2) A CRF-based method is adopted, and opinion target extraction is transformed into a sequence labeling problem in which words, part-of-speech tags, syntax, and the constructed automotive-domain knowledge base are selected as CRF features to improve the extraction results. On the basis of the CRF predictions, several designed strategies and word-embedding-based methods are then applied to expand the opinion targets and reach a higher recall in the experiments.

3) A platform for extracting and verifying automotive-domain opinion targets is built. It uses the Nginx web server, the Python Django framework, and socket communication, and the whole system runs on Linux. The algorithm communicates with the platform through a socket interface in order to reduce coupling. The web solution thus provides an application for the opinion target extraction algorithm studied here.
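To make the sequence labeling formulation in contribution 2) concrete, the following is a minimal sketch, not the thesis's actual implementation. It assumes a B-T/I-T/O tagging scheme, uses an invented three-entry knowledge base, and uses English tokens for readability even though the thesis works on Chinese text. The names `KNOWLEDGE_BASE`, `bio_tags`, `kb_flags`, and `token_features` are hypothetical, and the syntactic features mentioned in the abstract are omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the automotive knowledge base. The thesis builds its
# own from autohome.com.cn; these entries are invented for illustration only.
KNOWLEDGE_BASE: Set[str] = {"engine", "fuel consumption", "steering wheel"}


def bio_tags(tokens: List[str], targets: List[Tuple[int, int]]) -> List[str]:
    """Label each token B-T, I-T, or O, given target spans as [start, end) indices."""
    tags = ["O"] * len(tokens)
    for start, end in targets:
        tags[start] = "B-T"
        for i in range(start + 1, end):
            tags[i] = "I-T"
    return tags


def kb_flags(tokens: List[str], kb: Set[str], max_len: int = 3) -> List[bool]:
    """Flag tokens covered by a knowledge-base entry, preferring the longest match."""
    flags = [False] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in kb:
                matched = n
                break
        for j in range(i, i + matched):
            flags[j] = True
        i += matched if matched else 1
    return flags


def token_features(
    tokens: List[str], pos_tags: List[str], kb: Set[str]
) -> List[Dict[str, object]]:
    """Build one feature dict per token: word, POS, POS context, and KB membership."""
    flags = kb_flags(tokens, kb)
    features = []
    for i, (token, pos) in enumerate(zip(tokens, pos_tags)):
        features.append({
            "word": token.lower(),
            "pos": pos,
            "in_kb": flags[i],
            "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
            "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "EOS",
        })
    return features


tokens = ["The", "fuel", "consumption", "is", "too", "high"]
pos_tags = ["DT", "NN", "NN", "VBZ", "RB", "JJ"]
labels = bio_tags(tokens, [(1, 3)])  # "fuel consumption" is the opinion target
features = token_features(tokens, pos_tags, KNOWLEDGE_BASE)
```

Feature dictionaries paired with tag sequences in this shape are what a typical CRF toolkit takes as training input. The knowledge-base flag gives the model a domain signal that does not depend on the word having appeared in the annotated training data.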
Keywords/Search Tags: opinion targets, automobile-field, knowledge base, conditional random field, word embedding