
Research on Acquisition of Entity and Emotion Expression

Posted on: 2019-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Lu
GTID: 2428330545951208
Subject: Computer Science and Technology
Abstract/Summary:
The acquisition of entity emotion dictionaries is an important research topic in Natural Language Processing. Emotion words are useful semantic units in text. Acquiring such a dictionary involves two steps: 1) named entity recognition (NER); 2) obtaining the emotion words that correspond to each entity. Building an entity recognition system usually requires a large human-annotated dataset. However, most existing annotated corpora cover only persons, locations, and organizations in the news domain. As a result, system performance drops sharply on text from a new domain, and the system cannot identify new entity types. To handle this problem, we define a Chinese entity annotation specification for multiple domains and hire annotators to label the data, obtaining a new multi-domain annotated NER dataset. To improve entity recognition on text from new domains, this thesis studies domain adaptation methods for NER. Finally, it proposes an effective method for obtaining an entity emotion dictionary from large-scale text.

This thesis contains the following contents:

(1) Define a Chinese named entity specification and build a multi-domain NER dataset. We choose three domains, namely human-computer interaction, social media, and e-commerce, to collect raw data, and annotate several predefined entity types: person names, organization names, location names, administrative music, brands, products, raw materials, and other types. We compare the performance of common sequence labelling models on the newly created datasets. In addition, we analyze the differences among entity types to better capture the characteristics of each type.

(2) Propose a cross-domain entity recognition method. Based on the characteristics of different domains, we propose a cross-domain entity recognition method that improves domain adaptation performance. We learn features shared across domains with an adversarial learning algorithm, and learn domain-private features from large-scale in-domain raw data with the help of language models. Finally, we combine these two kinds of features to improve in-domain NER performance.

(3) Propose a method for automatically acquiring large-scale entity emotion expressions. We analyze the shortcomings of existing sentiment lexicons and propose a new form of expression: entity word + emotion word. We focus on methods for automatically acquiring such entity emotion expressions. First, all candidate entities and emotion words are obtained with the entity recognition system and part-of-speech rules, and all of their entity-emotion word combinations form the candidate set. Then, the candidate set is converted into a bipartite graph and ranked with a PageRank-style algorithm. Finally, a refinement algorithm based on semantic similarity further refines the ranking. With an appropriate threshold, the entire process runs fully automatically. Experimental results show that our method can effectively mine entity emotion expressions. Additionally, we release the entity emotion expression database on GitHub.
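The sequence labelling formulation mentioned in (1) usually casts NER as tagging each token (for Chinese, typically each character) with a BIO label. The sketch below converts character-level entity spans into BIO tags. The span format and the type names `BRAND` and `PRODUCT` are hypothetical illustrations, not the thesis's annotation specification.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, entity_type) over characters.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Convert character-level entity spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"invalid span {(start, end, etype)}")
        # BIO cannot encode nested or overlapping entities, so reject them.
        if any(t != "O" for t in tags[start:end]):
            raise ValueError(f"overlapping span {(start, end, etype)}")
        tags[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining characters
    return tags


if __name__ == "__main__":
    # "I want to buy a Huawei phone"; labels are illustrative only.
    print(spans_to_bio("我想买华为手机", [(3, 5, "BRAND"), (5, 7, "PRODUCT")]))
    # → ['O', 'O', 'O', 'B-BRAND', 'I-BRAND', 'B-PRODUCT', 'I-PRODUCT']
```

Any sequence labeller (for example a BiLSTM-CRF) is then trained to predict this tag sequence from the characters.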
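The shared/private design in (2) is commonly formalized as a min-max objective. The following is a generic sketch of that formulation under assumed notation, not necessarily the exact objective used in the thesis: $E_s$ is a shared encoder, $E_p$ a private encoder pre-trained as a language model on in-domain raw text, $D$ a domain discriminator, $x$ a sentence of $n$ tokens with tag sequence $y$ and domain label $d$, $\operatorname{pool}$ a pooling over positions, and $\lambda$ a trade-off weight.

```latex
\begin{aligned}
\mathbf{h}_i &= \big[\, E_s(x)_i \,;\, E_p(x)_i \,\big] \\
\mathcal{L}_{\text{tag}} &= -\log p\big(y \mid \mathbf{h}_1, \dots, \mathbf{h}_n;\ \theta_y\big) \\
\mathcal{L}_{\text{dom}} &= -\log D\big(d \mid \operatorname{pool}(E_s(x));\ \theta_d\big) \\
&\min_{\theta_s,\,\theta_p,\,\theta_y}\ \max_{\theta_d}\ \ \mathcal{L}_{\text{tag}} - \lambda\, \mathcal{L}_{\text{dom}}
\end{aligned}
```

The discriminator minimizes $\mathcal{L}_{\text{dom}}$, while the shared encoder maximizes it, which pushes $E_s$ toward domain-invariant features. In practice this is often implemented with a gradient reversal layer that multiplies the discriminator's gradient by $-\lambda$ before it reaches $E_s$, so both players are updated in a single backward pass.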
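Step (3) can be sketched as follows, under several assumptions that the abstract does not specify: each sentence has already been reduced to its recognized entities and emotion words (the NER and part-of-speech steps are not shown); candidates are entity-emotion word pairs that co-occur in a sentence; the bipartite graph has entities on one side and emotion words on the other, with co-occurrence counts as edge weights; and each candidate is scored by the PageRank-style mass flowing along its edge. The function names, damping factor, and threshold are hypothetical, and the semantic-similarity refinement is omitted.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (entity, emotion word)


def build_candidates(sentences: List[Tuple[List[str], List[str]]]) -> Counter:
    """Pair every entity with every emotion word in the same sentence; count co-occurrences."""
    counts: Counter = Counter()
    for entities, emotion_words in sentences:
        for pair in product(set(entities), set(emotion_words)):
            counts[pair] += 1
    return counts


def bipartite_rank(counts: Counter, damping: float = 0.85, iters: int = 50) -> Dict[Pair, float]:
    """PageRank-style propagation on the entity / emotion-word bipartite graph."""
    ent_out: Dict[str, float] = defaultdict(float)  # total edge weight per entity
    emo_out: Dict[str, float] = defaultdict(float)  # total edge weight per emotion word
    for (e, w), c in counts.items():
        ent_out[e] += c
        emo_out[w] += c
    n_e, n_w = len(ent_out), len(emo_out)
    ent = {e: 1.0 / n_e for e in ent_out}
    emo = {w: 1.0 / n_w for w in emo_out}
    for _ in range(iters):
        new_ent = {e: (1 - damping) / n_e for e in ent_out}
        new_emo = {w: (1 - damping) / n_w for w in emo_out}
        for (e, w), c in counts.items():
            # Each node spreads its score over its edges in proportion to edge weight.
            new_emo[w] += damping * ent[e] * c / ent_out[e]
            new_ent[e] += damping * emo[w] * c / emo_out[w]
        ent, emo = new_ent, new_emo
    # Score a candidate by the score flowing along its edge in both directions.
    return {
        (e, w): ent[e] * c / ent_out[e] + emo[w] * c / emo_out[w]
        for (e, w), c in counts.items()
    }


def select(scores: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Keep candidates at or above the threshold, best first."""
    return sorted((p for p, s in scores.items() if s >= threshold), key=lambda p: -scores[p])


if __name__ == "__main__":
    sentences = [
        (["screen"], ["clear"]),
        (["screen"], ["clear"]),
        (["screen", "battery"], ["clear", "durable"]),
        (["battery"], ["durable"]),
    ]
    scores = bipartite_rank(build_candidates(sentences))
    print(select(scores, threshold=0.5))
    # → [('screen', 'clear'), ('battery', 'durable')]
```

In this toy input the cross pairs ("screen", "durable") and ("battery", "clear") come only from one mixed sentence, so their edges carry less mass and fall below the threshold. Because each side's scores stay normalized, the edge scores always sum to 2, which makes a fixed threshold comparable across graphs of similar size; a similarity-based refinement could then re-rank the surviving pairs.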
Keywords/Search Tags: Human Annotation Corpus, Named Entity Recognition, Domain Adaptation, Entity Emotion Expression