
Simile Recognition Based On External Part-of-Speech Information And Long-distance Dependencies

Posted on: 2021-02-10 | Degree: Master | Type: Thesis
Country: China | Candidate: P F Zhang | Full Text: PDF
GTID: 2428330611965652 | Subject: Software engineering
Abstract/Summary:
Simile recognition, an important research task in text mining, consists of finding simile sentences and extracting the simile components (e.g., tenors and vehicles) from them. A simile is a figure of speech that directly compares a tenor with a vehicle using a connecting word such as “like” or “as”. The common form is “A like B”, where “A” is the tenor and “B” is the vehicle. Notably, the tenor and the vehicle in a simile sentence are usually two different things. Previous work on simile recognition has shown that simile components are typically noun phrases; in other words, part-of-speech (POS) information is very important for identifying simile components. However, mainstream models represent a polysemous word with a single static word embedding, which is not enough to accurately distinguish the word's POS in different contexts. As a result, these models cannot assign a precise simile tag to the word. In this work, polysemous words are words with multiple meanings, which usually carry different POS information (e.g., adjective, adverb, noun, or verb) in different contexts. Moreover, existing models are based on recurrent neural networks (RNNs) and their variants, which assume that the state at the current time step depends only on the state at the previous time step and the input at the current time step. Thus, these models cannot explicitly model the dependencies among words in a sentence, and they struggle to identify both the tenor and the vehicle when the two are far apart. To alleviate these problems, we propose a novel neural network framework. First, we employ an explicit POS integrating technology that combines word embeddings with POS embeddings to enrich the word representation, which helps the model distinguish the intended sense of a polysemous word more accurately and reduces the interference such words cause. Then, we introduce a self-attention mechanism so that the model can explicitly model dependencies between any two words in a sentence, which alleviates the global dependency problem. Finally, based on this framework, we design detailed models for the two subtasks of simile recognition. To verify the effectiveness of the proposed model, we conducted extensive experiments on the dataset provided by Liu. The experimental results show that the proposed model significantly outperforms the previous state-of-the-art methods. Our ablation studies show that both the explicit POS integrating technology and the self-attention mechanism are effective. Specifically, in the simile sentence classification task, explicit POS integrating tends to improve recall, while the self-attention mechanism tends to improve precision. In the simile component extraction task, explicit POS integrating improves both recall and precision, while the self-attention mechanism tends to improve recall.
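
The two ideas in the abstract can be illustrated with a minimal sketch. This is not the thesis's model: it assumes the POS embedding is simply concatenated to the word embedding, and it uses scaled dot-product self-attention with identity projections (no learned weights). The tokens, tags, vectors, and function names are all hypothetical toy values chosen for illustration.

```python
import math

def concat_embeddings(word_vecs, pos_vecs):
    """Join each word vector with the vector of its POS tag (assumed: plain concatenation)."""
    return [w + p for w, p in zip(word_vecs, pos_vecs)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(xs):
    """Scaled dot-product self-attention with identity projections.

    Every position attends to every other position directly, so the
    distance between two words does not limit their interaction.
    """
    d = len(xs[0])
    outputs, weights = [], []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append(
            [sum(attn[j] * xs[j][i] for j in range(len(xs))) for i in range(d)]
        )
    return outputs, weights

# Hypothetical toy sentence: tenor "face", comparator "like", vehicle "apple".
tokens = ["face", "like", "apple"]
pos_tags = ["NOUN", "ADP", "NOUN"]
word_emb = {"face": [0.2, 0.7], "like": [0.5, 0.1], "apple": [0.3, 0.6]}
pos_emb = {"NOUN": [1.0, 0.0], "ADP": [0.0, 1.0]}

inputs = concat_embeddings(
    [word_emb[t] for t in tokens], [pos_emb[p] for p in pos_tags]
)
outputs, weights = self_attention(inputs)
```

In this toy setup the two nouns share a POS vector, so "face" assigns more attention weight to "apple" than to "like". A real model would learn both embeddings and the attention projections from data.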
Keywords/Search Tags: Simile Recognition, POS, Explicit POS Integrating Technology, Self-attention Mechanism