Research On Key Issues In Semantic Feature Extraction And Understanding Of English Text

Posted on: 2021-03-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Li
Full Text: PDF
GTID: 1365330647961880
Subject: Information and Communication Engineering
Abstract/Summary:
Text semantic representation is the foundation of many natural language processing applications. Its purpose is to map the unstructured vocabulary of a text into an appropriate vector space for further computation and processing, and it thereby provides the basis for text comprehension. This dissertation constructs a text semantic representation model by mining deep semantic information in text representation, and designs a text comprehension method based on this model.

At present, most methods of text semantic feature extraction use neural network language models to generate text representations. These models use statistical word frequencies or the probability distribution of words in the text, and express words together with their frequencies or probability distributions in a semantic space to construct a text semantic representation model. However, when semantic understanding is based on these traditional representation models, words that appear semantically similar can carry different meanings from different perspectives. Moreover, because the semantics of words in English texts are affected by their specific contexts, understanding them accurately is challenging. Traditional methods for semantic understanding of English text are not designed around the semantic features of text concepts, and as a result they understand the deep semantics of English text with poor accuracy.

This dissertation consists of two parts. In the first part, starting from the current basic methods and theories of text semantic feature extraction, a conceptualized hybrid-feature keyword extraction method is proposed. It focuses on analyzing the relationship between key semantic terms and contextual concepts when extracting conceptual semantic features, and on mining the attribute relationships between terms and their concepts. On the basis of the key terms mined from the text, the semantic relationships among terms are classified and extracted, and a rich text semantic feature representation is constructed that combines key terms, concepts, and the semantic relationships between them. In the second part, the semantic features extracted in the first part are combined to design a text semantic understanding method. Emphasis is placed on designing an understanding model for each specific semantic understanding task and on improving the model with an attention mechanism. The validity of the models is verified on relevant datasets. The main work of this dissertation is as follows.

Firstly, this dissertation studies a concept-based hybrid-feature keyword extraction method, with emphasis on extracting keywords or phrases and their concepts in English, and proposes a text keyword extraction method combined with the TextRank algorithm. The method obtains the text representation by jointly training word vectors and paragraph vectors. The TextRank algorithm is applied to the clustered keyword or phrase nodes, a jump-probability matrix between nodes is introduced to learn the node weight scores, and a scoring function finally generates a ranking of keywords or phrases. The results show that this method obtains more accurate keywords or phrases with lower computational complexity on multiple public datasets. It was also tested on short-text datasets (such as a Twitter dataset) and long-text datasets (such as a "Southern Weekend" article dataset). The experimental results show that the method achieves good accuracy in extracting keywords or phrases from short texts and is competitive in extracting keywords from long texts.

Secondly, this dissertation proposes a dual convolutional neural network relation extraction model that combines knowledge base attention and word embedding. The model enriches semantic supervision information by introducing knowledge base attention. Two independent convolutional neural networks learn, respectively, the word vectors of the text and the supervision information obtained from the knowledge base, and the hidden-layer outputs of the two networks are concatenated in the fully connected layer. Through this process, the model obtains not only entity representations but also a more comprehensive representation of inter-entity relationships based on the rich background of the knowledge base. Compared with existing related methods, the model performs better on the semantic dependency extraction task and the sentence relation classification task.

Thirdly, a text comprehension method combining text conceptualization and attention embedding is proposed. To address the problem that short English texts have limited vocabulary and sparse semantic information, this method constructs an attention encoder based on the conceptualized text representation in the knowledge base. Specifically, for each short English text, key entity words are extracted and conceptualized; conceptualization is realized through the co-occurrence of entity words and concept words. Other concepts and relationships related to the text concepts are then acquired from the knowledge base, and the concepts are mapped to low-dimensional vector spaces to obtain conceptualized spatial codes. Finally, the text comprehension method is designed by combining the concept space and the attention coding space. Information retrieval experiments on New York Times and Twitter datasets produced better results than current methods. In addition, three evaluation indicators were designed for opinion retrieval experiments on the WWW2015 and Coling2016 datasets, and the method performed well on all of them.

Fourthly, this dissertation proposes a question-answering comprehension method based on multi-granularity hierarchical features. The method divides semantic feature extraction for the text and the question into two parts, a traditional language model and a deep matching model, and combines the semantic features extracted by the two parts to build similarity matrices. Three different models are designed to learn from the similarity matrices: one that learns from concatenated similarity-matrix features, one that learns each similarity matrix independently, and a third that learns from the similarity matrices themselves. This method learns more text features from multiple perspectives and achieves better results in question-answering comprehension tasks. Experiments on the open WikiQA dataset show that adding multi-granularity feature learning improves the accuracy of answers in question-answering comprehension tasks.
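
As a rough illustration of the TextRank step mentioned in the first contribution, the following minimal sketch ranks words on a co-occurrence graph, using the jump probability from a node to each of its neighbors as one over that node's degree. It is plain TextRank only and not the dissertation's method: the joint training of word and paragraph vectors and the clustering of candidate nodes are omitted, and the function names, window size, damping factor, and iteration count are all assumptions made for this example.

```python
from collections import defaultdict


def textrank_scores(tokens, window=2, damping=0.85, iterations=50):
    """Score words with plain TextRank on an undirected co-occurrence graph.

    Hypothetical helper for illustration; parameters are assumed defaults.
    """
    # Link every pair of distinct words that co-occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    nodes = list(neighbors)
    if not nodes:
        return {}

    # Start from a uniform distribution over the nodes.
    scores = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbor m jumps to `node` with probability 1 / degree(m).
            incoming = sum(scores[m] / len(neighbors[m]) for m in neighbors[node])
            updated[node] = (1.0 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


def top_keywords(tokens, top_k=3, **kwargs):
    """Return the `top_k` highest-scoring words (hypothetical helper)."""
    scores = textrank_scores(tokens, **kwargs)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Calling `top_keywords(text.lower().split())` on a tokenized passage returns its highest-ranked words; a real pipeline would first filter stop words and restrict candidates by part of speech.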
Keywords/Search Tags: Semantic features, Feature extraction, Text understanding