Research And Application Of Document Semantic Representation Method

Posted on: 2020-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: J B Yuan
Full Text: PDF
GTID: 2428330605466663
Subject: Computer Science and Technology
Abstract/Summary:
Document representation is an important task in natural language processing. At present, the main approach is distributed representation based on the contextual semantics of the document. Among these methods, doc2vec, an extension of word2vec, is a relatively successful document semantic representation model. However, during learning doc2vec predicts the target word only from randomly selected words in the context window, so it does not capture contextual semantics effectively. This thesis combines a hierarchical attention mechanism with doc2vec to optimize it, and applies the result to the semantic representation and search of massive collections of scientific documents. The specific research work is as follows:

(1) A document representation model based on doc2vec and a hierarchical attention mechanism is proposed to improve the acquisition of contextual semantic information in document representation learning. The model introduces hierarchical attention into the target word prediction process and represents the document as a three-layer structure of document, paragraphs and words, with attention weights assigned at the paragraph and word levels. The model enriches the sources of contextual semantic information for target word prediction and improves the accuracy of the document representation. Experimental results show that the new model outperforms word2vec and doc2vec on a document sentiment classification task.

(2) Building on (1), a semi-supervised document representation learning method is proposed for documents that contain specific information, and that information is used to improve the accuracy of the document representation. The model uses specific information such as subject paragraphs and subject words to adjust the corresponding attention weights in a semi-supervised manner, raising them to relatively high values. Using this information addresses the insufficient capture of such semantics by other models. A technical document approximate query experiment verifies that the improved model performs better on documents containing specific information.

(3) Based on the above results, a semantic search engine for scientific documents is developed and implemented using an approximate nearest neighbor algorithm. Given a user's textual description of a technology, product or method, the engine searches for the most relevant technical documents, scientific and technological achievements and other resources, helping users obtain technical information and the corresponding expert information.

In summary, the thesis studies document semantic representation learning methods and applies the research results to the ZuoChuangZhiTui precision matching platform, which is dedicated to the accurate recommendation of enterprise demands and the transformation of scientific and technological achievements.
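
The three-layer structure of document, paragraphs and words described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function or the form of the semi-supervised adjustment, so everything below is an assumption made for illustration: dot-product scoring against hypothetical query vectors (`word_query`, `para_query`), and additive score boosts (`word_boosts`, `para_boosts`) standing in for the raised attention weights of subject words and subject paragraphs. It is not the thesis's implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(vectors, query, boosts=None):
    """Attention-pool `vectors` against `query`.

    `boosts` (assumed mechanism) are added to the raw scores before the
    softmax, which raises the weight of marked items such as subject words.
    Returns the pooled vector and the attention weights.
    """
    scores = [dot(v, query) for v in vectors]
    if boosts is not None:
        scores = [s + b for s, b in zip(scores, boosts)]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]
    return pooled, weights

def document_context(paragraphs, word_query, para_query,
                     word_boosts=None, para_boosts=None):
    """Words are pooled into paragraph vectors, then paragraphs into one context vector."""
    para_vecs = []
    for i, words in enumerate(paragraphs):
        wb = word_boosts[i] if word_boosts is not None else None
        vec, _ = attend(words, word_query, wb)
        para_vecs.append(vec)
    return attend(para_vecs, para_query, para_boosts)

# Toy document: two paragraphs of 2-dimensional word vectors (made-up values).
paragraphs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
plain_ctx, plain_w = document_context(paragraphs, [1.0, 0.0], [0.0, 1.0])
# Mark the first paragraph as a subject paragraph by boosting its score.
boost_ctx, boost_w = document_context(paragraphs, [1.0, 0.0], [0.0, 1.0],
                                      para_boosts=[5.0, 0.0])
```

In this sketch the boost only shifts attention toward the marked paragraph; how the thesis actually learns or constrains these weights during doc2vec training is not stated in the abstract.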
Keywords/Search Tags: natural language processing, neural network language model, Doc2vec, document embedding, document vector