Research And Application Of Document Semantic Representation Method

Posted on: 2020-10-27

Degree: Master

Type: Thesis

Country: China

Candidate: J B Yuan

Full Text: PDF

GTID: 2428330605466663

Subject: Computer Science and Technology

Abstract/Summary:


Document representation is an important task in natural language processing. At present, the main research approach is distributed representation based on the contextual semantics of the document. Among such methods, doc2vec, an extension of word2vec, is a relatively successful document semantic representation model. However, during learning doc2vec predicts the target word from only a random selection of words in the context window, so it does not capture contextual semantics effectively. This thesis combines a hierarchical attention mechanism with doc2vec to optimize it, and applies the result to the semantic representation and search of massive collections of scientific documents. The specific research work is as follows:

(1) A document representation model based on doc2vec and a hierarchical attention mechanism is proposed to improve the acquisition of contextual semantic information during document representation learning. The model introduces hierarchical attention into the target word prediction process and represents the document as a three-layer structure of document, paragraphs, and words, with attention weights attached to the paragraphs and words. This enriches the sources of contextual semantic information for target word prediction and improves the accuracy of the document representation. Experimental results show that the new model outperforms word2vec and doc2vec on a document sentiment classification task.

(2) Building on (1), a semi-supervised document representation learning method is proposed for specific documents, using the specific information in those documents to improve the accuracy of the representation. The model takes specific information such as subject paragraphs and subject words and, in a semi-supervised manner, actively adjusts their attention weights to relatively high values. Using this information mitigates the insufficient semantic acquisition of specific information seen in other models. An approximate query experiment on technical documents verifies that the improved model performs better on documents containing specific information.

(3) Based on the above results, a semantic search engine for scientific documents is developed and implemented using an approximate nearest neighbor algorithm. Given a user's textual description of a technology, product, or method, the engine searches for the most relevant technical documents, scientific and technological achievements, and other resources, helping users obtain technical information and the corresponding expert information.

In summary, the thesis studies document semantic representation learning methods and applies the research results to the ZuoChuangZhiTui precision matching platform, which is dedicated to the accurate recommendation of enterprise demand and the transformation of scientific and technological achievements.
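
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of hierarchical attention over a document, paragraph and word structure, not the thesis's actual model. It assumes dot-product scoring against query vectors and softmax weights; the names `attend`, `hierarchical_context`, `word_query`, `para_query` and `para_boosts` are hypothetical. The additive `para_boosts` offsets are one assumed way to push the weights of subject paragraphs toward higher values, in the spirit of point (2).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(vectors, query, boosts=None):
    """Return (weights, weighted sum) of `vectors` under dot-product attention.

    `boosts` is an optional list of additive score offsets (an assumption of
    this sketch) used to raise the weight of selected items.
    """
    scores = [dot(v, query) for v in vectors]
    if boosts is not None:
        scores = [s + b for s, b in zip(scores, boosts)]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, pooled


def hierarchical_context(paragraphs, word_query, para_query, para_boosts=None):
    """Pool word vectors into paragraph vectors, then paragraphs into a context vector."""
    para_vecs = [attend(words, word_query)[1] for words in paragraphs]
    return attend(para_vecs, para_query, para_boosts)


# Toy document: two paragraphs of 2-dimensional word vectors (made-up values).
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0]],
]
zero = [0.0, 0.0]

# With zero queries every score is equal, so attention reduces to plain averaging.
weights, context = hierarchical_context(doc, zero, zero)

# Boosting the second paragraph (treated here as the "subject paragraph").
boosted_weights, boosted_context = hierarchical_context(doc, zero, zero, [0.0, 2.0])
```

In this toy run the unboosted paragraph weights are equal, and the boost shifts weight toward the second paragraph. In a trained model the query vectors would be learned jointly with the word and document embeddings rather than fixed by hand.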

Keywords/Search Tags:

natural language processing, neural network language model, Doc2vec, document embedding, document vector


Related items

1	Scientific Research Document Retrieval And Recommendation System Based On Doc2Vec
2	Research On Cross-language Document Sorting Learning Method Based On Bilingual Document Similarity
3	Submodularity in Natural Language Processing: Algorithms and Applications
4	Research On Automatic Answering Technique Of English Test
5	BERT-based Two-stage Long Document Retrieval Model Fused With Supplementary Information
6	Xml Technology Research And In The Water Information System Applications
7	Natural Language Processing-A Study Of Vectorization Of Chinese Words And Short Texts
8	Visual Analysis For Fast Understanding Of Document Collection
9	Research On Machine Learning For Natural Language Processing And Transmission
10	A Research On Abstract Summary Extraction Of Long Texts Based On BERT Model