
Representation Learning And Dependency Syntax For Text Summarization

Posted on: 2021-04-21  Degree: Doctor  Type: Dissertation
Country: China  Candidate: W F Liu  Full Text: PDF
GTID: 1368330602966034  Subject: Network and network resource management
Abstract/Summary:
Text summarization is a key technology in natural language processing (NLP). With the explosive growth of text data in recent years, how to quickly grasp the meaning of massive amounts of information has received increasing attention. According to the specific processing method, text summarization has two main paradigms: extractive summarization and abstractive summarization. Extractive summarization selects important sentences from a document as its summary, while the abstractive method obtains a summary mainly by generating and rewriting text, which is closer to the way humans distill knowledge when reading. However, extractive summaries often suffer from low coverage or one-sidedness and cannot express the overall meaning of the document well; meanwhile, the content produced by current abstractive summarization is often subject to problems such as poor readability, redundancy, and semantic deviation, and cannot truly express the semantics of the document. To address these problems, this thesis starts from the study of word embeddings, then studies the syntactic structure of sentences and attention mechanisms over documents, and finally implements a sentence-level summarization method and a hybrid extractive-abstractive summarization method. The main contributions are summarized as follows:

(1) A novel fine-grained word-embedding method for text representation is proposed. Representation learning is one of the basic research topics in natural language processing and related fields. Targeting the characteristics of text summarization, this thesis combines feature information such as part of speech and position to construct a new, fine-grained, more expressive word-embedding representation, and further organizes the embeddings in a two-dimensional lookup table, which reduces the size of the embedding table and improves query efficiency. Experiments show that the proposed method has better semantic representation capability.

(2) A novel method for comparing sentence-level similarity based on word embeddings and dependency syntactic structures is proposed. The sentence is a basic processing unit of summarization. A meaningful sentence must conform to the syntactic structure of its language, so it is of great significance to incorporate syntactic structure when comparing related sentences. This thesis studies the dependency relationships among the words of a sentence, constructs a dependency tree with transition-based dependency parsing, and groups the words into different syntactic components (subject block, predicate block, object block, etc.) according to their dependency relations. After preprocessing steps such as passive-voice flipping and normalization of the syntactic blocks, an attention mechanism is used to construct syntactic block embeddings, which are then concatenated into a sentence-level embedding. Experiments show that the sentence-level embeddings constructed in this way represent sentences well.

(3) A sentence-level summarization method based on dependency syntax and Tree-LSTM is proposed. Building on the previous two parts, an input sentence is divided into syntactic blocks. A "hard alignment" mechanism is used between the input and output blocks, while a "soft alignment" attention mechanism is used within each block; the parameters are learned by training a Tree-LSTM network, yielding a sentence-level summarization model. The dependency tree preserves the syntactic relations and readability of the generated sentence, the "hard alignment" mechanism prevents the syntactic components of long sentences from shifting, and the "soft alignment" mechanism increases the flexibility of generating new words within syntactic blocks. The feasibility of the method is verified experimentally.

(4) A novel hybrid extractive-abstractive document summarization model is proposed. To address the problems of document-level summarization and fully combine the advantages of the two paradigms, this thesis proposes a two-stage model. The first stage uses a sentence-similarity matrix or a "pseudo-title" to extract important sentences from the document; this stage fully considers explicit features (such as sentence position and paragraph position) for coarse-grained extraction, accounts for differences between sentences, and selects the most important ones. The second stage is abstractive: the extracted sentences are recombined and rewritten into new sentences using a beam-search algorithm, and the best result serves as the "pseudo-title" for the next round. The two stages are performed cyclically until the pseudo-title converges, and the final pseudo-title is taken as the summary of the document. Extensive experiments on English and Chinese datasets show that the method obtains better summaries.
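The table-size saving claimed for contribution (1) can be illustrated with a small sketch. This is not the thesis's implementation: all sizes, the factorized two-dimensional lookup, and the additive composition of word, part-of-speech, and position features are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 10_000          # vocabulary size
ROWS, COLS = 100, 100   # ROWS * COLS >= VOCAB: two small tables replace one big one
DIM = 64                # embedding dimension
N_POS = 16              # number of part-of-speech tags
MAX_LEN = 128           # maximum sentence length

row_table = rng.normal(size=(ROWS, DIM))   # 100 x 64 instead of 10000 x 64
col_table = rng.normal(size=(COLS, DIM))
pos_table = rng.normal(size=(N_POS, DIM))
loc_table = rng.normal(size=(MAX_LEN, DIM))

def embed(word_id: int, pos_tag: int, position: int) -> np.ndarray:
    """Compose a fine-grained word vector from the two-dimensional lookup
    table plus POS and position features (here: simple summation)."""
    r, c = divmod(word_id, COLS)   # 2-D index into the two small tables
    return row_table[r] + col_table[c] + pos_table[pos_tag] + loc_table[position]

v = embed(word_id=4242, pos_tag=3, position=7)
```

With these toy sizes the factorized word table holds (100 + 100) x 64 parameters instead of 10000 x 64, which is the kind of lookup-table reduction the abstract describes.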
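The block-then-concatenate construction in contribution (2) can be sketched as attention pooling over pre-grouped syntactic blocks. The grouping itself (via dependency parsing) is assumed done; the dot-product attention and the fixed subject/predicate/object split are hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 64

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def block_embedding(token_vecs, query):
    """Attention-pool the token vectors of one syntactic block."""
    weights = softmax(token_vecs @ query)   # attention weights over tokens
    return weights @ token_vecs             # weighted sum of token vectors

def sentence_embedding(blocks, query):
    """Concatenate the pooled syntactic blocks into one sentence vector."""
    return np.concatenate([block_embedding(b, query) for b in blocks])

# toy sentence: subject (2 tokens), predicate (1 token), object (3 tokens)
blocks = [rng.normal(size=(n, DIM)) for n in (2, 1, 3)]
query = rng.normal(size=DIM)
s = sentence_embedding(blocks, query)   # one vector per sentence
```

Two sentences embedded this way can then be compared with cosine similarity, which is one plausible reading of the sentence-level comparison described above.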
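The Tree-LSTM in contribution (3) composes a word with its dependents in the dependency tree. A minimal child-sum Tree-LSTM node update (per Tai et al.'s formulation, with random untrained weights) might look as follows; the thesis's actual gating and alignment machinery is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 8  # toy hidden size; input size kept equal to D for brevity

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gate parameters (random here; trained in the actual model).
W = {g: rng.normal(scale=0.1, size=(D, D)) for g in "ifou"}
U = {g: rng.normal(scale=0.1, size=(D, D)) for g in "ifou"}
b = {g: np.zeros(D) for g in "ifou"}

def child_sum_node(x, children):
    """One child-sum Tree-LSTM update: combine the (h, c) states of a
    word's dependents with the word's own input vector x."""
    h_sum = sum((h for h, _ in children), np.zeros(D))
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum + b["i"])
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum + b["o"])
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum + b["u"])
    c = i * u
    # one forget gate per child, so each dependent can be kept or dropped
    for h_k, c_k in children:
        f_k = sigmoid(W["f"] @ x + U["f"] @ h_k + b["f"])
        c = c + f_k * c_k
    return o * np.tanh(c), c

# two leaf dependents, then their head word
h1, c1 = child_sum_node(rng.normal(size=D), [])
h2, c2 = child_sum_node(rng.normal(size=D), [])
h_root, c_root = child_sum_node(rng.normal(size=D), [(h1, c1), (h2, c2)])
```

Running such updates bottom-up over the dependency tree yields one hidden state per syntactic block, which the "hard" and "soft" alignment mechanisms can then operate on.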
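The extract-rewrite loop of contribution (4) can be sketched end to end. The lexical-overlap similarity and the trivial `rewrite` stub stand in for the thesis's similarity matrix and beam-search decoder; only the cyclic pseudo-title structure is taken from the abstract.

```python
def overlap(a: str, b: str) -> float:
    """Crude Jaccard word overlap, standing in for the learned
    sentence-similarity matrix of the first (extractive) stage."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def extract(sentences, pseudo_title, k=2):
    """Stage 1: pick the k sentences closest to the current pseudo-title."""
    ranked = sorted(sentences, key=lambda s: overlap(s, pseudo_title), reverse=True)
    return ranked[:k]

def rewrite(selected):
    """Stage 2 placeholder: the thesis rewrites with a trained decoder and
    beam search; here we just join the selection to keep the loop runnable."""
    return " ".join(selected)

def summarize(sentences, pseudo_title, max_rounds=5):
    """Alternate extraction and rewriting until the pseudo-title converges."""
    for _ in range(max_rounds):
        new_title = rewrite(extract(sentences, pseudo_title))
        if new_title == pseudo_title:
            break
        pseudo_title = new_title
    return pseudo_title
```

On a toy document, `summarize(doc_sentences, seed_title)` repeatedly re-extracts against its own output and stops once the pseudo-title stabilizes, mirroring the cyclic first/second-stage procedure described above.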
Keywords/Search Tags: Representation Learning, Text Summarization, Word Embeddings, Dependency Syntax, Attention Mechanism