Research On Extractive Text Summarization Based On Maximal Marginal Relevance

Posted on: 2020-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Guo
Full Text: PDF
GTID: 2428330596981787
Subject: Management Science and Engineering
Abstract/Summary:
With the advent of the big data era, users receive a large number of articles every day from news outlets, media, email, and other sources. If each article came with a short summary, users could spend their limited time and energy selecting the articles that interest them. Writing a summary of every article by hand is clearly unrealistic, so a system that generates summaries automatically is needed; this is automatic text summarization. Automatic text summarization is a sub-task of natural language processing. Like other natural language processing tasks, it requires analyzing unstructured text data, here to choose keywords or key sentences that represent the content of the article. A high-quality summary covers the main content of the article with low redundancy and fluent wording. Sentences extracted from the original text do not introduce grammatical errors, so this thesis adopts extractive summarization.

Transfer learning, a hot topic in natural language processing in recent years, aims to solve the problem of scarce data in a given domain or language by transferring knowledge or models, which reduces workload and saves time. This thesis applies transfer learning to automatic text summarization, extracting summaries in the target language with the help of feature transfer. In this way, only the features of one language are needed to extract summaries in another language.

This thesis studies extractive summarization in two respects. First, it proposes a Text Summarization Based on Maximal Marginal Relevance (TSMMR) model, which calculates the similarity between sentences using word embeddings and sentence embeddings, and scores the importance of sentences using keywords and location information, to obtain higher-quality summaries. The model is applied to the Byte Cup 2018 article title generation task to test its effectiveness. Experimental results show that, on multi-sentence summaries, TSMMR reaches a ROUGE-L of 37.78%, higher than the traditional extractive summarization algorithms CI (29.35%), TextRank (34.15%), and MMR (31.09%), indicating that combining word embeddings or document embeddings can improve summary quality.

Second, based on the TSMMR model, it proposes a Cross Lingual Maximal Marginal Relevance (CLMMR) model that transfers keyword features between languages. The basic idea is to transfer keyword features from the source language to the target language. Because the two languages differ, the keyword features of the source language cannot be used directly in the target language. The keyword features of both languages are therefore mapped into a common feature space through bilingual word vector alignment. Finally, sentences are scored with the TSMMR method to extract the summary. Experiments on cross-language datasets show that adding keyword features from another language can help extract summaries from documents in the current language.
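
To make the Maximal Marginal Relevance idea behind the abstract concrete, the following is a minimal sketch of plain MMR sentence selection, not the thesis's TSMMR model. It assumes bag-of-words cosine similarity as a stand-in for the word and sentence embeddings the abstract mentions, and it omits the keyword and location features. The names `cosine`, `mmr_select`, `k`, and `lam` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k=2, lam=0.7):
    """Greedily pick up to k sentence indices by Maximal Marginal Relevance.

    Assumption: sentences are represented by whitespace-tokenized word
    counts; a real system would likely use embeddings instead.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    doc_vector = sum(vectors, Counter())  # whole-document term counts
    selected = []
    candidates = list(range(len(sentences)))

    def score(i):
        # Relevance: similarity of the sentence to the whole document.
        relevance = cosine(vectors[i], doc_vector)
        # Redundancy: highest similarity to any already-selected sentence.
        redundancy = max(
            (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The parameter `lam` trades relevance against redundancy: at `lam=1.0` the selection ignores redundancy and may pick near-duplicate sentences, while lower values penalize sentences that resemble those already chosen.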
Keywords/Search Tags: Extractive Text Summarization, Word Embedding, Maximal Marginal Relevance, Transfer Learning, Cross-Language