Research On The Generation Of Tibetan News Abstracts Based On A Unified Model

Posted on:2021-05-27

Degree:Master

Type:Thesis

Country:China

Candidate:W Li

GTID:2438330602998431

Subject:Computer Science and Technology

Abstract/Summary:

With the development of the Internet, information on the web is growing explosively, which makes it difficult to extract valuable information efficiently. Text summarization technology has emerged to help readers grasp the main idea of a lengthy news article and to filter out redundant information, which speeds up news browsing. Text summarization is a research hotspot in natural language processing and has attracted increasing attention from researchers.

By implementation method, text summarization falls into two categories: extractive and abstractive. Extractive approaches build a summary by selecting sentences from the original text and putting them together. Abstractive approaches generate a summary from scratch, using novel words and phrases that reinterpret the source text rather than copy it.

Research on text summarization has made remarkable progress for Chinese and English. However, methods for generating and evaluating summaries in Tibetan still lag behind. Existing work mainly applies unsupervised methods to a small, manually collected corpus; no large-scale corpus is available, and there is no standard evaluation method. Moreover, sequence-to-sequence summarization models have not yet been applied to Tibetan, although they perform well in both Chinese and English.

This thesis studies Tibetan news text summarization. Its main contents and innovations are as follows:

1. To address the lack of a large-scale training corpus, of standard evaluation methods, and of reference results for Tibetan, 50,000 Tibetan news articles were extracted as a training corpus, with K-means clustering results and headlines taken as reference summaries. Traditional summarization methods and abstractive methods were applied to Tibetan and evaluated with the standard ROUGE metric for text summarization, giving a reference baseline.

2. Because Tibetan news texts are long, training suffers from vanishing and exploding gradients. To address this, a joint model is proposed that combines extractive and abstractive methods. An extractive method first selects the sentences that best represent the original article, which removes redundant information and shortens the text; an abstractive method then generates the summary from the shortened text. Experimental results show that the ROUGE-1 score improves by about 2% over that of the traditional summarization method.

3. To address the lack of labeled training data for the first stage of the joint model, the TextRank algorithm was used to label the extractive training corpus, on which the extractive neural network model was then trained. In the second stage, a pointer mechanism and a coverage mechanism are introduced to reduce semantic repetition in the generated summaries.
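The first-stage labeling and the ROUGE-1 evaluation described above can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the Jaccard sentence similarity, the damping factor of 0.85, and the `keep` parameter are all assumptions chosen for illustration, and sentences are assumed to be already split into tokens (a real Tibetan pipeline would need its own segmentation step).

```python
from collections import Counter


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall and F1 between two token lists."""
    # Counter intersection clips each unigram count at the reference count.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def jaccard(a, b):
    """Assumed sentence similarity: shared vocabulary over total vocabulary."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    weights = [
        [jaccard(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def label_sentences(sentences, keep=2):
    """Mark the `keep` highest-scoring sentences with 1 and the rest with 0."""
    scores = textrank_scores(sentences)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = set(order[:keep])
    return [1 if i in chosen else 0 for i in range(len(sentences))]


# Hypothetical toy document (English stand-in for tokenized Tibetan news).
doc = [
    "the council approved the new school budget".split(),
    "the new school budget funds two schools".split(),
    "rain is expected tomorrow".split(),
]
labels = label_sentences(doc, keep=2)
# Keep only the labeled sentences as the shortened input for the second stage.
shortened = [sentence for sentence, label in zip(doc, labels) if label]
```

In this sketch the 0/1 labels play the role of pseudo ground truth for training an extractive model, and `shortened` stands in for the reduced article that a second-stage abstractive model would receive. `rouge1` can then compare a generated summary against a reference such as the headline.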

Keywords/Search Tags:

Text summarization, ROUGE, TextRank, Pointer mechanism, Seq2Seq, Attention mechanism

Related items

1	Research And Application Of Abstract Method Of Chinese Web Text Based On Seq2Seq Framework
2	Research And Implementation Of News Web Abstract Algorithm
3	The Research On Chinese Automatic Abstract Generation Technology Based On Deep Learning
4	Seq2seq Attention:Super Long Chinese Text Summarization Model
5	Research And Implementation Of Abstract Automatic Generation Algorithm Based On Gensim
6	Research On Automatically Generated Text Summary Evaluation Based On Deep Learning
7	Improved Attentional Seq2seq With Policy Gradient For Text Summarization
8	Research And System Implementation Of English Text Simplification Based On Seq2Seq
9	Research On Automatic Text Summarization Based On TextRank
10	Research On Chinese Automatic Text Summarization Based On Deep Learning