Research On Tibetan Text Summarization Method

Posted on:2024-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Wang

Full Text:PDF

GTID:2555306926963929

Subject:Computer Science and Technology

Abstract/Summary:


With the deepening of information globalization, the application of text summarization technology is no longer limited to high-resource languages such as English and Mandarin Chinese. Building a high-performance text summarization system in a low-resource setting has become a new research hotspot and a difficult problem. Tibetan is one of the minority languages of China and is also spoken by some people in Bhutan, India, Nepal and Pakistan; in total, about 8 million people speak Tibetan, so the development of Tibetan informatization is very important. However, Tibetan summarization currently faces three problems. First, Tibetan informatization started relatively late, and there is still no effective Tibetan text summarization system. Second, the wave of intelligent applications triggered by deep learning has swept the world, but for computers to understand a task accurately, large amounts of training data are usually required, and Tibetan, as a low-resource language, currently lacks large-scale datasets. Third, with the growing abundance of online information, people are no longer satisfied with searching in a single language, so the cross-language capability of summarization systems has attracted increasing attention, yet research on Tibetan cross-language summarization is still in its infancy. These problems make the study of Tibetan summarization systems highly significant, and this thesis conducts research on such systems. Its main contributions are as follows.

(1) We construct Tibetan text summarization datasets. Because no public evaluation dataset for Tibetan text summarization currently exists, we manually built a dataset of 1,000 Tibetan text summaries, together with keyword information, corresponding to more than 3,500 articles, to support the evaluation of Tibetan text summarization systems. Cleaning and scoring the articles ensures the quality of the final summaries. Experimental results show that the constructed dataset describes the key information of each article accurately and without repetition, and can be used to evaluate Tibetan text summarization systems. We also constructed a training set of 20,000 Tibetan news-headline pairs.

(2) We propose a Tibetan multi-document summarization model based on an improved TextRank. Because traditional k-means performs poorly at text clustering, we adopt a two-stage clustering strategy and use spectral clustering to group more closely related topics. Then, because traditional TextRank treats all sentence nodes equally, we fuse topic features into the random-jump probability of each sentence node, so that sentences more relevant to the topic are more likely to be selected through a random jump. Experimental results show that our model achieves 32.4% on ROUGE-L, 17.2% higher than the traditional baseline model.

(3) To address the lack of research on Tibetan cross-language summarization, we propose an end-to-end Tibetan-Chinese cross-language summarization model, which alleviates the error propagation and accumulation of traditional pipeline-based cross-language summarization models. Because Tibetan-Chinese cross-language summarization datasets are lacking, we use a back-translation strategy to ensure dataset quality. Through the inductive transfer mechanism of multi-task learning, the target task is decomposed into a monolingual summarization task and a multilingual summarization task, which improves the generalization performance of the model.
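The abstract does not give the exact formula for fusing topic features into TextRank, so the following is only a minimal sketch under stated assumptions: each sentence is assumed to have a non-negative topic-relevance score, and that score replaces the uniform random-jump vector of standard TextRank (a personalized-PageRank-style variant). The function name `topic_biased_textrank`, the damping factor 0.85, and the toy similarity matrix are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np


def topic_biased_textrank(sim, topic_relevance, damping=0.85, tol=1e-8, max_iter=200):
    """Score sentences with TextRank whose random jump favors topic-relevant sentences.

    Hypothetical sketch: sim is an (n, n) non-negative sentence-similarity matrix,
    topic_relevance is an (n,) non-negative relevance of each sentence to its topic.
    """
    sim = np.array(sim, dtype=float)
    np.fill_diagonal(sim, 0.0)  # a sentence does not vote for itself
    n = sim.shape[0]

    # Row-normalize: each sentence spreads its score over its similar neighbours.
    row_sums = sim.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Sentences with no neighbours spread their score uniformly.
    trans = np.where(row_sums > 0, sim / safe_sums, 1.0 / n)

    # Random-jump vector: standard TextRank uses 1/n for every node; here
    # (assumption) it is proportional to each sentence's topic relevance.
    jump = np.asarray(topic_relevance, dtype=float)
    jump = jump / jump.sum() if jump.sum() > 0 else np.full(n, 1.0 / n)

    # Power iteration: s = (1 - d) * jump + d * T^T s
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_scores = (1 - damping) * jump + damping * trans.T @ scores
        converged = np.abs(new_scores - scores).sum() < tol
        scores = new_scores
        if converged:
            break
    return scores


# Toy graph of four sentences; sentence 3 is only similar to sentence 2.
sim = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

uniform = topic_biased_textrank(sim, np.ones(4))          # plain TextRank
biased = topic_biased_textrank(sim, [0.1, 0.1, 0.1, 1.0])  # sentence 3 is most on-topic

# Sentence 3's score rises when the random jump favors it.
print(uniform.round(3), biased.round(3))
```

With a uniform relevance vector the function reduces to ordinary TextRank, which makes the effect of the topic bias easy to compare; the top-scoring sentences in each cluster would then be extracted to form the summary.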

Keywords/Search Tags:

Tibetan text summarization, Tibetan-Chinese cross-language summarization, Tibetan text summarization dataset


Related items

1	Research On Key Technologies For Tibetan Abstractive Text Summarization
2	Cross-lingual Text Summarization Model Research Based On Deep Learning
3	Research And Application Of Tibetan Pre-training Language Model Based On BERT
4	Research On Grammar-aware English Text Summarization Based On Deep Learning
5	Case Studies On The Effects Of Text Summarization On Argumentation Writing Qualities Of EFL Learners At Different Proficiency Levels
6	Movie Summarization Based On Subjective And Objective Features
7	Text Analysis Of Speech Synthesis Based On Statistical Parameters Of Tibetan Language In Specific Fields
8	Research On Vietnamese-Chinese Low-resource Cross-language Summarization Method Based On Word-level Key Information Guidance
9	Dialogue Analysis And Automatic Summarization Of Business Dialogues
10	The Research And Implementation Of English Automatic Summarization