
Research On Chinese-Vietnamese Cross-language Text Summarization Method Based On Transformer Structure

Posted on: 2024-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: B W Chu
GTID: 2568307109488114
Subject: Artificial intelligence

Abstract/Summary:
Chinese-Vietnamese cross-language text summarization takes Chinese or Vietnamese text as input and automatically generates a summary in the other language. As economic, cultural, and political exchanges between China and Vietnam continue to deepen, this technology can help people in both countries quickly obtain the important content of a text. At present, there are few studies on cross-language text summarization for low-resource language pairs such as Chinese-Vietnamese. Because high-quality parallel corpora are lacking, the generated Chinese-Vietnamese cross-language summaries can be factually inaccurate, and as the text grows longer the summaries also lose important content. This thesis therefore studies Chinese-Vietnamese cross-language text summarization methods, working in turn on two datasets with different average lengths. The main contributions and innovations are summarized as follows:

(1) Construction of a Chinese-Vietnamese cross-language text summarization corpus
Constructing a high-quality Chinese-Vietnamese cross-language text summarization corpus is an important basis for subsequent work. Building a bilingual aligned dataset by hand not only consumes a great deal of time and effort but also requires researchers who are proficient in both Chinese and Vietnamese, which makes it very difficult to carry out. Constructing the dataset automatically with predefined rules is therefore currently the most effective way to address the scarcity of bilingual corpora. The resulting data is divided into a short-text dataset and a long-text dataset to support the follow-up research.

(2) A cross-language summarization method based on a dual-attention decoding network
Most existing approaches use an encoder-decoder structure for cross-language text summarization. Under the low-resource conditions of Chinese and Vietnamese, however, the model sees too little data, so the generated summaries are inaccurate. To address this problem, this thesis proposes a cross-lingual summarization method based on a dual-attention decoding network. During summary generation, the decoder's attention mechanism has the greatest impact on summary quality. Building on the attention mechanism of the Transformer decoder, we add a refiner module that checks and refines the summary output by the attention mechanism. We also add pre-trained word vectors at the encoder to improve its representation of Vietnamese. Together, these two measures improve the accuracy with which the correct entities are generated. Experiments on the Chinese-Vietnamese cross-language summarization dataset show that the method outperforms the baseline and that the generated summaries are more accurate and fluent.

(3) A multi-task approach to Chinese-Vietnamese cross-lingual long-text summarization
This work targets the long-text dataset, where, as noted above, important content is increasingly lost as the input grows longer, and it approaches the problem with multi-task learning.

(4) Design and implementation of a prototype system for Chinese-Vietnamese cross-language text summarization
Based on the above research, this thesis designs a visualization platform for the Chinese-Vietnamese cross-lingual text summarization system that provides several summarization functions, including short-text, long-text, and document summarization. The user selects the type of text to input, and the system automatically matches the summarization model suited to that type and outputs the corresponding summary. This chapter describes the implementation of the prototype system, including the overall system architecture, environment requirements, model building, and system functions.
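The type-based model matching described for the prototype system can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `MODELS`, `summarize`, and `_placeholder`, the three type keys, and the placeholder "models" that merely echo a prefix of the input. The abstract does not specify how the actual system performs the matching.

```python
from typing import Callable, Dict

# A summarizer maps source text to a summary string.
Summarizer = Callable[[str], str]


def _placeholder(name: str) -> Summarizer:
    """Build a hypothetical stand-in for a trained summarization model."""
    def run(text: str) -> str:
        # A real model would generate a cross-language summary here;
        # this stub only tags the first 20 characters of the input.
        return f"[{name}] {text[:20]}"
    return run


# Hypothetical registry: one model per input type offered by the system.
MODELS: Dict[str, Summarizer] = {
    "short": _placeholder("short-text model"),
    "long": _placeholder("long-text model"),
    "document": _placeholder("document model"),
}


def summarize(text: str, input_type: str) -> str:
    """Dispatch text to the summarizer registered for the chosen type."""
    try:
        model = MODELS[input_type]
    except KeyError:
        raise ValueError(f"unknown input type: {input_type!r}") from None
    return model(text)
```

Keeping the models in a registry keyed by input type means a new summarization type could be supported by adding one entry, without changing the dispatch function.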
Keywords/Search Tags:Chinese-Vietnamese cross-language, text summarization, short text, long text