
Research On Tibetan News Headlines Generation Based On Neural Network

Posted on: 2022-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: W K Z Ze
Full Text: PDF
GTID: 2518306482973389
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the volume of online Tibetan news is growing. Headlines are the main means by which readers screen news: they let readers quickly pick out content that interests them and skip content that does not. Readers often find, however, that after opening an article because of its headline, the content has nothing to do with that headline. At present, apart from the high-quality headlines of official media, news from other sources on the Internet often has no headline, a wrong headline, or a headline that exaggerates or departs from the content. To address these problems, this thesis uses natural language processing to generate accurate, fluent, and concise headlines from news content.

Most research on headline generation concentrates on Chinese and English, and very little addresses Tibetan; so far there is only work on automatic text summarization in Tibetan. This thesis studies the key theoretical and technical issues in Tibetan news headline generation. Its contributions are as follows:

(1) Unlike Chinese and English, which have public corpora for headline generation, Tibetan lacks a large-scale corpus for the task, and there is no standardized evaluation method. In this thesis, 30,300 pairs of data were extracted from the Tibetan text corpus built by Liu Huidan and others, and more than 140,000 pairs were crawled from Tibetan news websites as an expanded corpus. The data was preprocessed according to the characteristics of the Tibetan language, and a corpus of more than 170,000 pairs was then constructed through word segmentation and sentence segmentation. Because headline generation is a text generation task, this thesis adopts the widely used ROUGE evaluation metrics.

(2) This thesis builds a baseline headline generation model based on the TextRank algorithm, which outputs the sentences of the text that are most similar to the other sentences. Baselines in previous related studies simply took the first sentence of the text as the headline, which is unreasonable. Applying TextRank to headline generation yields a baseline with good performance and strong robustness.

(3) This thesis builds a neural sequence-to-sequence model that uses LSTM and its variants as the encoder and decoder, together with an attention mechanism that captures the important semantic information of the text and addresses the long-term dependency problem caused by long news texts. The proposed model improves greatly on the baseline. Among the models compared, the attention-based BiLSTM model performs best, with a ROUGE-1 score of 33.76%, and can generate high-quality Tibetan news headlines within the news domain.
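The TextRank baseline and the ROUGE-1 metric mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not give the similarity function, damping factor, or ROUGE variant used, so the word-overlap similarity from the original TextRank formulation, a damping of 0.85, and a ROUGE-1 that returns precision, recall, and F1 are all assumptions. The function names are hypothetical, and each sentence is assumed to be an already segmented list of tokens (Tibetan word segmentation is not shown).

```python
import math
from collections import Counter


def sentence_similarity(a, b):
    """Shared-word count normalised by log sentence lengths (assumed variant)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    n = len(sentences)
    sim = [
        [0.0 if i == j else sentence_similarity(sentences[i], sentences[j])
         for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_headline(sentences):
    """Return the highest-scoring sentence as an extractive headline."""
    scores = textrank_scores(sentences)
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best]


def rouge_1(candidate_tokens, reference_tokens):
    """Return (precision, recall, f1) from clipped unigram overlap."""
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

In use, `textrank_headline(sentences)` would be called on the segmented sentences of one article, and `rouge_1(headline, reference_headline)` would compare the result with the human-written headline. A sentence that shares no words with any other sentence receives only the base score `1 - damping`, so it is never preferred over a connected one.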
Keywords/Search Tags: Headline Generation, Tibetan News Headline Generation, Neural Network, Attention Mechanism