Research On Text Summarization Algorithms For Sports Game

Posted on: 2024-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: J A Wang
GTID: 2557306941964619
Subject: Computer Science and Technology
Abstract/Summary:
As people step into the information era, the massive growth of text data makes it increasingly difficult to extract core information. Text summarization addresses this problem by producing concise summaries of lengthy text, helping people quickly grasp the essential information. In sports games, commentators often provide real-time commentary on the entire event, which tends to be lengthy and to scatter the important information. After the game, news websites or social platforms may publish related news reports, which are concise and focused, so that people can quickly obtain sports information. This gave rise to research on text summarization for sports games, also known as sports game summarization, which aims to generate the corresponding news report from the full-text commentary document of a sports game. The task can provide timely news reports for the sports industry, facilitate information dissemination in sports communities, and enable internet users to quickly access core event information.

In recent years, research on sports game summarization has been limited by the scale and quality of the available datasets, and existing methods still cannot effectively extract core information from lengthy and colloquial commentary documents. Moreover, existing research has not considered the knowledge gap between commentary documents and news reports: news reports may contain additional background knowledge. Furthermore, all current research on sports game summarization is Chinese-oriented, while the world we live in is multilingual. Using Chinese sports game summarization technology to promote sports game summarization research in other languages is also an unexplored direction. In response to this situation, this thesis addresses the sports game summarization task from four main aspects:

(1) This thesis constructs the first large-scale and high-quality sports game summarization dataset, SGSum. Existing datasets are either small in scale or low in quality, so SGSum is designed to satisfy both requirements. Specifically, we collect data on all mainstream soccer games between 2012 and 2020 from online resources and apply a strict manual cleaning process to obtain high-quality samples. SGSum contains 7,854 sports game summarization samples, making it the largest known dataset for this task and the only sports game summarization dataset built with a manual cleaning process.

(2) This thesis proposes a general sports game summarization model based on re-ranking. To improve the readability of the generated news reports, the model introduces a re-ranking module that filters the news sentences the model generates. During filtering, the method selects a high-quality set of news sentences based on informativeness, fluency, and redundancy, and these sentences form the news report.

(3) This thesis presents a knowledge-enhanced sports game summarization model. To bridge the knowledge gap between news reports and commentary documents, we construct a sports knowledge base containing extensive team and player information. During the generation of a sports news report, named entity recognition is used to extract the people and organizations mentioned in the commentary; these mentions are then linked to the relevant entities in the knowledge base, the corresponding knowledge is retrieved, and its representation is incorporated into the model's computation.

(4) This thesis proposes a multilingual sports game summarization model that leverages existing sports game summarization data in high-resource languages to enhance the summarization capability in low-resource languages. Specifically, we design several pre-training objectives that simultaneously improve the model's cross-language and summarization abilities. The final model transfers well across languages and can generalize the sports game summarization capability learned from high-resource languages to low-resource languages.

Extensive experiments on the SGSum and SportsSum datasets demonstrate that the proposed methods can generate corresponding sports news reports for the given commentary text. Compared with several types of baseline models, the proposed methods achieve better ROUGE and BERTScore results.
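
The re-ranking idea in aspect (2) can be illustrated with a minimal sketch. It is not the thesis's actual module: the abstract does not say how informativeness, fluency, or redundancy are scored or combined, so the `Candidate` scores, the unigram Jaccard redundancy proxy, the additive scoring rule, and the `redundancy_weight` parameter below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    informativeness: float  # hypothetical score in [0, 1]
    fluency: float          # hypothetical score in [0, 1]


def jaccard(a: str, b: str) -> float:
    """Unigram Jaccard overlap, used here as a crude redundancy proxy."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rerank(candidates, k=2, redundancy_weight=1.0):
    """Greedily pick up to k sentences, trading quality against redundancy."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            # Redundancy = highest overlap with any sentence already chosen.
            redundancy = max(
                (jaccard(c.text, s.text) for s in selected), default=0.0
            )
            return c.informativeness + c.fluency - redundancy_weight * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Illustrative input: the second sentence nearly duplicates the first.
candidates = [
    Candidate("Messi scores in the 30th minute", 0.9, 0.9),
    Candidate("Messi scores in the 30th minute again", 0.85, 0.9),
    Candidate("Barcelona win 2-1 at home", 0.6, 0.8),
]
report = [c.text for c in rerank(candidates, k=2)]
```

In this toy input the near-duplicate sentence is passed over in favor of the lower-scored but non-overlapping one, which is the behavior a redundancy-aware re-ranker is meant to produce.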
Keywords/Search Tags:Text Summarization, Sports Game Summarization, Text Generation