
Design And Implementation Of Named Entity Recognition System Based On Network Text Of Winter Olympic Games

Posted on: 2022-01-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Z H Wang
GTID: 2507306728460064 | Subject: Computer technology
Abstract/Summary:
With the arrival of the 2022 Winter Olympics, the volume of Winter Olympics news text on the web has increased dramatically. Extracting and visualizing named entities from this online text is of great significance to research on Winter Olympics-related work. In this work, web text is obtained from media websites through a theme crawler. The collected text is automatically annotated with HanLP and then manually corrected to form the data set used in the experiments. For named entity recognition in Winter Olympics web text, an improved method based on the BERT-BiLSTM-CRF model is proposed.

The system provides three major functions: data collection, named entity recognition, and data visualization. In data collection, the theme crawler gathers Winter Olympics web texts from media websites such as Toutiao, which both provides data support for subsequent analysis and brings together the Winter Olympics web texts published on media platforms.

In named entity recognition, first, BERT word vectors are introduced: unlike the static vector representation of word2vec, BERT computes a dynamic vector for each word according to its context, and the representation is obtained by extracting rich grammatical and semantic features. Second, although a unidirectional LSTM network improves the handling of long-sequence text semantics, it cannot draw on the context that follows a token, so a bidirectional LSTM (BiLSTM) network is used to obtain feature information from both directions. Third, an attention mechanism gives greater weight to key words, which effectively addresses the problem of sparse semantics at the front of long-sequence text. Fourth, a CRF layer adds constraints on the predicted tags to prevent invalid tag sequences.

In visualization, the ECharts visualization tool is used to analyze the recognition results along two dimensions, time and space, and draw them as charts, and a web interface built with the Django framework displays them, so that the frequency of Winter Olympics events at different times and the links between regions can be understood intuitively. In the time dimension, the recognized times and dates are rendered as a calendar chart, which shows how frequently events occur at different points in time. In the spatial dimension, the recognized place names are rendered as a flow map, and the place names support geostatistical analysis. The system provides named entity recognition for online text about the 2022 Winter Olympics, and the two entity types, time and location, can be drawn as charts and maps to help relevant staff better mine information about the 2022 Winter Olympics.
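
The role of the CRF step in the recognition model (constraining predicted tags so that the tag sequence stays valid) can be illustrated with a small decoding sketch. This is not the thesis's implementation: a trained CRF learns transition scores from data, whereas the sketch below hard-codes the BIO rule as an allowed/forbidden check, and the tag set `TAGS`, the helper names, and the emission scores are all hypothetical, chosen only to match the time and location entity types mentioned in the abstract.

```python
import math

# Hypothetical BIO tag set for the two entity types named in the abstract.
TAGS = ["O", "B-TIME", "I-TIME", "B-LOC", "I-LOC"]


def allowed(prev, curr):
    """BIO rule: I-X may only follow B-X or I-X of the same entity type."""
    if curr.startswith("I-"):
        kind = curr[2:]
        return prev in ("B-" + kind, "I-" + kind)
    return True


def viterbi_decode(emissions):
    """Return the best valid tag sequence.

    emissions: one list per token, holding one score per tag in TAGS order
    (for example, the per-token outputs of a BiLSTM layer).
    """
    if not emissions:
        return []
    n = len(TAGS)
    # A sequence cannot start with an I- tag.
    score = [
        -math.inf if TAGS[j].startswith("I-") else emissions[0][j]
        for j in range(n)
    ]
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n):
            best_i, best = 0, -math.inf
            for i in range(n):
                # Only consider transitions that the BIO rule permits.
                if allowed(TAGS[i], TAGS[j]) and score[i] > best:
                    best_i, best = i, score[i]
            new_score.append(best + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Follow back-pointers from the best final tag.
    path = [max(range(n), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [TAGS[j] for j in path]
```

With per-token scores whose independent argmax would be `O, I-LOC, I-LOC` (an entity that starts with an inside tag), the constrained decoder instead selects a sequence in which the location entity begins with `B-LOC`. That is the kind of tag-sequence confusion the CRF layer is described as preventing.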
Keywords/Search Tags:2022 Winter Olympics, theme crawler, named entity recognition, BERT-BiLSTM-CRF, Attention