
Research On New Event Detection Methods For Mongolian News

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: S J Wang
Full Text: PDF
GTID: 2428330620976438
Subject: Computer Science and Technology
Abstract/Summary:
As Mongolian-language online resources continue to grow, so does the need to detect new events in Mongolian text and to keep up with the latest information across very large sources. However, research on new event detection for Mongolian, both domestically and internationally, is still in its infancy, and further work is urgently needed. In new event detection for Mongolian news, the two core issues are optimizing the representation of news content and making full use of the information in the news corpus; how well these are handled directly affects the final detection results. This thesis focuses on these two issues and studies new event detection methods for Mongolian news. The specific research content and contributions are as follows.

First, to optimize the representation of news content, this thesis proposes a text representation method based on a vector space model with optimized feature word weights. Building on the Vector Space Model (VSM), it improves the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm in three ways. Following the characteristics of news text, special weighting is applied to feature words that appear in the headline, in the first paragraph, and in the first sentence of each paragraph. A statistical method is used to optimize the weight coefficients of feature words for different named entity types in different news categories. Following the characteristics of the new event detection task, class frequency variance is used to optimize the weight coefficients of feature words whose distributions differ across news categories. Experimental results show that, compared with the traditional VSM model, this method improves system performance and reduces the normalized detection cost by 6.42%.

Second, to address the semantic loss and limited dimensionality reduction ability of VSM, this thesis proposes a vector feature fusion method that combines the weight-optimized VSM with Latent Dirichlet Allocation (LDA). The LDA topic model is introduced to extract each text's distribution over the latent topic space and to mine the latent semantic information in the text content. The weight-optimized VSM vector and the LDA topic vector are then fused and applied to Mongolian new event detection. Experimental results show that, compared with the traditional VSM model, this method further improves system performance and reduces the normalized detection cost by 9.86%.

Finally, to address the difficulty that traditional new event detection systems cannot effectively distinguish different events with similar content, this thesis proposes a new event detection method based on the fusion of news elements. It introduces deep learning and uses a neural network model that combines an attention mechanism with a Bidirectional Long Short-Term Memory network and a conditional random field (Attention + BiLSTM + CRF) to extract news elements, including time, place, subject and object. News content similarity and news element similarity are fused into a final similarity, which is used to detect new events in Mongolian news. Experimental results show that, compared with the traditional VSM model, system performance is further improved and the normalized detection cost is reduced by 10.95%.
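
The decision step described in the abstract, fusing content similarity with news element similarity and comparing the result against earlier stories, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The title boost `TITLE_BOOST`, the fusion weight `alpha`, the `threshold`, the smoothed IDF formula, and the slot-matching rule in `element_similarity` are all assumptions chosen for illustration. The LDA topic vector is omitted, and the news elements are assumed to be already extracted (in the thesis they come from the Attention + BiLSTM + CRF model). The example stories use English tokens as stand-ins for Mongolian ones.

```python
import math
from collections import Counter

TITLE_BOOST = 2.0  # assumed extra weight for headline words (hypothetical value)
ELEMENT_SLOTS = ("time", "place", "subject", "object")


def tfidf_vectors(docs, title_boost=TITLE_BOOST):
    """Build sparse TF-IDF vectors, counting headline tokens more heavily."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d["title"]) | set(d["body"]))
    vectors = []
    for d in docs:
        tf = Counter(d["body"])
        for tok in d["title"]:
            tf[tok] += title_boost
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def element_similarity(a, b):
    """Fraction of element slots (time, place, subject, object) that match."""
    matches = sum(
        1 for s in ELEMENT_SLOTS if a.get(s) is not None and a.get(s) == b.get(s)
    )
    return matches / len(ELEMENT_SLOTS)


def detect_new_events(docs, threshold=0.6, alpha=0.5):
    """Flag a story as a new event if no earlier story is similar enough."""
    vectors = tfidf_vectors(docs)
    flags = []
    for i, d in enumerate(docs):
        best = 0.0
        for j in range(i):
            content = cosine(vectors[i], vectors[j])
            elements = element_similarity(d["elements"], docs[j]["elements"])
            best = max(best, alpha * content + (1 - alpha) * elements)
        flags.append(best < threshold)
    return flags


docs = [
    {"title": ["earthquake", "ulaanbaatar"],
     "body": ["earthquake", "strikes", "ulaanbaatar", "monday"],
     "elements": {"time": "monday", "place": "ulaanbaatar",
                  "subject": "earthquake", "object": "city"}},
    {"title": ["earthquake", "ulaanbaatar", "damage"],
     "body": ["earthquake", "damage", "ulaanbaatar", "monday"],
     "elements": {"time": "monday", "place": "ulaanbaatar",
                  "subject": "earthquake", "object": "city"}},
    {"title": ["earthquake", "hohhot"],
     "body": ["earthquake", "strikes", "hohhot", "friday"],
     "elements": {"time": "friday", "place": "hohhot",
                  "subject": "earthquake", "object": "city"}},
]
flags = detect_new_events(docs)
print(flags)
```

In this toy run the second story shares all four elements with the first and is treated as a follow-up, while the third story reports an earthquake at a different time and place and is flagged as a new event. For simplicity the sketch computes document frequencies over the whole list up front; an online system would have to update them incrementally as stories arrive.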
Keywords/Search Tags: Mongolian, New event detection, Vector feature fusion, News elements, Attention mechanism