
Researches On Sentiment Time Series For Anomaly Detection

Posted on: 2022-09-16; Degree: Master; Type: Thesis
Country: China; Candidate: J Y Wu; Full Text: PDF
GTID: 2480306725981429; Subject: Computer technology
Abstract/Summary:
A sentiment time series is a chronologically ordered sequence of sentiment scores, each aggregated from the texts that fall within one time slice; it is an effective tool for transforming text data into a time series. Work on sentiment time series for anomaly detection mainly applies anomaly detection methods to such series. By analyzing the changing patterns in a sentiment time series generated from texts, the anomalous events that triggered changes in users' sentiment can be mined. Most existing research in this area has two shortcomings. On the one hand, the generated time series is not accurate enough to reflect the changing patterns in users' sentiment. On the other hand, most existing methods detect anomalous points in a simple way: key time points related to real-life events are marked manually on the basis of short-term fluctuations or significant spikes. To address these problems, this thesis studies anomaly detection in sentiment time series.

First, to solve the problem of inaccurate sentiment time series generation, a sentiment score calibration method based on random sampling is proposed. A subset is randomly sampled from the texts grouped into each time slice, and the distribution over the full data is estimated from the values of the evaluation indicators on that subsample. Combining this estimate with the predictions of the sentiment classifier yields the calibrated sentiment scores. A theoretical analysis shows that the sampling error of the evaluation indicators can be limited to a small range, especially for extreme sentiment scores. Experiments on simulated data and a real-world Weibo data set verify the effectiveness and robustness of the method for sentiment time series calibration.

Second, because random sampling is inherently uncertain, the sampling results may fall into bad cases. To obtain more stable and reliable sentiment scores, a sentiment score calibration method based on deep clustering is proposed. High-dimensional representations of the texts are obtained from a sentiment classification task. After fine-tuning and compression, the vectors are divided into clusters by deep clustering. Representative samples are then selected from each cluster according to a distance measure to form the sampling subset. Finally, sentiment score calibration is carried out as in the subset-sampling-based method. Experiments on the Weibo topic data set constructed in this thesis compare the two calibration methods, and the results show that the deep clustering-based method reduces the uncertainty while maintaining excellent performance.

Finally, an anomaly detection method for sentiment time series based on the saliency map is proposed. The original sentiment time series is first divided using overlapping sliding windows so that each target time point lies in the middle of its window. The spectral residual method is then applied to each window to obtain the saliency map, which amplifies the anomaly degree of the points. Anomalies are identified by comparing the saliency map with its local mean. The anomaly detection experiment on the Weibo topic data set verifies that the method effectively distinguishes anomalous points from normal points and improves the accuracy of anomaly detection in sentiment time series.
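The saliency-map step can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the window length, the averaging width `q`, the local-mean radius and the threshold are all hypothetical choices made for illustration, and the abstract does not specify any of them. The sketch centers a window on each time point, computes a spectral residual saliency map for that window, and flags the point when its saliency exceeds the local mean by a relative margin.

```python
import numpy as np


def spectral_residual_saliency(window, q=3):
    """Saliency map of a 1-D window via the spectral residual transform.

    q is an assumed width for the moving average of the log-amplitude spectrum.
    """
    window = np.asarray(window, dtype=float)
    spectrum = np.fft.fft(window)
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)  # epsilon avoids log(0)
    phase = np.angle(spectrum)
    # Smooth the log-amplitude spectrum; the residual is what smoothing removes.
    smoothed = np.convolve(log_amplitude, np.ones(q) / q, mode="same")
    residual = log_amplitude - smoothed
    # Back to the time domain, keeping the original phase.
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))


def detect_anomalies(series, window=21, q=3, local=5, threshold=1.0):
    """Flag points whose saliency stands out from the local mean saliency.

    window, local and threshold are illustrative values, not taken from the thesis.
    """
    series = np.asarray(series, dtype=float)
    half = window // 2
    # Edge padding lets every point, including the first and last, sit mid-window.
    padded = np.pad(series, half, mode="edge")
    flags = np.zeros(len(series), dtype=bool)
    for t in range(len(series)):
        saliency = spectral_residual_saliency(padded[t:t + window], q)
        lo, hi = max(0, half - local), half + local + 1
        local_mean = saliency[lo:hi].mean()
        score = (saliency[half] - local_mean) / (local_mean + 1e-8)
        flags[t] = score > threshold
    return flags


# Toy usage: a flat sentiment series with a single spike at index 25.
scores = np.zeros(50)
scores[25] = 5.0
flags = detect_anomalies(scores)
```

On real sentiment series the threshold and window length would need tuning against labeled events; the toy series above only shows the mechanics.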
Keywords/Search Tags: Sentiment Time Series, Microblogging Events, Time Series Anomaly Detection, Deep Clustering, Saliency Map