
Research On Network Hot Keyword Monitoring Based On High-dimensional Statistical Process Modeling

Posted on: 2024-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: S F Zhang
GTID: 2558307103473054
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the Internet, it has become an indispensable source of knowledge, news, and entertainment. Internet forums are virtual platforms where people with common interests discuss topics and share information, and they are one of the main channels through which information spreads. In a forum, users can post or reply to other people's posts to express their own views, so the number of posts reflects, to some extent, the public's understanding and evaluation of particular social issues. Hot topics on the Internet have therefore become an important channel for governments and enterprises to understand social conditions and public opinion, and to monitor it.

The currently popular approach to hot keyword monitoring is based on machine learning: it combines big data processing with manual analysis to quickly identify and assess hot keywords in the current dynamics of public opinion. However, machine-learning-based monitoring is not applicable in some scenarios, for example when the number of keywords in a hot topic is uncertain, or when the keywords of the same hot topic grow at different rates at different times. To address this, the thesis proposes two methods, both based on statistical process control, for monitoring network hot keywords in Weibo word frequency data. The main research work is summarized as follows.

First, the thesis uses web crawling to collect the text of microblog (Weibo) posts. The raw crawled data are often noisy, and this noise can affect the implementation of the monitoring scheme, so the data are preprocessed through data cleaning, word segmentation, and stop-word removal. Studying changes in public opinion from the perspective of all users, the thesis examines trends in the word frequency of hot keywords, constructs a keyword frequency table, and performs statistical analysis.

Second, the thesis develops a hot keyword monitoring scheme based on the Shewhart control chart. Keyword word frequency data are high-dimensional and have a time-varying sample size, which the existing Shewhart control chart cannot monitor well. The thesis therefore extends the Shewhart chart using the false discovery rate (FDR) from multiple hypothesis testing, yielding an FDR-based Shewhart control chart. Applying the proposed chart and two existing popular control charts to keyword word frequency data shows that the FDR-based Shewhart control chart is superior to the other methods in timeliness.

Finally, because the Shewhart control chart performs well mainly for large shifts in hot keywords, the thesis turns to small shifts and proposes an FDR-based cumulative sum (CUSUM) control chart monitoring scheme, together with two methods for approximating its p-values: a simulation method and a Markov chain method. Simulation experiments verify that the FDR-based CUSUM charts outperform existing control charts regardless of whether the sample dimension is large or small, whether the sample size is constant or time-varying, and whether the proportion of shifted variables is high or low. Applied to keyword word frequency data, the FDR-based CUSUM charts are more timely than existing control charts and more sensitive than the FDR-based Shewhart chart proposed earlier.

For monitoring hot topics on the Internet, the thesis thus proposes two control chart monitoring methods. Both are general: they apply not only to Weibo hot keyword data but also to data of the same type in production, medicine, meteorology, and other fields.
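The preprocessing step (cleaning, segmentation, stop-word removal, then a keyword frequency table) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's pipeline: crawling is omitted, each "period" is simply a list of post strings, whitespace splitting stands in for a proper Chinese word segmenter, and `STOP_WORDS`, `clean_and_tokenize`, and `keyword_frequency_table` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical stop-word list; a real pipeline would load a curated (e.g. Chinese) list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to"}


def clean_and_tokenize(post: str) -> List[str]:
    """Clean one post and split it into candidate keywords.

    Punctuation becomes spaces and text is lower-cased (data cleaning);
    whitespace splitting stands in for a real word segmenter (assumption);
    stop words are then removed.
    """
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in post.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def keyword_frequency_table(periods: List[Iterable[str]]) -> Dict[str, List[int]]:
    """Map each keyword to its frequency in every period (0 where it is absent)."""
    counters = [
        Counter(tok for post in posts for tok in clean_and_tokenize(post))
        for posts in periods
    ]
    vocabulary = sorted(set().union(*counters))
    return {word: [counter[word] for counter in counters] for word in vocabulary}


if __name__ == "__main__":
    periods = [["Flood warning issued!", "The flood is rising"],
               ["Flood relief and donations"]]
    for word, counts in keyword_frequency_table(periods).items():
        print(word, counts)
```

Each row of the resulting table is one keyword's frequency series over time, which is the kind of input a control chart monitors.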
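The abstract does not give the charting statistic, so the following is only a sketch of what one step of an FDR-based Shewhart chart could look like. It assumes each keyword's count in period t is Poisson with mean n_t * lambda_j, where n_t is a time-varying sample size (for example, the number of posts) and lambda_j is a known, positive in-control rate. Each keyword receives a one-sided p-value from the normal approximation, and the Benjamini-Hochberg procedure controls the FDR across all keywords; a non-empty result is an out-of-control signal. The names `fdr_shewhart_step`, `rates`, and `n_t` are hypothetical.

```python
import math
from typing import List, Sequence


def upper_p_value(z: float) -> float:
    """One-sided upper-tail p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k_max = 0
    for rank, j in enumerate(order, start=1):
        # Step-up: keep the largest rank whose p-value is under its threshold.
        if p_values[j] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


def fdr_shewhart_step(counts: Sequence[int], n_t: float,
                      rates: Sequence[float], q: float = 0.05) -> List[int]:
    """One monitoring step: return the keywords flagged as shifted upward.

    Assumes counts[j] ~ Poisson(n_t * rates[j]) when in control (rates > 0).
    """
    p_values = []
    for x, lam in zip(counts, rates):
        mean = n_t * lam                    # in-control Poisson mean
        z = (x - mean) / math.sqrt(mean)    # normal approximation to the Poisson
        p_values.append(upper_p_value(z))
    return benjamini_hochberg(p_values, q)
```

For example, with `rates = [0.01, 0.02, 0.005]` and `n_t = 1000`, the counts `[12, 21, 30]` flag only the third keyword, whose count is far above its expected value of 5. Controlling the FDR rather than testing each keyword at a fixed level keeps false alarms manageable as the number of keywords grows.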
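For small shifts, a CUSUM statistic accumulates evidence over several periods. The sketch below assumes one-sided upper CUSUMs C_{j,t} = max(0, C_{j,t-1} + z_{j,t} - k) on standardized statistics z_{j,t} that are approximately N(0,1) when in control, and it approximates each p-value by simulation: many in-control paths are generated, and the p-value of an observed C_{j,t} is the share of simulated values at the same time t that are at least as large. Benjamini-Hochberg then controls the FDR across keywords. The reference value k, the number of paths, and all function names are illustrative assumptions, and the weighting used in the thesis's chart is not modeled.

```python
import random
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k_max = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


def simulate_cusum_null(k: float, horizon: int, n_paths: int = 20000,
                        seed: int = 0) -> List[List[float]]:
    """Sorted in-control CUSUM values: table[t] holds C_{t+1} over all simulated paths."""
    rng = random.Random(seed)
    values = [0.0] * n_paths  # every path starts at C_0 = 0
    table = []
    for _ in range(horizon):
        values = [max(0.0, c + rng.gauss(0.0, 1.0) - k) for c in values]
        table.append(sorted(values))
    return table


def simulated_p_value(c_obs: float, null_sorted: Sequence[float]) -> float:
    """Monte Carlo p-value (1 + #{simulated >= c_obs}) / (1 + number of paths)."""
    n_ge = len(null_sorted) - bisect_left(null_sorted, c_obs)
    return (1 + n_ge) / (1 + len(null_sorted))


def fdr_cusum_monitor(z_stream: Sequence[Sequence[float]], k: float, q: float,
                      null_table: List[List[float]]) -> Optional[Tuple[int, List[int]]]:
    """Return (1-based signal time, flagged keyword indices), or None if no signal.

    z_stream[t][j] is the standardized statistic of keyword j at time t;
    null_table must cover at least len(z_stream) time points.
    """
    c = [0.0] * len(z_stream[0])
    for t, z_t in enumerate(z_stream):
        c = [max(0.0, cj + zj - k) for cj, zj in zip(c, z_t)]
        p_vals = [simulated_p_value(cj, null_table[t]) for cj in c]
        rejected = benjamini_hochberg(p_vals, q)
        if rejected:
            return t + 1, rejected
    return None
```

Because every standardized statistic has the same assumed in-control distribution, one simulated table serves all keywords, even when sample sizes vary over time. For instance, with `k = 0.5`, a stream of five all-zero periods followed by a jump of 8 in the second keyword signals at time 6 and flags only that keyword.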
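The Markov chain alternative for CUSUM p-values can be sketched with a Brook-Evans-style discretization, again assuming N(0,1) in-control increments and C_t = max(0, C_{t-1} + Z - k). The range [0, upper] is split into states (state 0 is the atom at zero, state j represents values near j * w), a transition matrix is built from the normal CDF, the state distribution is propagated t steps from C_0 = 0, and the mass at or above the observed value approximates P(C_t >= c). Truncating at `upper` and representing each bin by its midpoint are approximations; `upper`, `n_states`, and the function names are assumptions, not taken from the thesis.

```python
import math
from typing import List


def norm_cdf(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def cusum_transition_matrix(k: float, upper: float, n_states: int) -> List[List[float]]:
    """Discretized transition matrix of C_t = max(0, C_{t-1} + Z - k), Z ~ N(0,1)."""
    w = upper / (n_states - 1)
    matrix = []
    for i in range(n_states):
        v = i * w  # current CUSUM value represented by state i
        # cdf[j] = P(next value < upper edge of bin j) for j = 0 .. n_states - 2
        cdf = [norm_cdf((j + 0.5) * w - v + k) for j in range(n_states - 1)]
        row = [cdf[0]]  # bin 0 also collects the reflection at zero
        row += [cdf[j] - cdf[j - 1] for j in range(1, n_states - 1)]
        row.append(1.0 - cdf[-1])  # last bin absorbs all mass above its lower edge
        matrix.append(row)
    return matrix


def markov_chain_p_value(c_obs: float, t: int, k: float,
                         upper: float = 8.0, n_states: int = 401) -> float:
    """Approximate P(C_t >= c_obs) for an in-control CUSUM started at C_0 = 0."""
    matrix = cusum_transition_matrix(k, upper, n_states)
    w = upper / (n_states - 1)
    dist = [1.0] + [0.0] * (n_states - 1)
    for _ in range(t):
        new = [0.0] * n_states
        for i, mass in enumerate(dist):
            if mass == 0.0:
                continue
            row = matrix[i]
            for j in range(n_states):
                new[j] += mass * row[j]
        dist = new
    # Tail mass of the states whose representative value is at least c_obs.
    return sum(m for j, m in enumerate(dist) if j * w >= c_obs)
```

As a sanity check, at t = 1 the exact value is P(Z >= c + k); for c = 1 and k = 0.5 that is about 0.067, and the discretized chain gives a value close to it. A finer grid (more states) improves accuracy at the cost of computation, and in practice the matrix would be built once and reused across keywords and time points.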
Keywords/Search Tags: Hot Topics, Keyword Monitoring, Statistical Process Control, Shewhart Control Chart, Weighted Cumulative Sum Control Chart