COVID-19 Public Opinion Analysis Based On Crawler Technology And Text Clustering

Posted on: 2022-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Fang
Full Text: PDF
GTID: 2494306737953309
Subject: Applied Statistics
Abstract/Summary:
In past decades, public opinion had little bearing on people's daily lives. In recent years, however, there have been several major infectious disease emergencies, such as the H7N9 avian influenza epidemic in 2017, the H1N1 influenza epidemic in 2019, the Ebola outbreak, and the COVID-19 epidemic in 2020. Weibo posts related to these events became trending searches, and news reports on major online platforms drew the attention of large numbers of netizens and prompted heated discussion. Following public opinion on Weibo has gradually become part of everyday life. Discussion of the COVID-19 epidemic was especially intense, peaking at tens of millions of Weibo posts per day, and as the number of confirmed cases kept rising, related news reports and topics on the Internet also surged. In the data age, the mechanism by which online public opinion spreads is complex and the volume of information is large. Analyzing and accurately predicting public opinion helps the government refine its governance. Using statistics and data mining algorithms to identify hot topics accurately and effectively, and to analyze the dynamic trends in netizens' emotions, is therefore of great significance to the government's prevention and control work.

This thesis's public opinion analysis of the COVID-19 epidemic covers data acquisition, data processing, text sentiment analysis and cluster analysis, and analysis by time and geographic location. First, a crawler project was built on the Scrapy crawler framework, using "2019-nCoV", "COVID-19", "pneumonia" and "epidemic" as keywords, and Python was used to collect a large volume of epidemic-related Weibo text from 34 provinces, municipalities and autonomous regions in China, posted between January 1, 2020 and March 31, 2020. The data were then preprocessed according to the textual characteristics of Weibo posts: epidemic-related custom dictionaries and stop-word dictionaries were added for precise word segmentation, which provides effective support for the subsequent cluster analysis. Next, the TF-IDF algorithm was used to extract feature words, a vector space model was built and its dimensionality reduced with the LSA method, the optimal number of clusters was found with the elbow method, and the text data were clustered with the K-Means algorithm. Word cloud analysis and the SnowNLP sentiment analysis model were then used for visualization: a chart of the dynamic change in sentiment and a trend chart of newly confirmed cases were drawn, and a standard map service system was used to draw maps of the regional distribution of sentiment and of newly confirmed cases across the country. Finally, the experimental results were compared with official news reports.

The experimental results show that the hot topics discussed by netizens include the epidemic, prevention and control, Wuhan, hospitals, and the resumption of work and production. The emotional dynamics of netizens during the epidemic can be roughly divided into four stages: a first panic period from January 1 to January 19, a second panic period from January 19 to February 15, a slow recovery period from February 15 to March 10, and an optimistic and stable period from March 10 to March 31. Over this time the lowest sentiment value was only 0.3 and the highest was above 0.8. The sentiment value is broadly negatively correlated with the severity of the epidemic. Finally, the sentiment value distribution maps for the three time periods and the distribution maps of the epidemic situation in each region verify that this conclusion is broadly consistent with the actual situation.

Therefore, when a major infectious disease breaks out suddenly, relevant Weibo data generated in real time can be collected, text clustering and sentiment analysis can be performed on Weibo posts from different geographic locations, and the information they contain can be mined and combined with sentiment distribution maps to locate and predict the development of the disease in a specific affected area.
Keywords/Search Tags: COVID-19, Scrapy Crawler Framework, K-Means Text Clustering, Sentiment Analysis
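
The clustering step described in the abstract (TF-IDF features, K-Means, and the elbow method for choosing the number of clusters) can be illustrated with a minimal sketch. This is not the thesis's code. It uses only the Python standard library, it omits the LSA dimensionality reduction, and it assumes the posts are already segmented into tokens. The toy token lists, the smoothed IDF formula, and the farthest-point initialisation are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: number of docs in which each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant; the thesis does not state its formula).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation.

    Returns (labels, inertia), where inertia is the within-cluster sum of
    squared distances used by the elbow method.
    """
    centroids = [vectors[0][:]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(far[:])
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, inertia


# Hypothetical pre-segmented posts (toy data, not from the thesis corpus).
docs = [
    ["wuhan", "hospital", "patients"],
    ["hospital", "patients", "beds"],
    ["wuhan", "hospital", "beds"],
    ["resume", "work", "production"],
    ["work", "production", "factory"],
    ["resume", "production", "factory"],
]
vocab, vectors = tfidf_vectors(docs)

# Elbow method: compute inertia for each k and look for the bend in the curve.
for k in range(1, 5):
    labels, inertia = kmeans(vectors, k)
    print(k, round(inertia, 3), labels)
```

In practice the tokens would come from a Chinese word segmenter with the custom and stop-word dictionaries mentioned in the abstract, and a reduced LSA representation would be clustered in place of the raw TF-IDF vectors.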