Application Of Multiple Linear Regression And Rough Set Clustering In Epidemic Data Analysis

Posted on: 2021-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Liu
GTID: 2370330632451436
Subject: Software engineering
Abstract/Summary:
In 2020 the novel coronavirus emerged and spread widely. From January to March 2020, Hubei Province was the most severely affected area in China. The epidemic has had a major impact on many industries in China, especially education. Data mining is a discipline that encompasses a variety of algorithms for tasks such as clustering and forecasting. The data mining algorithms used in this paper are multiple linear regression, rough set attribute reduction, principal component analysis (PCA), and K-means clustering.

The data come from two sources: statistics published on the official website of the Hubei Provincial Health Commission, and sample data from a questionnaire distributed by an educational institution. The surveyed students are evenly distributed across levels, and the factors studied are also fairly evenly distributed, so the data are suitable for this study.

The paper applies three algorithms: (1) multiple linear regression, to analyze the epidemic data of Hubei Province; (2) rough set attribute reduction, to analyze the factors that influenced students' learning during the epidemic; and (3) a comprehensive clustering algorithm that combines PCA, rough sets, and K-means. The third algorithm is proposed in this paper. It draws on the strength of principal component reduction and on the ability of rough sets to handle uncertainty, and uses them to cluster the data. Its advantage over the traditional K-means algorithm is verified by comparison.

The paper builds three models. The first is a multiple linear regression model. Using the Hubei epidemic data published on the official website from January 20 to May 31, 2020, it studies the linear relationship between cumulative confirmed cases and the other indicators, in particular the relationship between cumulative confirmed cases and cumulative cured cases, and analyzes the reasons behind it.

The second is a rough set attribute reduction model. Questionnaires were distributed to students during the epidemic to form the sample data, and the model was used to analyze the factors that influenced their learning. Factor analysis is also applied to the same data as a comparison, to further verify the correctness of the rough set attribute reduction results.

The third is a rough set clustering synthesis model based on principal component analysis. Questionnaire data collected from students during the epidemic were used to classify the students. By combining PCA, rough sets, and K-means, the paper proposes a PCA-based rough set clustering synthesis algorithm and applies it to the samples. Comparison with traditional K-means clustering verifies the advantages of the algorithm, and corresponding suggestions are made for different types of students.
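A minimal sketch of the first model's fitting step, assuming it is an ordinary least-squares fit of cumulative confirmed cases (the response) on other indicators such as cumulative cured cases (the predictors). The abstract does not give the estimation procedure or the exact variable list, so the function names (`fit_ols`, `solve`, `r_squared`) and the toy numbers are hypothetical illustrations; they are not the Hubei data and not the author's code.

```python
"""Multiple linear regression by ordinary least squares (normal equations), pure Python.

Hypothetical sketch: names and demo data are illustrative, not from the thesis.
"""


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return [b0, b1, ..., bp] minimising sum((y - b0 - b1*x1 - ... - bp*xp) ** 2)."""
    x = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(x[0])
    xtx = [[sum(xi[j] * xi[k] for xi in x) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(x, y)) for j in range(p)]
    return solve(xtx, xty)  # normal equations: (X^T X) beta = X^T y


def r_squared(rows, y, beta):
    """Coefficient of determination R^2 of a fitted model."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic predictor rows (hypothetical x1, x2), NOT the Hubei figures.
    rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
    y = [2 + 3 * a - b for a, b in rows]  # generated exactly from y = 2 + 3*x1 - x2
    beta = fit_ols(rows, y)
    print(beta, r_squared(rows, y, beta))
```

Because cumulative series such as confirmed and cured cases both rise over time, a high R² largely reflects their shared trend; interpreting the coefficients still requires the separate discussion of causes that the thesis describes.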
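The second model relies on rough set attribute reduction. The abstract does not say which reduction algorithm is used, so the sketch below substitutes the common greedy QuickReduct heuristic built on the dependency degree γ (the share of objects whose decision is fully determined by the chosen attributes). The attribute names and the six-row decision table are hypothetical, not questionnaire results.

```python
"""Rough set attribute reduction: dependency degree plus the greedy QuickReduct heuristic.

QuickReduct is a stand-in; the thesis's exact reduction algorithm is not stated.
"""
from collections import defaultdict


def partition(table, attrs):
    """Indiscernibility classes: objects with identical values on `attrs` share a block."""
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(table, attrs, decision):
    """gamma(attrs -> decision) = |POS_attrs(decision)| / |U|."""
    positive = 0
    for block in partition(table, attrs):
        # A block is in the positive region when all its objects share one decision value.
        if len({table[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(table)


def quick_reduct(table, conditions, decision):
    """Greedily add the attribute that raises gamma most until it matches gamma(all conditions)."""
    target = dependency(table, conditions, decision)
    reduct = []
    while dependency(table, reduct, decision) < target:
        best = max((a for a in conditions if a not in reduct),
                   key=lambda a: dependency(table, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Hypothetical questionnaire decision table (attribute names and values are made up).
SURVEY = [
    {"device": "good", "study_hours": "high", "family_support": "yes", "performance": "up"},
    {"device": "good", "study_hours": "high", "family_support": "no", "performance": "up"},
    {"device": "good", "study_hours": "low", "family_support": "yes", "performance": "down"},
    {"device": "poor", "study_hours": "high", "family_support": "yes", "performance": "down"},
    {"device": "poor", "study_hours": "low", "family_support": "no", "performance": "down"},
    {"device": "good", "study_hours": "low", "family_support": "no", "performance": "down"},
]

if __name__ == "__main__":
    conditions = ["device", "study_hours", "family_support"]
    print(quick_reduct(SURVEY, conditions, "performance"))  # -> ['study_hours', 'device']
```

QuickReduct is fast but greedy: the subset it returns preserves γ, yet it is not guaranteed to be minimal, so an exhaustive or discernibility-matrix search would be needed for that guarantee.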
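The third model combines PCA, rough sets, and K-means. The abstract does not specify how they are combined, so the sketch below assumes one well-known pairing: PCA to reduce the questionnaire features, followed by Lingras and West's rough K-means, in which clearly assigned students fall into a cluster's lower approximation and ambiguous ones fall into the boundary of several clusters. The parameter names and defaults (`eps`, `w_lower`), the farthest-first seeding, and the toy points are assumptions for illustration, not the thesis's algorithm.

```python
"""PCA (power iteration) followed by Lingras-West rough K-means, pure Python.

Hypothetical sketch: one possible PCA + rough set + K-means combination.
"""
import math
import random


def pca(data, k, iters=500, seed=0):
    """Project mean-centred rows onto their top-k principal components."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    mean = [sum(r[j] for r in data) / n for j in range(d)]
    x = [[r[j] - mean[j] for j in range(d)] for r in data]
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(d)] for a in range(d)]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start vector
        for _ in range(iters):  # power iteration converges to the dominant eigenvector
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            v = [t / norm for t in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        comps.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in x]


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]


def rough_kmeans(points, k, eps=1.3, w_lower=0.7, iters=100):
    """Return (centroids, lower, upper); lower/upper are sets of point indices per cluster."""
    # Deterministic farthest-first seeding instead of random initial centroids.
    cents = [list(points[0])]
    while len(cents) < k:
        cents.append(list(max(points, key=lambda p: min(dist(p, c) for c in cents))))
    for _ in range(iters):
        lower = [set() for _ in range(k)]
        upper = [set() for _ in range(k)]
        for i, p in enumerate(points):
            d = [dist(p, c) for c in cents]
            near = min(range(k), key=lambda j: d[j])
            # Other clusters almost as close (d_j <= eps * d_near) also claim the point.
            rivals = [j for j in range(k) if j != near and d[j] <= eps * d[near]]
            for j in [near] + rivals:
                upper[j].add(i)
            if not rivals:
                lower[near].add(i)  # unambiguous: goes into the lower approximation
        new = []
        for j in range(k):
            low = [points[i] for i in lower[j]]
            bnd = [points[i] for i in upper[j] - lower[j]]  # boundary region
            if low and bnd:
                new.append([w_lower * a + (1 - w_lower) * b
                            for a, b in zip(centroid(low), centroid(bnd))])
            elif low or bnd:
                new.append(centroid(low or bnd))
            else:
                new.append(cents[j])
        if new == cents:
            break
        cents = new
    return cents, lower, upper


if __name__ == "__main__":
    # Two synthetic 3-D blobs plus one point midway between them (hypothetical data).
    a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
    b = [(10, 10, 10), (11, 10, 10), (10, 11, 10), (11, 11, 10)]
    reduced = pca(a + b + [(5.5, 5.5, 5)], k=2)
    cents, lower, upper = rough_kmeans(reduced, k=2)
    print("lower:", lower, "boundary:", [sorted(u - l) for u, l in zip(upper, lower)])
```

In the demo, the midpoint between the two blobs lands in both upper approximations and in no lower approximation. This is the behavior that separates rough K-means from plain K-means, which forces every point into exactly one cluster.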
Keywords/Search Tags: Cluster analysis, factor analysis, multiple linear regression, rough set