In epidemiology, cases tend to aggregate according to shared attributes, especially time, location, age and gender. Studying these similarities can reveal characteristics of an infectious disease and help identify, objectively, when an outbreak occurs and what causes it. Cluster analysis is an important data-mining tool for classifying cases: it partitions unlabeled cases according to given rules so that cases in the same cluster are similar and cases in different clusters are dissimilar. Research on the early detection of disease outbreaks currently relies mainly on statistical methods and rarely considers other approaches. According to data type, application and spatial scale, existing early-detection methods can be divided into three categories. The scan statistic is the most widely used statistical spatiotemporal method; it evaluates a large number of scanning windows to locate outbreak areas efficiently. However, it operates on grouped data and requires the precise locations of administrative regions. The shape of the detected area depends largely on the shape of administrative boundaries, and the shape of the scanning window is fixed in advance, so the method has difficulty finding arbitrarily shaped clusters, clusters that cross administrative boundaries and clusters within a small area. This paper discusses the advantages and disadvantages of each algorithm. For individual point data, we introduce a clustering method based on density entropy and improve its distance calculation to locate outbreak areas. We also propose a method based on the weighted density centroid to visualize disease outbreaks. To demonstrate the effectiveness of our method, we conducted experiments on two infectious disease datasets that differ in disease type and region. The main contributions of this paper are as follows.

(1) Current methods for detecting disease outbreaks mainly use grouped data, and few use individual point data. Because event locations have uneven density and events in different administrative regions have different background populations, we introduce a clustering method based on density entropy that uses a weighted Euclidean distance to evaluate the similarity between individual points. First, event addresses are converted into relative coordinates. Next, our method identifies the clustering areas, and its spatial and spatiotemporal results are compared with those of the traditional scan-statistic method. We also propose an individual density value to evaluate whether outbreaks occur within a small area.

(2) The results of current methods are tied to administrative boundaries and presented graphically, which invites subjective interpretation and makes the spatiotemporal evolution pattern difficult to show. Therefore, building on the central model representation method, we propose a method for describing spatiotemporal evolution patterns based on the centroid transferring curve of the weighted background population. Calculating the centroid for each time period yields a time-ordered centroid transferring curve, and the centroid shifts and transfer directions along this curve describe the spatiotemporal evolution pattern of disease areas intuitively.

In summary, the proposed method combines population weighting with the clustering algorithm: the weighting addresses differences in background population during an outbreak, and the density-entropy clustering algorithm addresses the uneven distribution of outbreak events. Experimental results show that the proposed method can detect arbitrarily shaped clusters without administrative-boundary information, including clusters that cross administrative boundaries or lie within a small area. Moreover, the centroid transferring curve provides an intuitive visualization of the spatiotemporal evolution of disease outbreaks. This curve was compared with the high-risk area centroid transferring curve obtained in previous experiments based on the weighted background population. The two curves are similar, which indicates that population-weighted high-risk event areas can be regarded as infectious disease outbreak areas.
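For comparison, the scan statistic discussed above can be sketched as a circular spatial scan using Kulldorff's Poisson log-likelihood ratio. This is a minimal illustration, not the configuration used in the experiments. Several details are assumptions: each region is a hypothetical `(x, y, cases, population)` tuple placed at its centroid, expected counts are proportional to population, and windows are capped at an assumed 50% of the total population. Monte Carlo significance testing is omitted. The sketch makes the limitations concrete, because the input is grouped by region and every candidate window is a circle.

```python
import math

def poisson_llr(c, e, total_cases):
    # Kulldorff's Poisson log-likelihood ratio for a window with c observed
    # and e expected cases; windows that are not elevated (c <= e) score 0.
    if e <= 0 or c <= e:
        return 0.0
    inside = c * math.log(c / e)
    rest = total_cases - c
    outside = rest * math.log(rest / (total_cases - e)) if rest > 0 else 0.0
    return inside + outside

def best_circular_window(regions, max_pop_fraction=0.5):
    # regions: hypothetical (x, y, cases, population) tuples, one per
    # administrative region, located at the region's centroid.
    total_cases = sum(r[2] for r in regions)
    total_pop = sum(r[3] for r in regions)
    best_llr, best_members = 0.0, []
    for cx, cy, _, _ in regions:
        # Grow a circle around this centroid by adding regions in order of distance.
        order = sorted(range(len(regions)),
                       key=lambda j: math.hypot(regions[j][0] - cx, regions[j][1] - cy))
        cases = pop = 0
        members = []
        for j in order:
            # Stop once the window would exceed the population cap.
            if pop + regions[j][3] > max_pop_fraction * total_pop:
                break
            cases += regions[j][2]
            pop += regions[j][3]
            members.append(j)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best_llr:
                best_llr, best_members = llr, sorted(members)
    return best_llr, best_members
```

Because windows are built only from whole regions and only as circles, a cluster that follows a road or straddles two regions can only be approximated, which is the motivation the text gives for working with individual points instead.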
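The density-entropy clustering step can be sketched as follows, under explicit assumptions, since the abstract does not give the formulas. We assume the weighted Euclidean distance has the form d(p, q) = sqrt(sum_k w_k (p_k - q_k)^2) with hypothetical per-dimension weights (for example for x, y and time), that each point's density is a Gaussian-kernel sum over all points including itself, and that density entropy is used, as in data-field-based density clustering, to pick the kernel width `sigma` that minimizes the entropy of the normalized densities. The function names and candidate widths are illustrative, and how the paper derives its weights from the background population is not modeled here.

```python
import math

def weighted_euclidean(p, q, weights):
    # Weighted Euclidean distance: sqrt(sum_k w_k * (p_k - q_k)^2).
    # The per-dimension weights are hypothetical (e.g. for x, y and time).
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))

def kernel_densities(points, weights, sigma):
    # Gaussian-kernel density of every point; the sum includes the point
    # itself, so each density is at least 1 and never zero.
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            d = weighted_euclidean(p, q, weights)
            total += math.exp(-((d / sigma) ** 2))
        densities.append(total)
    return densities

def density_entropy(densities):
    # Shannon entropy of the densities after normalizing them to sum to 1.
    z = sum(densities)
    return -sum((rho / z) * math.log(rho / z) for rho in densities)

def select_sigma(points, weights, candidates):
    # Assumed selection rule: keep the kernel width whose normalized densities
    # have minimal entropy. Very small and very large widths both give nearly
    # uniform densities (near-maximal entropy), so the minimum favors a width
    # that separates dense groups of events from sparse ones.
    return min(candidates,
               key=lambda s: density_entropy(kernel_densities(points, weights, s)))
```

For example, with two tight groups of three points each far apart, `select_sigma(points, (1.0, 1.0), [0.01, 0.5, 100.0])` prefers the intermediate width, because the extreme widths make every point look equally dense.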
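The weighted centroid and its centroid transferring curve can be sketched as below. This is a minimal sketch that assumes events are given in relative planar coordinates as hypothetical `(period, x, y, w)` tuples, where `w` is a weight such as one derived from the background population; the exact weighting used in the paper is not specified here. For each period the centroid is the weighted mean of event coordinates, and consecutive centroids give the shift distance and a transfer direction, measured here in degrees counterclockwise from the positive x-axis.

```python
import math
from collections import defaultdict

def weighted_centroid(points):
    # points: (x, y, w) tuples; returns the weighted mean position.
    total = sum(w for _, _, w in points)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    cx = sum(x * w for x, _, w in points) / total
    cy = sum(y * w for _, y, w in points) / total
    return cx, cy

def centroid_curve(events):
    # events: hypothetical (period, x, y, w) tuples.
    # Returns [(period, (cx, cy)), ...] ordered by period.
    by_period = defaultdict(list)
    for period, x, y, w in events:
        by_period[period].append((x, y, w))
    return [(p, weighted_centroid(by_period[p])) for p in sorted(by_period)]

def centroid_shifts(curve):
    # Shift distance and transfer direction between consecutive centroids.
    shifts = []
    for (p0, (x0, y0)), (p1, (x1, y1)) in zip(curve, curve[1:]):
        dx, dy = x1 - x0, y1 - y0
        distance = math.hypot(dx, dy)
        direction = math.degrees(math.atan2(dy, dx)) % 360
        shifts.append((p0, p1, distance, direction))
    return shifts
```

Plotting the centroids in period order gives the transferring curve itself; the list returned by `centroid_shifts` summarizes how far and in which direction the weighted center of the outbreak moves between periods.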