| As the novel coronavirus pneumonia epidemic continues to develop, more and more information is becoming available for statistical analysis. The trajectory information of confirmed cases makes up most of the text data available for study and is a natural target for text mining. Taking Harbin as an example, this paper performs a cluster analysis of the trajectories of confirmed cases and asymptomatic infected persons in the Harbin area as of June 2020. Text clustering can help the region identify existing and suspected cases whose trajectories overlap with those of newly diagnosed cases, and the author aims to provide a scientific method for tracing the virus and quickly identifying suspected cases. The text clustering method adopted is based on the vector space model (VSM) and the k-means algorithm. Because the feature vector space obtained after segmenting the trajectory text has too many dimensions, which makes the algorithm complexity too high, a variance-based feature selection method is used to reduce the dimension of the feature vectors, thereby reducing the algorithm complexity and improving the clustering quality. In addition, the effects of Euclidean distance and cosine distance on the clustering of trajectory text are compared. The results show that nearly 70% of the clustering results are interpretable, which suggests that k-means clustering on a vector space model has practical and reference value for clustering case trajectory text. The paper also applies center-of-gravity trajectory analysis to the spatial trajectory of the epidemic in Harbin. The results are of some value for tracking how the virus spreads and for giving early warning so that protective measures can be taken in time in the areas to which it spreads. |
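The pipeline summarized above (VSM, variance-based feature selection, then k-means under two distance measures) can be illustrated with a minimal sketch. It assumes the trajectory text has already been segmented into tokens. The place names, the function names, the use of raw term frequencies as weights, and the fixed initial centers are all illustrative assumptions, not the paper's data or implementation.

```python
from collections import Counter
import math

def build_vsm(docs):
    """Turn tokenized documents into term-frequency vectors (a simple VSM)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([float(counts[t]) for t in vocab])
    return vocab, vectors

def select_by_variance(vocab, vectors, keep):
    """Keep the `keep` features with the largest variance across documents."""
    n = len(vectors)

    def variance(j):
        col = [v[j] for v in vectors]
        mean = sum(col) / n
        return sum((x - mean) ** 2 for x in col) / n

    # Sort by descending variance; break ties alphabetically for determinism.
    ranked = sorted(range(len(vocab)), key=lambda j: (-variance(j), vocab[j]))
    chosen = sorted(ranked[:keep])
    return [vocab[j] for j in chosen], [[v[j] for j in chosen] for v in vectors]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (na * nb)

def kmeans(vectors, init_idx, dist, max_iter=20):
    """Plain k-means with caller-supplied initial centers and distance."""
    centers = [list(vectors[i]) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical, already-segmented trajectories (not real case data).
docs = [
    ["market", "bus_7", "hospital"],
    ["market", "bus_7", "pharmacy"],
    ["market", "hospital", "bus_7"],
    ["school", "metro_1", "canteen"],
    ["school", "metro_1", "library"],
    ["metro_1", "school", "canteen"],
]

vocab, vectors = build_vsm(docs)
selected, reduced = select_by_variance(vocab, vectors, keep=4)
labels_euclid = kmeans(reduced, init_idx=[0, 3], dist=euclidean)
labels_cosine = kmeans(reduced, init_idx=[0, 3], dist=cosine_distance)

print(selected)
print(labels_euclid)
print(labels_cosine)
```

On this toy input both distance measures give the same partition. On real trajectory text, where documents differ greatly in length, the two can diverge because cosine distance ignores vector magnitude while Euclidean distance does not.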