Crowd portraits are information labels for groups of people, formed by analyzing their daily behavior patterns and other attributes; they can provide data support for optimizing public transportation scheduling and for selecting commercial locations. However, current research on crowd portraits still faces two problems. First, crowd portraits built from massive public transportation data reveal only the places that passengers frequent, and cannot explain the characteristics of their travel trajectories. Second, public transportation data are extremely large and contain much information about sparsely populated areas, so direct computation is time-consuming; key areas with dense crowds should therefore be the focus of research. To solve these problems, this article draws on about 30 million Singapore bus records and 40,000 Point of Interest (POI) records covering one week, and carries out the following work: 1) Identifying urban functional areas. Based on Singapore's POI data, this paper reclassifies the data into 15 functional categories and performs kernel density analysis to understand the distribution of each function. It then divides Singapore into 1 km × 1 km cells and identifies the function of each cell according to the time period of the day and the distribution of POIs within the cell. 2) Textualizing trajectories. This paper first screens the crowds in key areas with the PageRank algorithm, extracting the trajectory data of passengers who travel frequently and frequently visit key areas. The filtered trajectory data are divided by age group and by time (consecutive workdays and weekends), and the trips of each passenger are concatenated into a complete trajectory data set. By fusing the divided trajectory data with the urban functional areas, the trajectory of each passenger is then expressed in textual form. Finally, the TF-IDF (Term Frequency–Inverse Document Frequency) algorithm is applied to each passenger's textual trajectory to obtain the functions of the areas that the passenger visits frequently, that is, the characteristics of the travel trajectory. 3) Portraying crowds. This paper applies multiple clustering algorithms to classify the functional data of the areas frequented by each passenger. By comparison with real data, the clustering algorithm with the best performance is selected as the method for portraying crowds. A Flow Map is used to visualize and describe the trajectories of the adult and elderly crowd portraits, and the results are consistent with real-world crowd behavior. The results show that: screening people in key areas with the PageRank algorithm selects a total of 3 million passenger trajectory records, which greatly reduces the data volume and improves processing efficiency; combining trajectory data with urban functional areas into textual trajectories yields easy-to-interpret passenger travel characteristics; and the K-means algorithm based on cosine distance portrays crowds better, with an accuracy close to 80%.
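The abstract does not state which kernel or bandwidth the kernel density analysis uses. As a minimal sketch, assuming POI coordinates of one category already projected to planar kilometres and a Gaussian kernel with a hypothetical bandwidth `h`, the density at a location is $\hat f(x,y)=\frac{1}{2\pi n h^2}\sum_{i=1}^{n}\exp\!\left(-\frac{(x-x_i)^2+(y-y_i)^2}{2h^2}\right)$:

```python
import math

def kernel_density(x, y, points, h):
    """Gaussian kernel density estimate at (x, y).

    points: list of (px, py) planar coordinates in km for one POI category
            (an assumed projection, not taken from the paper).
    h:      bandwidth in km (a hypothetical value).
    """
    if not points:
        return 0.0
    # Normalising constant of a 2-D Gaussian kernel averaged over n points.
    norm = 1.0 / (len(points) * 2.0 * math.pi * h * h)
    total = sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h * h))
        for px, py in points
    )
    return norm * total
```

Evaluating this function over a regular grid of locations, once per category, gives a density surface whose peaks mark where each function concentrates.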
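The abstract says each 1 km × 1 km cell is labelled from the time period and the POI distribution, but not how the two are combined. One plausible sketch, assuming an equirectangular projection around a hypothetical grid origin and hypothetical per-period category weights, labels each cell with its highest-weighted POI category:

```python
import math
from collections import Counter, defaultdict

# Hypothetical south-west reference corner for the grid (not from the paper).
ORIGIN_LAT, ORIGIN_LON = 1.15, 103.60
KM_PER_DEG_LAT = 110.574  # approximate length of one degree of latitude
KM_PER_DEG_LON = 111.320 * math.cos(math.radians(ORIGIN_LAT))  # equirectangular approx.

def cell_of(lat, lon, cell_km=1.0):
    """Map a coordinate to the (row, col) index of a cell_km x cell_km grid."""
    row = math.floor((lat - ORIGIN_LAT) * KM_PER_DEG_LAT / cell_km)
    col = math.floor((lon - ORIGIN_LON) * KM_PER_DEG_LON / cell_km)
    return row, col

def label_cells(pois, period_weights):
    """Label each cell with its highest-weighted POI category.

    pois:           iterable of (lat, lon, category).
    period_weights: hypothetical {category: weight} for one time period;
                    categories missing from the dict get weight 1.0.
    """
    scores = defaultdict(Counter)
    for lat, lon, category in pois:
        scores[cell_of(lat, lon)][category] += period_weights.get(category, 1.0)
    return {cell: counter.most_common(1)[0][0] for cell, counter in scores.items()}
```

Calling `label_cells` once per time period, each with its own weight table, yields a time-dependent function label for every cell.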
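The abstract does not describe how the PageRank graph is built. A minimal sketch, assuming nodes are areas, directed edges are observed origin-to-destination trips weighted by count, a damping factor of 0.85, and hypothetical thresholds `top_k` (number of key areas) and `min_visits` (key-area visits needed to keep a passenger):

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Weighted PageRank by power iteration; edges: iterable of (src, dst, weight)."""
    out = defaultdict(dict)
    nodes = set()
    for s, d, w in edges:
        out[s][d] = out[s].get(d, 0.0) + w
        nodes.update((s, d))
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if v not in out)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for s, targets in out.items():
            total = sum(targets.values())
            for d, w in targets.items():
                new[d] += damping * rank[s] * w / total
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

def screen_passengers(trips, top_k, min_visits):
    """Keep passengers whose trips touch the top_k PageRank areas often enough.

    trips: list of (passenger_id, origin_area, dest_area).
    """
    rank = pagerank((o, d, 1.0) for _, o, d in trips)
    key_areas = set(sorted(rank, key=rank.get, reverse=True)[:top_k])
    visits = defaultdict(int)
    for pid, o, d in trips:
        visits[pid] += (o in key_areas) + (d in key_areas)
    return {pid for pid, v in visits.items() if v >= min_visits}
```

Ranking areas by incoming trip flow concentrates the later steps on busy areas, which is the data reduction the abstract reports.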
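Trajectory textualization and TF-IDF can be illustrated as follows. The sketch assumes each trajectory is a list of (time period, cell) pairs, turns it into tokens such as `morning:residential` (a hypothetical token format), and weights each token by term frequency times a smoothed IDF, $\mathrm{idf}(t)=\ln\frac{1+N}{1+\mathrm{df}(t)}+1$, the form scikit-learn also uses by default; the paper's exact TF-IDF variant is not stated.

```python
import math
from collections import Counter

def textualize(trajectory, cell_labels):
    """Turn (time_period, cell) pairs into 'period:function' tokens.

    Cells without a functional label are skipped.
    """
    return [f"{period}:{cell_labels[cell]}" for period, cell in trajectory if cell in cell_labels]

def tf_idf(docs):
    """Return one {token: weight} dict per document (list of tokens)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return weights
```

Tokens that a passenger repeats but that are rare across passengers receive the highest weights, which is what makes them usable as travel characteristics.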
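K-means with cosine distance can be realized as spherical k-means: vectors are L2-normalized, each is assigned to the centroid with the highest cosine similarity, and each centroid is the renormalized mean of its members. The sketch below uses a deterministic farthest-first initialization, which is an illustrative choice and not necessarily the paper's:

```python
import math

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def _cos(a, b):
    # Cosine similarity of two vectors that are already unit length.
    return sum(x * y for x, y in zip(a, b))

def cosine_kmeans(vectors, k, iters=50):
    """Spherical k-means; returns (labels, unit-length centroids)."""
    X = [_normalize(v) for v in vectors]
    # Farthest-first init: start from the first vector, then repeatedly add
    # the vector least similar to all chosen centroids (an assumption).
    centroids = [X[0]]
    while len(centroids) < k:
        centroids.append(min(X, key=lambda x: max(_cos(x, c) for c in centroids)))
    labels = [None] * len(X)
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: _cos(x, centroids[j])) for x in X]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = _normalize(mean)
    return labels, centroids
```

Because only the direction of each TF-IDF vector matters, passengers with the same mix of visited functions cluster together regardless of how many trips they made.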