Research And Implementation Of Scenic Area Information Mining System Based On Feature Weighting And Density Clustering

Posted on: 2020-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: N Yuan
GTID: 2428330620462240
Subject: Electronic Science and Technology
Abstract/Summary:
With the development of Internet technology and the popularization of mobile devices, Internet-based personalized information services for scenic spots, hotels, and similar venues are multiplying, and the data they generate is becoming increasingly important. Faced with this huge and steadily growing body of data, how to discover deeper links and rules rather than superficial relations has become a hot topic for scholars at home and abroad. Spatial density clustering and feature weighting are important data mining methods that are widely used in data analysis and processing. Based on an analysis of point-of-interest information and text data for scenic spots, this thesis uses the spatial density clustering algorithm DBSCAN and the feature weighting algorithm TFIDF to process the datasets, and designs and implements a scenic spot information mining system with hot spot area discovery and feature keyword extraction functions. The main research work of this thesis is as follows:

(1) Data preprocessing and data storage are carried out for the point-of-interest and text data sources of scenic spots. Abnormal records in the point-of-interest data are removed and missing values are filled in; the text data is segmented, and empty texts and meaningless stop words are removed. Finally, the data is stored in the database according to its attribute categories.

(2) An improved spatial density clustering algorithm, KM-DBSCAN, is proposed. The final clustering result of the traditional DBSCAN algorithm depends heavily on the choice of its two input parameters, and because these parameters are global, the algorithm also clusters poorly on datasets with non-uniform density. To address these drawbacks, KM-DBSCAN uses adaptive parameters: it partitions the data with k-means, introduces the mean shift vector to obtain the Eps and MinPts values of each partition, then clusters each partition locally and merges the clustering results. The improved algorithm reduces the dependence on the parameters Eps and MinPts and performs well on datasets with non-uniform density. Its rationality and validity are verified by comparison experiments.

(3) An improved feature weighting algorithm, FDCD-TFIDF, is proposed. The traditional feature weighting algorithm TFIDF does not consider the imbalance in how documents are distributed across categories, and it cannot correctly reflect the distribution of text vectors between and within classes in the classification system. As a result, TFIDF performs poorly on class-skewed text datasets. To address these shortcomings, this thesis improves TFIDF by introducing a word frequency distribution factor and a category distribution factor. These factors take into account the distribution of the dataset across categories and also reflect the inter-class and intra-class distribution of feature vectors, so they measure the importance of feature vectors in a text set more accurately. The validity of the improved algorithm on class-skewed datasets is verified by comparative experiments.

(4) A scenic spot information mining system based on feature weighting and density clustering is designed and implemented. The system integrates the improved feature weighting algorithm FDCD-TFIDF and the improved density clustering algorithm KM-DBSCAN. It displays the characteristic keywords of scenic spots, shows the distribution of hot spots within scenic areas, and retrieves facilities around scenic spots. It helps users choose tourist attractions that suit their preferences, gives them faster and more convenient access to information about a scenic spot's surroundings, and provides reliable data support for travel planning.

System debugging and analysis of the results show that the scenic spot information mining system achieves the expected effect and is reliable and practical, which is of great significance for the modernization of scenic spot services and the planning of scenic areas.
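The parameter sensitivity that motivates KM-DBSCAN can be illustrated with a minimal sketch of textbook DBSCAN. This is not the thesis's KM-DBSCAN, and the toy coordinates, the function names `region_query` and `dbscan`, and the chosen Eps and MinPts values are assumptions made only for illustration.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels

# Hypothetical toy data: two dense groups close together, one sparse group far away.
dense_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
dense_b = [(1.0, 0.0), (1.1, 0.0), (1.0, 0.1), (1.1, 0.1)]
sparse = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
points = dense_a + dense_b + sparse

small_eps_labels = dbscan(points, eps=0.2, min_pts=3)
large_eps_labels = dbscan(points, eps=1.5, min_pts=3)
print(small_eps_labels)
print(large_eps_labels)
```

With the small Eps, the two dense groups are separated but every sparse point is labeled as noise. With the large Eps, the sparse group is recovered but the two dense groups merge into one cluster. No single global setting handles both densities in this toy case, which is the kind of limitation that per-partition parameters are meant to address.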
Keywords/Search Tags:Data mining, Hot spot area, DBSCAN, Text categorization, Feature weighting