
Research On DBSCAN Algorithm And Its Application In Patent Text Recommendation System

Posted on: 2017-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Gao
Full Text: PDF
GTID: 2428330596457449
Subject: Computer Science and Technology
Abstract/Summary:
Clustering analysis, which divides data objects into subsets, is a core data mining technique with a very wide range of applications. Density-based clustering algorithms have high application value because they handle noisy data sets well. However, they are poorly suited to data sets with uneven density, and their time performance is poor. Improving both clustering quality on data sets of uneven density and running time is therefore a major problem that currently needs to be solved.

DBSCAN is a classical density-based clustering algorithm. This paper focuses on the clustering quality of DBSCAN on non-uniform data sets and on its high time complexity, and proposes a new density clustering algorithm, PC-DBSCAN, based on data partitioning and grid clustering. First, the k-ave value of each data point is calculated from its k nearest neighbors, and the data are partitioned according to these k-ave values; this reduces the algorithm's sensitivity to the global parameter Eps and improves clustering quality. Second, for each data partition, the corresponding data space is divided into disjoint grid cells based on Eps; to determine the core objects, only the grid cell containing a data point and the surrounding cells need to be traversed, which reduces the time complexity. Finally, the clustering results of the data partitions are merged. The experimental results show that, on non-uniform data sets, the improved algorithm achieves better clustering quality and efficiency than the original DBSCAN algorithm and related improved algorithms.

The paper then applies PC-DBSCAN to problem-oriented patent text clustering. Because problem-oriented patent text vectors are generally high-dimensional, good clustering results are difficult to obtain directly. Latent semantic analysis (LSA) can effectively reduce the dimensionality of the vector space, so LSA is studied and applied in the problem-oriented patent text clustering process. The experimental results show that the clustering results are good.

Based on the patent text clustering results, the paper studies a problem-oriented patent text recommendation process based on text similarity and implements a patent recommendation system. The system recommends patents to users efficiently and accurately according to the patent problem, helping users find an effective solution to a practical problem, and it achieves the desired results.
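
The first two steps of PC-DBSCAN described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and it rests on several assumptions: the abstract does not define k-ave, so it is taken here to be the mean Euclidean distance from a point to its k nearest neighbors; the density threshold `min_pts` (the usual DBSCAN MinPts) is not named in the abstract; and the function names `k_ave`, `build_grid`, and `core_objects` are hypothetical. The grid uses cells of side Eps, so every point within Eps of a given point lies in that point's cell or in one of the adjacent cells.

```python
import math
from collections import defaultdict
from itertools import product


def k_ave(points, k):
    """Assumed definition: mean distance from each point to its k nearest neighbors."""
    values = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        values.append(sum(dists[:k]) / k)
    return values


def cell_of(point, eps):
    """Index of the grid cell (side length eps) that contains the point."""
    return tuple(math.floor(c / eps) for c in point)


def build_grid(points, eps):
    """Map each non-empty grid cell to the indices of the points inside it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, eps)].append(idx)
    return grid


def core_objects(points, eps, min_pts):
    """Return indices of points with at least min_pts points (itself included)
    within distance eps, scanning only the point's own cell and adjacent cells."""
    grid = build_grid(points, eps)
    dim = len(points[0])
    offsets = list(product((-1, 0, 1), repeat=dim))
    cores = []
    for idx, p in enumerate(points):
        cell = cell_of(p, eps)
        count = 0
        for off in offsets:
            neighbor_cell = tuple(c + o for c, o in zip(cell, off))
            # .get avoids creating empty cells in the defaultdict
            for j in grid.get(neighbor_cell, ()):
                if math.dist(p, points[j]) <= eps:
                    count += 1
        if count >= min_pts:
            cores.append(idx)
    return cores


if __name__ == "__main__":
    # Four tightly packed points and one isolated point (illustrative data only).
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(k_ave(data, k=2))
    print(core_objects(data, eps=0.2, min_pts=3))
```

Points with similar k-ave values would then be grouped into the same partition, with Eps chosen per partition; how the thesis selects partition boundaries and merges the per-partition results is not specified in the abstract, so those steps are left out of the sketch.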
Keywords/Search Tags: DBSCAN, Grid clustering, Data partition, Patent problem, Text clustering