Research On DBSCAN Algorithm And Its Application In Patent Text Recommendation System

Posted on:2017-04-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Gao

Full Text:PDF

GTID:2428330596457449

Subject:Computer Science and Technology

Abstract/Summary:

Clustering analysis, which divides data objects into subsets, is a core technique of data mining and has a very wide range of applications. Density-based clustering algorithms are particularly valuable in practice because they handle data sets with noise well. However, they are not well suited to data sets with uneven density, and their time performance is poor. Improving both the clustering quality on data sets with uneven density and the time performance is therefore a major problem that needs to be solved.

DBSCAN is a classical density-based clustering algorithm. In this paper, we focus on its clustering quality on non-uniform data sets and on its high time complexity, and we propose a new density clustering algorithm, named PC-DBSCAN, based on data partitioning and grid clustering. First, from the k nearest neighbors of each data point we calculate the k-ave value of that point, and then partition the data according to the k-ave values, which alleviates the sensitivity of the algorithm to the global parameter Eps and improves clustering quality. Second, for each data partition, the corresponding data space is divided into disjoint grid cells based on Eps. To determine the core objects, the algorithm only needs to traverse the grid cell containing the data point and the surrounding grid cells, which reduces the time complexity. Finally, the clustering results of the data partitions are merged. The experimental results show that, on non-uniform data sets, the improved algorithm has better clustering quality and clustering efficiency than the original DBSCAN algorithm and related improved algorithms.

We then apply PC-DBSCAN to problem-oriented patent text clustering. Because the dimensionality of problem-oriented patent text vectors is generally high, good clustering results are difficult to obtain. Latent semantic analysis (LSA) can effectively reduce dimensionality in the vector space, so we study LSA and apply it to the problem-oriented patent text clustering process. The experimental results show that the clustering results are good. Based on the clustering results of the patent texts, this paper studies a problem-oriented patent text recommendation process based on text similarity and implements a patent recommendation system. The system recommends patents to users efficiently and accurately according to the problems the patents address, helps users find an effective solution to a practical problem, and achieves the desired results.
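
The k-ave step and the grid-based core-object search described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that k-ave means the average distance from a point to its k nearest neighbors, that the data are two-dimensional, and that a point counts itself among its Eps-neighbors. The function names `k_ave`, `build_grid`, and `core_points` are hypothetical.

```python
import math
from collections import defaultdict


def k_ave(points, k):
    """Assumed definition: mean distance from each point to its k nearest neighbors."""
    values = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        values.append(sum(dists[:k]) / k)
    return values


def build_grid(points, eps):
    """Map each grid cell of side length eps to the indices of the points inside it."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(math.floor(c / eps) for c in p)
        grid[cell].append(i)
    return grid


def core_points(points, eps, min_pts):
    """Return indices of 2-D points with at least min_pts neighbors within eps.

    Only the point's own cell and the 8 surrounding cells are scanned,
    because any neighbor within eps must lie in one of them.
    """
    grid = build_grid(points, eps)
    cores = []
    for i, p in enumerate(points):
        cx, cy = (math.floor(c / eps) for c in p)
        count = 0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if math.dist(p, points[j]) <= eps:
                        count += 1  # includes the point itself
        if count >= min_pts:
            cores.append(i)
    return cores
```

In this sketch, points in dense regions get small k-ave values and isolated points get large ones, so grouping points by k-ave would let each partition use an Eps suited to its own density. How the thesis chooses partition boundaries and merges the partition results is not stated in the abstract and is not shown here.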

Keywords/Search Tags:

DBSCAN, Grid clustering, Data partition, Patent problem, Text clustering

Related items

1	The Research On Web Text Clustering Based On DBSCAN Optimized Algorithm
2	Research On Adaptive Clustering Algorithm Based On DBSCAN Theory
3	Chinese Patent Clustering Analysis For Technical Problems
4	Research On Data Stream Clustering Algorithm Based On Similarity And Grid Partition Optimization
5	Research On Text Clustering Algorithm Based On DBSCAN
6	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
7	Research On Patent Text Clustering Based On Improved K-means Algorithm
8	The Research On Non-Spherical Clustering Algorithm Based On Grid Partition
9	Research On Partition-Based Efficient Clustering Algorithm For Large-Scale Data
10	A Patent Clustering Method Based On The Characteristics Of Multiple Problems