
Research And Application Of Clustering Algorithm Based On Reference Co-citation

Posted on: 2018-03-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Q Wang
GTID: 2348330518476617 | Subject: Computer technology
Abstract/Summary:
In recent years the volume of published literature has grown rapidly, and finding the important papers has become a major problem. The application of data mining technology to scientific and technical literature has therefore become an important research direction. Co-citation reflects the content relationship between documents; combining co-citation with traditional text clustering algorithms helps improve clustering accuracy and helps researchers retrieve literature more efficiently. This thesis studies co-citation in depth and constructs the co-citation matrix. In the document preprocessing stage, the co-citation relationship between documents is used to improve both feature extraction and the calculation of document similarity. The processed documents are then clustered with the K-Means algorithm and a spectral clustering algorithm. Experiments verify that the improved algorithm raises clustering accuracy and can help users retrieve and screen literature more effectively. The main contents and results are as follows:

1. Feature selection and similarity. Basic feature words are taken from the title, abstract, and keywords of each document, and their scores are weighted using co-citation frequency information; the highest-scoring terms are selected as the final features. For the similarity between documents, a co-citation weighting is introduced into the vector cosine similarity calculation, and the improved similarity is applied to the clustering algorithms.

2. Cluster analysis. The K-Means algorithm and a spectral clustering algorithm using the N-Cut segmentation criterion are used for clustering. Several groups of experiments compare the original algorithms with the improved ones, with F-Measure as the evaluation index for the clustering results.

3. Recommendation system. A recommendation system was designed and implemented on the Visual Studio 2012 and MFC development platform. Using the improved algorithm, it provides document retrieval and management, citation analysis, clustering analysis, and recommendation of high-quality papers.

Finally, the thesis summarizes the work and outlines prospects for further research.
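The abstract does not give the formula used to combine co-citation with cosine similarity, so the following is only a minimal Python sketch of one plausible form. Everything in it is an assumption made for illustration: the function names, the binary citation matrix layout (rows are citing papers, columns are the documents being clustered), the normalization of co-citation counts by the geometric mean of citation counts, and the mixing weight `alpha`. It should not be read as the thesis's actual method.

```python
import numpy as np

def cocitation_matrix(citations):
    """Co-citation counts: C[i, j] = number of papers citing both i and j.

    `citations` is a hypothetical binary matrix with one row per citing
    paper and one column per cited document.
    """
    A = np.asarray(citations, dtype=float)
    C = A.T @ A
    np.fill_diagonal(C, 0.0)  # a document is not co-cited with itself
    return C

def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X (term vectors)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty vectors
    U = X / norms
    return U @ U.T

def cocitation_strength(citations):
    """Co-citation counts scaled to [0, 1] (assumed normalization).

    Each count is divided by sqrt(c_i * c_j), where c_i is how often
    document i is cited. The diagonal is kept, so a cited document has
    strength 1 with itself.
    """
    A = np.asarray(citations, dtype=float)
    raw = A.T @ A
    counts = A.sum(axis=0)
    denom = np.sqrt(np.outer(counts, counts))
    denom[denom == 0] = 1.0  # uncited documents get strength 0
    return raw / denom

def weighted_similarity(X, citations, alpha=0.3):
    """Blend text similarity with co-citation strength (assumed form)."""
    return (1 - alpha) * cosine_similarity(X) + alpha * cocitation_strength(citations)

# Toy data: 3 citing papers, 3 documents to cluster, 4 feature terms.
citations = [[1, 1, 0],
             [1, 1, 0],
             [0, 1, 1]]
X = [[1.0, 0.5, 0.0, 0.0],
     [0.8, 0.6, 0.1, 0.0],
     [0.0, 0.1, 0.9, 0.7]]

C = cocitation_matrix(citations)
S = weighted_similarity(X, citations, alpha=0.3)
```

Under these assumptions, `S` is a symmetric similarity matrix that could be passed to a clustering routine that accepts precomputed affinities, for example spectral clustering. Documents 0 and 1 are co-cited twice in the toy data, so their blended similarity is raised relative to pairs that are rarely or never co-cited.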
Keywords/Search Tags:co-citation, K-Means, spectral clustering, recommendation system