Research On Key Technology Of Scientific Literature Data Mining

Posted on:2016-04-05

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Li

Full Text:PDF

GTID:2348330542473911

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid growth in the number of scientific papers, scientific knowledge develops and evolves ever more quickly, and it has become very difficult for researchers to keep up with and understand new information. How to discover the papers most worth reading within a large body of scientific literature has therefore attracted attention from a growing number of researchers. Citation count, the total number of citations a paper receives within a specified period of time, is an important measure of a paper's influence and quality. Analysis based on citation counts nevertheless has many limitations; for example, a count only reflects the citations received up to the current point in time. Predicting a paper's future citation count is a challenging task, and this difficulty undermines the assessment of a paper's contribution. To identify promising papers quickly and promote the dissemination of new knowledge, a method that can predict citation counts automatically and accurately is needed. This thesis focuses on algorithms for predicting the citation counts of scientific papers. Its main contributions are as follows.

First, we present an improved algorithm for the citation count prediction task of KDD Cup, a top international data mining competition. Compared with the algorithm of the first-place team, our algorithm additionally analyzes the topic words of the papers in the dataset, clusters the papers according to their topic words, and performs regression forecasting within each cluster to reduce the effect of differences in academic activity between topics. Experimental analysis shows that the improved algorithm achieves higher prediction accuracy than the original algorithm.

Second, based on our findings about the shortcomings of existing algorithms, we propose a new citation count time series prediction algorithm and evaluate it on real citation data. The algorithm is based on the similarity of citation patterns and combines time series regression modeling with similarity clustering. On the one hand, it automatically analyzes the citation count of each paper in the dataset and obtains the average citation count for each month. On the other hand, it mines different citation patterns through similarity clustering, so that a paper's future citation count can be predicted from its existing citation count time series. Analytical and simulation results show that the proposed algorithm achieves higher prediction accuracy.
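The first contribution clusters papers by topic words and fits a separate regression inside each cluster. The abstract does not name the clustering method, the similarity measure, or the regression features. The sketch below is therefore only an illustration under stated assumptions. It uses greedy "leader" clustering with Jaccard similarity on topic-word sets and a one-feature least-squares line that maps an early citation count to a later one. The names `leader_cluster`, `fit_per_cluster`, `predict` and the `threshold` parameter are hypothetical, and none of this is the thesis's actual implementation.

```python
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity between two sets of topic words."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(topic_sets, threshold=0.3):
    """Greedy leader clustering (an assumed stand-in for the thesis's method):
    a paper joins the first cluster whose leader's topic words are similar
    enough, otherwise it starts a new cluster. Returns (leaders, labels)."""
    leaders, labels = [], []
    for words in topic_sets:
        for k, leader in enumerate(leaders):
            if jaccard(words, leader) >= threshold:
                labels.append(k)
                break
        else:
            leaders.append(words)
            labels.append(len(leaders) - 1)
    return leaders, labels


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns the mean if x is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_per_cluster(topic_sets, early, later, threshold=0.3):
    """Cluster papers by topic words, then fit one regression line per cluster.
    `early`/`later` are hypothetical features: citations in an early and a later window."""
    leaders, labels = leader_cluster(topic_sets, threshold)
    models = {}
    for k in range(len(leaders)):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        models[k] = fit_line([early[i] for i in idx], [later[i] for i in idx])
    return leaders, models


def predict(leaders, models, words, early_count):
    """Route a paper to its most similar cluster leader and apply that cluster's line."""
    k = max(range(len(leaders)), key=lambda j: jaccard(words, leaders[j]))
    a, b = models[k]
    return a + b * early_count


if __name__ == "__main__":
    # Toy data: two topics whose citations grow at very different rates.
    topics = [{"graph", "mining"}, {"graph", "mining", "cluster"}, {"graph", "mining"},
              {"protein", "gene"}, {"protein", "gene", "cell"}, {"protein", "gene"}]
    early = [1, 2, 3, 1, 2, 3]
    later = [2, 4, 6, 10, 20, 30]
    leaders, models = fit_per_cluster(topics, early, later)
    print(predict(leaders, models, {"graph", "mining"}, 4))   # 8.0
    print(predict(leaders, models, {"protein", "gene"}, 4))   # 40.0
```

In the toy data, a single global regression would blur the two growth rates together. Fitting per cluster keeps each topic's own slope, which illustrates the abstract's point about reducing the effect of topic differences in academic activity.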
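The second contribution predicts a paper's future monthly citations by matching its observed citation time series to citation patterns mined through similarity clustering. The abstract does not specify the clustering algorithm, the distance, or how a pattern is turned into a forecast. The sketch below makes its own assumptions. It runs plain k-means with Euclidean distance (`math.dist`, Python 3.8+) on the first `observed` months of historical series. It then forecasts the remaining months as the average continuation of the nearest cluster's members. The names `kmeans`, `fit_patterns`, `forecast` and the deterministic first-k seeding are hypothetical choices for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=20):
    """Plain k-means on equal-length vectors, seeded with the first k points
    (deterministic seeding is an assumption; requires len(points) >= k)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each series goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def fit_patterns(histories, observed, k):
    """Cluster the first `observed` months of each monthly citation history and
    store each cluster's average continuation (the months after `observed`)."""
    prefixes = [h[:observed] for h in histories]
    centroids, labels = kmeans(prefixes, k)
    futures = []
    for c in range(k):
        tails = [h[observed:] for h, lab in zip(histories, labels) if lab == c]
        futures.append([sum(col) / len(tails) for col in zip(*tails)] if tails else [])
    return centroids, futures


def forecast(centroids, futures, prefix):
    """Match a new paper's observed months to the nearest non-empty pattern
    and return that pattern's average continuation."""
    usable = [j for j in range(len(centroids)) if futures[j]]
    c = min(usable, key=lambda j: dist(prefix, centroids[j]))
    return futures[c]


if __name__ == "__main__":
    # Toy monthly citation counts: "early burst" and "slow burner" patterns, interleaved.
    histories = [[5, 3, 1, 1, 0, 0], [0, 1, 2, 3, 4, 5],
                 [6, 4, 2, 1, 1, 0], [1, 1, 2, 4, 5, 6],
                 [4, 3, 1, 0, 0, 0], [0, 1, 3, 3, 4, 6]]
    centroids, futures = fit_patterns(histories, observed=3, k=2)
    print([round(x, 3) for x in forecast(centroids, futures, [0, 2, 2])])  # [3.333, 4.333, 5.667]
```

Averaging a cluster's continuations is the simplest way to turn a mined pattern into a forecast. A closer reading of "time-series regression modeling" might instead fit a regression within each cluster, which this sketch does not attempt.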

Keywords/Search Tags:

citation count prediction, time series, cluster analysis, regression forecast

Related items

1	Research On Airport Noise Prediction With Time Series Analysis
2	A Study On Predicting Citation Count Of New Published Paper Based On GAT Model
3	Research On Citation Count Prediction Of Papers Based On Deep Learning
4	Dynamic Time Series Cycle Analysis And Forecasting Model
5	Study On Water Quality Time Series Data Mining And Application Integration
6	Research On Forecast Of Time Series Based On Svm
7	The Research Of Chaotic Time Series Prediction Method Based On The BP Neural Network
8	Prediction Analysis And Comparative Evaluation Based On Daily Gas Consumption Data Of Urban Residents
9	Time Series EMD Analysis And Prediction System Design And Implementation
10	Multi-factor Time Series Prediction Research Based On SVM And Its Parallelization