Analysis And Mining Of Wiki Entry Editor Behavior Based On Hidden Markov Model

Posted on: 2019-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Huang
GTID: 2438330545956864
Subject: Software engineering
Abstract/Summary:
Wikipedia is an online encyclopedia that has grown in popularity in recent years. The wiki, a hypertext system and the basic technology behind Wikipedia, provides a collaborative editing environment for users around the world. In the current era of information explosion, the volume of data created by Internet users is huge. The analysis of co-editing behavior in Wikipedia should therefore not be confined to the traditional static view; it should also consider the real time-series data of how Wikipedia entries develop, and extract their internal change mechanism. The focus and contributions of this thesis are as follows.

First, a denoising and feature extraction method for Wikipedia entries based on tensor decomposition is proposed. The method effectively addresses the sparsity problem of high-dimensional data and explores the internal mechanism of the entry sequences. The original third-order tensor is first reduced in dimensionality, the features that characterize its internal change mechanism are extracted as the basis for similarity measurement, and tensor decomposition is then performed. This reduces the workload of the clustering algorithm used in the subsequent steps and handles the high-dimensional data problem; more importantly, it does not destroy the internal relationships among the elements.

Second, a new K-medoids clustering algorithm for Wikipedia entries based on Dynamic Time Warping is proposed. The method forms the data into time series of unequal length, using the features obtained from tensor decomposition during denoising, and clusters the original data automatically. Because the traditional K-medoids algorithm measures similarity with the Euclidean distance, it cannot stretch or shift features that have similar shapes. The Dynamic Time Warping based method proposed in this thesis overcomes this scale and displacement problem to a certain extent.

Finally, a user behavior mining algorithm for Wikipedia entries based on Dynamic Time Warping and the Hidden Markov Model is proposed. The algorithm can train models corresponding to different stages of Wikipedia entry development. Contrastive experiments were designed on real data, in which the improved algorithm and the traditional algorithms were each tested with and without tensor decomposition. The experimental results show that the improved algorithm achieves higher processing speed and accuracy than the traditional algorithms.
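To make the clustering step more concrete, the following is a minimal sketch of K-medoids driven by a Dynamic Time Warping distance, written in plain Python. It is not the thesis's implementation: the abstract does not specify the local cost, the initialization, or the stopping rule, so the absolute-difference cost, the farthest-first initialization, and the names `dtw_distance` and `k_medoids` are all assumptions made for illustration. The tensor decomposition and Hidden Markov Model stages are not covered here.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW distance with |x - y| as local cost (assumed)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids(
    series: List[Sequence[float]], k: int, max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    """Cluster unequal-length series; returns (medoid indices, label per series)."""
    n = len(series)
    # Precompute the pairwise DTW distance matrix once.
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)] for i in range(n)]

    # Farthest-first initialization (an assumption, chosen to be deterministic).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(members, key=lambda p: sum(dist[p][o] for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


if __name__ == "__main__":
    # Toy edit-activity curves of unequal length (made-up numbers, not thesis data).
    toy = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0], [5, 5, 5], [5, 5, 5, 5, 6]]
    print(k_medoids(toy, k=2))
```

In the toy input, the first two series have the same shape but different lengths. DTW aligns them at zero cost, whereas a Euclidean distance could not compare them directly, which is the limitation of traditional K-medoids that the abstract describes.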
Keywords/Search Tags:Data Mining, Wikipedia entries, Tensor Factorization, Dynamic Time Warping, Hidden Markov Model