
Research Of Database Access Log Based On Weka

Posted on: 2013-05-05  Degree: Master  Type: Thesis
Country: China  Candidate: A L Fan  Full Text: PDF
GTID: 2268330425982743  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As society becomes increasingly information-driven, databases have accumulated large amounts of data, and how to extract implicit, previously unknown, and potentially useful information from them with data mining technology has become a research focus. This paper describes Weka, an open data mining platform that collects many machine learning algorithms capable of performing data mining tasks. It presents a detailed study of database access log preprocessing and cluster analysis methods, and carries out mining of a university library's lending information log. CV-k-means, a k-means clustering algorithm based on the coefficient of variation, is packaged into the Weka platform. The improved clustering algorithm is then used to analyze the library's lending information and uncover implicit knowledge that can serve as reference data for the library's purchasing department. The main content and contributions of this paper are as follows:

1. The performance of the k-means clustering algorithm depends on the choice of distance metric. Euclidean distance is commonly chosen as the similarity measure in k-means, but it treats all features equally and does not accurately reflect the dissimilarity among samples. To address this problem, this paper proposes a k-means clustering algorithm based on the coefficient of variation (CV-k-means). CV-k-means uses a weight vector of variation coefficients to reduce the effect of irrelevant features. Experimental results show that the proposed algorithm generates better clustering results than the k-means algorithm.

2. As an open data mining platform, Weka collects many machine learning algorithms capable of performing data mining tasks. However, the real-world problems to be solved are becoming more complex, and general-purpose data mining algorithms are showing their limitations. In this paper, the CV-k-means algorithm is integrated into Weka to obtain a new personalized data mining platform, in order to ease the mismatch between general-purpose data mining tools and domain-specific mining needs.

3. The lending information of Dalian Polytechnic University is analyzed using the improved, personalized Weka data mining platform. CLC categories are taken as the objects to be clustered, and three attributes (number of loans, number of renewals, and average lending time) participate in the clustering computation on the preprocessed data. Analyzing the clustering results yields the degree of reader interest, which is then used to provide recommendations for library procurement.
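
The weighting idea in contribution 1 can be sketched in Python as follows. This is only an illustration, not the Weka implementation described in the paper. The abstract does not give the exact formula, so the sketch assumes that each feature's weight is its coefficient of variation (standard deviation divided by the absolute mean), normalized so the weights sum to 1, and that the weights enter a weighted Euclidean distance. The function names `cv_weights`, `weighted_distance`, and `cv_kmeans` are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def cv_weights(rows):
    """One weight per feature, proportional to its coefficient of variation.

    Assumption: CV = population std / |mean|; a zero-mean feature gets CV 0.
    """
    cvs = []
    for column in zip(*rows):
        m = mean(column)
        cvs.append(pstdev(column) / abs(m) if m != 0 else 0.0)
    total = sum(cvs)
    if total == 0:
        # All features are constant: fall back to equal weights.
        return [1.0 / len(cvs)] * len(cvs)
    return [cv / total for cv in cvs]


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def cv_kmeans(rows, k, iterations=100, seed=0):
    """Plain k-means loop that uses the CV-weighted distance for assignment."""
    weights = cv_weights(rows)
    rng = random.Random(seed)
    # Initial centroids: k distinct rows picked at random.
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda c: weighted_distance(row, centroids[c], weights))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids, weights
```

For the library data in contribution 3, each row would hold the three attributes of one CLC category (number of loans, number of renewals, average lending time), and `cv_kmeans(rows, k)` would return a cluster label per category. Under this weighting, a feature that barely varies relative to its mean contributes little to the distance.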
Keywords/Search Tags: Data Mining, Cluster Analysis, k-means, Coefficient of Variation, Weka