
Data Mining In Application And Research Of Publishing Industry

Posted on: 2011-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: W H Yu
Full Text: PDF
GTID: 2178360302499543
Subject: Computer application technology
Abstract/Summary:
As government informatization advances, the integrated business system of the publishing industry, the ISBN real-name application information system, is being rolled out nationwide with the aim of unified management. Problems have emerged along the way, such as illegal publishing and overdue uploading of book resources. In response, the publishing regulatory agency has proposed measures such as monitoring and standardizing book publishing and tracking development trends accurately.

This thesis focuses on common problems in the publishing industry and introduces data mining technology to address them, covering both in-depth theoretical study and practical application. It concentrates on clustering analysis and association rule algorithms, and revises them to address specific problems encountered in practice.

First, we examine the classical K-Means algorithm and show from its principle that the clustering result depends heavily on the choice of initial cluster centers. To address this, we propose a method that selects the initial cluster centers from the distribution of the data objects, and we also propose a new way to evaluate the clustering result more accurately. Comparative experiments show that the revised K-Means is more efficient and more stable than the conventional algorithm. Applying the improved K-Means to national book publishing information, we can identify suspicious publishers and the general standard of normal operation.

Second, to reduce the run-time cost of the classical Apriori algorithm, we adopt a revised Apriori algorithm based on a hash technique. It greatly reduces the overhead of frequently invoking the AprioriGen method to generate candidate frequent itemsets.
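The abstract does not say which hash technique the revised Apriori uses. As a rough illustration only, the sketch below applies one well-known idea, hash-bucket pruning of candidate 2-itemsets (in the style of the DHP algorithm), which may differ from the thesis's actual method. The function name, the bucket count, and the sample book categories are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs_with_hash_filter(transactions, min_support, n_buckets=101):
    """Find frequent 2-itemsets, pruning candidates with hashed pair counts.

    Illustrative sketch only; not the thesis's own algorithm.
    Items within a transaction must be mutually comparable (e.g. all strings).
    """
    item_counts = Counter()
    bucket_counts = [0] * n_buckets

    # Pass 1: count single items and hash every pair into a bucket.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[hash(pair) % n_buckets] += 1

    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)

    # Candidate generation: a pair survives only if both items are frequent
    # AND its bucket count reaches min_support. The bucket count is an upper
    # bound on the pair's true support, so no frequent pair is lost.
    candidates = {
        pair
        for pair in combinations(frequent_items, 2)
        if bucket_counts[hash(pair) % n_buckets] >= min_support
    }

    # Pass 2: count exact support for the surviving candidates only.
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            if pair in candidates:
                pair_counts[pair] += 1

    return {p: c for p, c in pair_counts.items() if c >= min_support}
```

Because bucket collisions can only inflate a bucket's count, the filter may let some infrequent pairs through, but the exact count in the second pass removes them. The saving comes from building fewer candidates in the first place.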
With the improved Apriori algorithm, we discover association rules among published books as well as current trends in book publishing.

This thesis also investigates the pre-processing performed before data mining and the outcome assessment performed after it. Appropriate pre-processing improves the efficiency of the algorithms, and it also helps in assessing and presenting the mining results.

The results of this research apply to the domestic publishing industry and are intended to contribute to its management.
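The abstract describes choosing the initial K-Means centers from the distribution of the data objects but gives no details. The sketch below shows one simple way such an idea could work, assuming a density heuristic: pick the densest points that lie more than a chosen radius apart, then run ordinary K-Means from them. The function names and the `radius` parameter are assumptions for illustration, not the thesis's method.

```python
import math


def density_based_initial_centers(points, k, radius):
    """Pick up to k initial centers from dense, mutually separated regions.

    Illustrative heuristic only. May return fewer than k centers if the
    radius is too large for the data.
    """
    # Density of a point = number of points within `radius` (itself included).
    density = [
        sum(1 for q in points if math.dist(p, q) <= radius)
        for p in points
    ]
    # Visit points from densest to sparsest (ties keep input order).
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)

    centers = []
    for i in order:
        # Skip points inside the neighbourhood of an already chosen center.
        if all(math.dist(points[i], c) > radius for c in centers):
            centers.append(points[i])
        if len(centers) == k:
            break
    return centers


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters
```

Since the starting centers are derived deterministically from the data rather than drawn at random, repeated runs on the same data give the same clustering, which is one plausible reading of the stability gain the abstract reports.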
Keywords/Search Tags:data mining, data pre-processing, clustering, association rules, outcome evaluation