Research And Design Of Planning Of Topics Based On Data Mining

Posted on: 2016-06-02   Degree: Master   Type: Thesis
Country: China   Candidate: W X Li   Full Text: PDF
GTID: 2298330467493345   Subject: Computer Science and Technology
Abstract/Summary:
This thesis first analyzes the current state of the publishing industry. At publishing houses, the topic selection and planning carried out by editorial staff has lagged behind, reflecting a limited understanding of the book market and of readers' needs. As a result, the industry faces falling consumer demand, overstocked inventory, and reduced economic returns. Drawing on this background, on domestic and international research, and on current technology in the field, the thesis proposes a topic planning scheme based on data mining. Several domestic online book retailers were analyzed with respect to site activity and completeness of information; a suitable data extraction method was then adopted, and market information from the JD book market was selected as the basis of the research. To handle the large volume of book-related data, the thesis surveys current data processing schemes and uses the Hadoop ecosystem as cloud storage for the massive body of book market information. To manage that information, it uses the Hive data warehouse to provide simple and efficient distributed data management. Finally, statistical methods are used for simple analysis of the data, and in-depth mining with a classification algorithm is used for deeper analysis. Combining these techniques is intended to improve the efficiency of topic selection and planning. The purpose of the study is to make full use of the limited book market information available and to improve publishers' ability to plan topics.
The difficulties and innovations of this thesis are as follows: analyzing book market information resources and designing an easily extensible, highly fault-tolerant scheme that supports cloud storage of massive data sets; using Hive to provide a feasible data warehouse scheme; and designing a data mining approach that combines simple analysis with a classification algorithm. The main results are: storage of massive book market information on the Hadoop Distributed File System; distributed data management with the Hive data warehouse; and data mining of topic selection information using simple analysis together with a classification algorithm.
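The "simple analysis" step mentioned above could look roughly like the following Python sketch, which groups book records by category and computes a count and a mean per group. This is only an illustration under stated assumptions: the abstract does not describe the thesis's actual schema, statistics, or code, so the record fields (title, category, comments) and the function name summarize_by_category are hypothetical, and the sample data is invented for the example.

```python
from collections import defaultdict

# Hypothetical book-market records. The field names and values are
# illustrative assumptions, not data or schema from the thesis.
RECORDS = [
    {"title": "Book A", "category": "computing", "comments": 120},
    {"title": "Book B", "category": "computing", "comments": 80},
    {"title": "Book C", "category": "fiction", "comments": 300},
]


def summarize_by_category(records):
    """Return {category: (title_count, mean_comments)} for the given records."""
    totals = defaultdict(lambda: [0, 0])  # category -> [count, comment_sum]
    for rec in records:
        entry = totals[rec["category"]]
        entry[0] += 1
        entry[1] += rec["comments"]
    return {cat: (n, total / n) for cat, (n, total) in totals.items()}


summary = summarize_by_category(RECORDS)
```

In a setting like the one the abstract describes, an aggregation of this kind would more plausibly run as a query against the Hive warehouse than over an in-memory list; the in-memory version is used here only to keep the sketch self-contained.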
Keywords/Search Tags: data mining, topic planning, mass storage, Hadoop, Hive