
Scalable Data Processing Based On Hadoop Platform And Complex Network Interest Mining

Posted on: 2017-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2348330503492906
Subject: Computer Science and Technology
Abstract/Summary:
As data grows massively, data redundancy is becoming more serious because both the size and the dimensionality of data are increasing. At the same time, user demands are diverging, with obvious differences between the needs of different departments and researchers. For now there is only one mode for defining data storage and a few specific schemes for data processing, which makes it difficult to provide a scalable, general data processing mode and to meet users' needs appropriately. As the amount of data grows, mining the real value behind the data has become urgent. Human behavior is not purely random: it has many kinds of relevance and motivation. General user interest mining techniques cannot fully reveal the characteristics of user behavior and cannot map the correlations between behaviors. Therefore, how to process massive data efficiently and mine users' interests is an urgent problem.
In this paper, we construct a scalable data processing strategy, merge it into the MapReduce computing framework, and build a scalable data processing model based on Hadoop. The strategy includes a scalable data storage structure, a plug-in data processing strategy, and a design for fusing both with the MapReduce framework. It is verified with examples based on users' PC behavior data. To reduce data redundancy, the storage structure classifies and stores data by structural level. The data processing structure supports adding, deleting, and changing data at any time according to user needs, without modifying the program. In addition, the model uses a custom processing method to ensure the integrity of data processing.
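The plug-in idea described above can be illustrated with a small sketch. The abstract does not give the thesis's actual interfaces, so everything here is an assumption made for illustration: the `PluginPipeline` class, the step names, the record fields, and the reading of "plug-in" as processing steps that can be registered or removed at run time. The map and reduce phases are simulated in plain Python rather than run on Hadoop.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical record type: one row of user PC behavior data.
Record = Dict[str, str]
# A step returns a (possibly modified) record, or None to drop it.
Step = Callable[[Record], Optional[Record]]


class PluginPipeline:
    """Named processing steps that can be added or removed at run time."""

    def __init__(self) -> None:
        self._steps: Dict[str, Step] = {}  # dicts keep insertion order

    def add(self, name: str, step: Step) -> None:
        self._steps[name] = step

    def remove(self, name: str) -> None:
        self._steps.pop(name, None)

    def apply(self, record: Record) -> Optional[Record]:
        current: Optional[Record] = dict(record)
        for step in self._steps.values():
            current = step(current)
            if current is None:  # a step filtered the record out
                return None
        return current


def map_phase(
    records: Iterable[Record], pipeline: PluginPipeline, key_field: str
) -> Iterator[Tuple[str, int]]:
    """Simulated map phase: run the plug-in steps, then emit (key, 1)."""
    for record in records:
        processed = pipeline.apply(record)
        if processed is not None:
            yield processed[key_field], 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Simulated reduce phase: sum the values for each key."""
    counts: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


if __name__ == "__main__":
    records = [
        {"user": "u1", "software": "Editor "},
        {"user": "u1", "software": "browser"},
        {"user": "u2", "software": ""},
        {"user": "u2", "software": "EDITOR"},
    ]
    pipeline = PluginPipeline()
    pipeline.add("drop_empty", lambda r: r if r["software"].strip() else None)
    pipeline.add(
        "normalize", lambda r: {**r, "software": r["software"].strip().lower()}
    )
    print(reduce_phase(map_phase(records, pipeline, "software")))

    # Removing a step changes behavior without touching map or reduce code.
    pipeline.remove("normalize")
    print(reduce_phase(map_phase(records, pipeline, "software")))
```

In this sketch the map and reduce functions never change; only the set of registered steps does, which is one plausible way to meet changing user needs without modifying the program.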
Verification shows that the scalable data processing strategy does not reduce the efficiency of distributed data processing. It can also extend or delete user requirements for hardware data processing as needed, meeting users' changing data processing needs.
Using the scalable distributed massive data processing model to preprocess structured data and extract users' PC software data, this paper combines complex networks with behavior dynamics to mine the relationship between user behavior characteristics and behaviors. It builds a weighted complex network model whose topology maps user behavior characteristics. Important-node mining and network community mining algorithms are then applied to reveal the characteristics of the network model as fully as possible, and the user interest set is mined on that basis. Experimental results show that the proposed complex-network-based user interest mining algorithm expresses user interest accurately and improves on other algorithms in both precision and recall.
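As a rough illustration of the weighted network described above, the sketch below builds a co-occurrence network from software usage sessions, ranks nodes by weighted degree, and groups nodes into communities. The abstract does not specify the thesis's weighting scheme or its node and community mining algorithms, so the choices here are stand-ins assumed for illustration: edge weight as session co-occurrence count, node strength as the importance measure, and connected components over strong edges as a simple substitute for community mining. All function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Weights = Dict[FrozenSet[str], int]


def build_weighted_network(sessions: Iterable[Iterable[str]]) -> Weights:
    """Edge weight = number of sessions in which two programs co-occur."""
    weights: Weights = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


def node_strength(weights: Weights) -> Dict[str, int]:
    """Weighted degree: sum of the weights of all edges touching a node."""
    strength: Dict[str, int] = defaultdict(int)
    for edge, weight in weights.items():
        for node in edge:
            strength[node] += weight
    return dict(strength)


def communities(weights: Weights, min_weight: int) -> List[Set[str]]:
    """Connected components over edges with weight >= min_weight.

    A deliberately simple stand-in for a community mining algorithm.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for edge, weight in weights.items():
        if weight >= min_weight:
            a, b = tuple(edge)
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return groups


if __name__ == "__main__":
    sessions = [
        ["ide", "browser", "terminal"],
        ["ide", "terminal"],
        ["player", "browser"],
        ["ide", "terminal", "player"],
    ]
    weights = build_weighted_network(sessions)
    ranked = sorted(node_strength(weights).items(), key=lambda kv: -kv[1])
    print(ranked)
    print(communities(weights, min_weight=2))
```

Under these assumptions, a user's interest set could be read off as the highest-strength nodes together with the community they belong to; a real implementation would substitute the thesis's own measures.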
Keywords/Search Tags: Massive data processing, Scalable model, Complex network, User interest