
The Application Research Of Weblog Mining Based On Cloud Computing

Posted on: 2012-05-04 | Degree: Master | Type: Thesis
Country: China | Candidate: M Cheng
GTID: 2178330338992201 | Subject: Business Intelligence
Abstract/Summary:
Processing massive data has long been an important research topic in data mining. With the rapid development of network technology, the volume of data on the web is growing exponentially, and this data is massive, diverse, heterogeneous, and dynamic. As a result, mining on a single node can no longer meet the needs of current large-scale data analysis tasks. How to extract useful information from the web, the world's largest data collection, has become a subject of growing interest to researchers around the world.

Cloud Computing emerged against this background and offers a promising approach to massive data processing and storage. A Cloud Computing platform can be deployed on an ordinary cluster of inexpensive computers while still providing strong data processing capability. Whether a web data mining system can run successfully on a Cloud cluster is therefore of considerable significance and practical value.

Based on the Hadoop platform and the characteristics of web log mining, we present a web log mining system built on Cloud Computing and describe each of its modules in detail. Current mining algorithms focus on how frequently users browse a path and neglect an important question: whether users are actually interested in that frequent path. To address this, we combine the web topology structure with browsing frequency to revise the frequency-based measure of users' preferred browsing paths, introduce the concept of useful preference, and present a method for mining users' preferred browsing paths that removes the distortion caused by page placement and link structure.

Finally, we conduct experiments to verify the effectiveness of the improved algorithm and the efficiency of Cloud Computing. The results show that the improved algorithm identifies preferred browsing paths that reflect users' preferences more accurately.
They also show that using the abundant resources of the Cloud to perform the mining task achieves higher efficiency than a single-node environment, in both data processing and task execution.
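The idea of correcting raw browsing frequency with the site's link topology can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not define "useful preference," so the measure below is an assumption. It takes the share of a page's observed exits that follow a given link and multiplies it by the page's out-degree, so a value of 1.0 means the link was chosen exactly as often as a uniform random pick among the page's out-links would predict. It accounts only for the number of out-links, not for a link's position on the page. The map and reduce functions only mimic the shape of a Hadoop job and run in-process. The names `map_session`, `reduce_counts`, `link_preference`, `preferred_paths`, and the sample sessions and topology are all hypothetical.

```python
from collections import defaultdict


def map_session(session):
    """Map step: emit ((src, dst), 1) for each consecutive page pair in one session."""
    for src, dst in zip(session, session[1:]):
        yield (src, dst), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (src, dst) link."""
    totals = defaultdict(int)
    for link, count in pairs:
        totals[link] += count
    return dict(totals)


def link_preference(link_counts, out_links):
    """Hypothetical topology-adjusted measure (an assumption, not the thesis's).

    preference = (traversals of src->dst / all traversals leaving src) * out-degree(src)
    A value above 1.0 means the link is chosen more often than a uniform pick
    among src's out-links would predict.
    """
    exits = defaultdict(int)
    for (src, _), count in link_counts.items():
        exits[src] += count
    prefs = {}
    for (src, dst), count in link_counts.items():
        degree = len(out_links.get(src, [])) or 1  # guard pages missing from the topology
        prefs[(src, dst)] = count * degree / exits[src]
    return prefs


def preferred_paths(prefs, threshold=1.0):
    """Chain links whose preference exceeds `threshold` into maximal paths.

    Paths start at pages that no preferred link points into; for brevity,
    cycles made only of preferred links are not reported.
    """
    nxt = defaultdict(list)
    targets = set()
    for (src, dst), p in prefs.items():
        if p > threshold:
            nxt[src].append(dst)
            targets.add(dst)
    paths = []

    def extend(path):
        successors = [d for d in nxt[path[-1]] if d not in path]  # avoid revisiting pages
        if not successors:
            paths.append(path)
            return
        for d in successors:
            extend(path + [d])

    for start in sorted(set(nxt) - targets):
        extend([start])
    return paths


# Hypothetical sessionized log data and site topology.
sessions = [["A", "B", "D"], ["A", "B", "E"], ["A", "C"], ["A", "B", "D"]]
out_links = {"A": ["B", "C"], "B": ["D", "E", "F"]}

pairs = (kv for s in sessions for kv in map_session(s))
counts = reduce_counts(pairs)
prefs = link_preference(counts, out_links)
print(preferred_paths(prefs))  # [['A', 'B', 'D']]
```

In the sample data, A→C and B→E are each traversed once, so a pure frequency count treats them alike, yet their preferences differ (0.5 vs 1.0) because B offers three out-links and A only two. On a real cluster the map and reduce steps would run as separate tasks over sessionized log files, for example through Hadoop Streaming, rather than as in-process Python calls.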
Keywords/Search Tags: Cloud Computing, Web log mining, Hadoop, Preferred browsing path