The Application Research Of Weblog Mining Based On Cloud Computing

Posted on:2012-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:M Cheng

Full Text:PDF

GTID:2178330338992201

Subject:Business Intelligence

Abstract/Summary:

Processing massive data sets has long been an important research problem in the data mining field. With the rapid development of network technology, data on the web is growing exponentially and is massive, diverse, heterogeneous, and dynamic, so mining on a single node can no longer meet the demands of current large-scale data analysis tasks. How to extract useful information from the web, the world's largest data collection, has become a subject of growing concern for scholars around the world.

Cloud computing emerged against this background, and it offers a promising approach to massive data processing and storage. A cloud computing platform can be deployed on an ordinary cluster of inexpensive computers and still provide strong data processing capability. Whether a web data mining system can run successfully on a cloud cluster is therefore a question of important significance and application value.

Based on the Hadoop platform and the characteristics of web log mining, we present a design for a web log mining system built on cloud computing and describe each module of the system in detail. Current mining algorithms focus on users' browsing frequency and neglect the important question of whether users are actually interested in the frequent paths. To address this problem, we take the web topology structure into account and revise the frequency-based measure of users' preferred browsing paths. We introduce the concept of useful preference and a method for mining user preferred browsing paths, which removes the distorting effect that page position and links have on the mining results.

Finally, we conduct experiments to verify the effectiveness of the improved algorithm and the efficiency of cloud computing. The results show that the improved algorithm mines preferred browsing paths that reflect users' preferences more accurately. In addition, using the rich resources of the cloud to carry out the mining task achieves higher efficiency than a single-node environment, in both data processing and task execution.
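
The abstract does not give the formula for useful preference, so the following is only an illustrative sketch of the general idea, not the thesis's method. It counts page-to-page transitions in a map/reduce style and then normalizes each count by the number of out-links on the source page, so that a link is not favored merely because of how many alternatives surround it. The function names, the sample sessions, the `out_links` table, and the normalization itself are all assumptions made for illustration, and plain Python stands in for Hadoop.

```python
from collections import Counter

def map_session(session):
    """Map step: emit ((source_page, target_page), 1) for each consecutive click."""
    for src, dst in zip(session, session[1:]):
        yield (src, dst), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each transition key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

def preference(totals, out_links):
    """Hypothetical measure: observed transition count divided by the count
    expected if visits leaving a page were spread evenly over its out-links."""
    leaving = Counter()
    for (src, _), n in totals.items():
        leaving[src] += n
    return {
        (src, dst): n * len(out_links[src]) / leaving[src]
        for (src, dst), n in totals.items()
    }

# Hypothetical sessions (ordered page visits) and site topology.
sessions = [["A", "B", "C"], ["A", "B"], ["A", "D"], ["A", "B", "C"]]
out_links = {"A": ["B", "D", "E"], "B": ["C"]}

totals = reduce_counts(pair for s in sessions for pair in map_session(s))
scores = preference(totals, out_links)
```

Under this assumed measure, a score above 1 means a link is followed more often than an even spread over the page's out-links would predict. In a Hadoop job, the map and reduce functions would run across the cluster rather than in a single process.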

Keywords/Search Tags:

Cloud Computing, Web log mining, Hadoop, Preferred browsing path

Related items

1	Research On Technology Of Mining User Browsing Paths Based On Hadoop
2	Research Of Users Browsing Behavior Based On Path And Web Mining
3	Research Of User Preferred Browsing Paths Based On Web Log
4	Based On Web-log Frequent Browsing Paths Mining And Technology Analysis
5	Research On Web Data Mining Algorithms In Cloud Computing Environment
6	The Reseach Of Data Mining Based On HADOOP
7	Study On Data Mining Platform Based On Cloud Computing
8	Web Structure Mining Algorithm Based On Cloud Computing Environment Is Studied
9	The Process And Research Of Massive Data Mining Based On Cloud Computing
10	Research On Cloud Computing Of BI Processing Technology