Research Of Weblog Mining And Design And Realization Of LsMiner Mining System

Posted on:2005-03-11

Degree:Master

Type:Thesis

Country:China

Candidate:P Ceng

Full Text:PDF

GTID:2168360152455521

Subject:Computer software and theory

Abstract/Summary:

Web mining applies data mining techniques to discover information from Web documents and services. With the rapid growth of the Internet, the Web has become a large information resource, but the conflict between limited human attention and unlimited information is notable. Web mining includes Web content mining, Web structure mining, and Web usage mining. Among the many data mining methods, Web log mining has particular academic and practical significance. Web usage mining is a useful method for finding user preferences and behavioral characteristics from Web navigation information, which is important for Web site management, attracting Web users, and so on.

Data preprocessing plays an essential role in the process of Web log mining. In this thesis, the key technologies of data preprocessing are studied and discussed, and a data preprocessing model suited to Web log mining is proposed.

Association rule mining is an important technology in Web log mining; it discovers implicit relations between the records of a Web log. Association rules are created by finding, within the large (frequent) itemsets, the rules that meet the support and confidence thresholds. The Apriori algorithm may generate a large number of candidate subsets. Mining maximal frequent itemsets is a key problem in many data mining applications. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there are prolific patterns and/or long patterns. Han proposed the FP-tree algorithm. The algorithm makes use of previous mining results to cut down the cost of finding new maximal frequent itemsets.

The aim of sequential pattern mining is to find the maximal sequences in a sequence set with a user-specified minimum support; each maximal sequence then represents a sequential pattern. There are mainly two kinds of sequential pattern mining methods. One is similar to the Apriori algorithm, with GSP as a representative. Such algorithms are based on the fact that a sequence is frequent only if all its subsequences are frequent. The other applies the sequential pattern growth technique based on database projection. Among these, PrefixSpan is very efficient because of its focused search. In this thesis, corresponding solutions for these algorithms are offered and implemented in the Web log mining system, lsMiner.
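
The support and confidence criteria mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the lsMiner implementation, which the abstract does not describe in detail. It assumes each Web log session has already been preprocessed into a set of page URLs. The function names (`support`, `frequent_itemsets`, `rules`) and the thresholds are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def support(itemset, sessions):
    """Fraction of sessions that contain every page in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for s in sessions if itemset <= s) / len(sessions)


def frequent_itemsets(sessions, min_support):
    """Apriori-style level-wise search; returns {itemset: support}."""
    pages = sorted({p for s in sessions for p in s})
    level = [frozenset([p]) for p in pages
             if support([p], sessions) >= min_support]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, sessions)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent,
        # then the candidate itself must meet the support threshold.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, k - 1))
                 and support(c, sessions) >= min_support]
    return frequent


def rules(frequent, min_confidence):
    """Derive rules lhs -> rhs whose confidence meets the threshold."""
    found = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(itemset) / support(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, confidence))
    return found
```

Calling `rules(frequent_itemsets(sessions, 0.5), 0.8)` on a list of page sets returns tuples of (antecedent, consequent, support, confidence). The join and prune steps show why Apriori-like methods can produce many candidate sets when patterns are long, which is the cost the abstract refers to.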

Keywords/Search Tags:

Web Log Mining, association rule, sequential pattern, pattern analysis

Related items

1	Data Mining Based On Web Log
2	Based J2ee Guangzhou Guaranteeing The Oa System
3	Research On Mobile Customer Churn Based On Data Mining
4	Research On The Integrated Association Rule Mining System
5	Research Of Misuse IDS Based On Sequential Pattern Mining And Key Technologies
6	Sequential Patterns Mining And Application In Web Log
7	The Sequence Association Rules Mining For The E-commerce Personalized Recommendation
8	Research On Mining Algorithm Of Association Rule And Its Application For Biological Data
9	Research Of Web Data Mining For Electronic Commerce
10	Association Rule Mining Expansion Of Research In The Area Of disaggregated Data