Research on Web Log Mining and the Design and Realization of the LsMiner Mining System

Posted on: 2005-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: P Ceng
Full Text: PDF
GTID: 2168360152455521
Subject: Computer software and theory
Abstract/Summary:
Web mining applies data mining techniques to discover information from Web documents and services. With the rapid growth of the Internet, the Web has become a large information resource, but the conflict between limited human attention and unlimited information is notable. Web mining includes Web content mining, Web structure mining, and Web usage mining. As an important technique among the many data mining methods, Web log mining has particular academic and practical significance. Web usage mining is a useful method for finding user preferences and behavioral characteristics from Web navigation information. It is important for Web site management, for attracting Web users, and so on. Data preprocessing plays an essential role in the process of Web log mining. In this thesis, the key technologies of data preprocessing are studied and discussed, and a data preprocessing model adapted to Web log mining is proposed. Association rule mining is an important technology in Web log mining; it discovers implicit relationships among the records of a Web log. Association rules are generated by finding, within the large (frequent) itemsets, the rules that meet the minimum support and confidence thresholds. The Apriori algorithm may generate a large number of subsets. Mining maximum frequent itemsets is a key problem in many data mining applications. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there are prolific patterns and/or long patterns. Han proposed the FP-tree algorithm, which makes use of previous mining results to cut down the cost of finding new maximum frequent itemsets. The aim of sequential pattern mining is to find the maximal sequences in a set of sequences that satisfy a user-specified minimum support.
Each such maximal sequence represents a sequential pattern. There are mainly two kinds of sequential pattern mining methods. The first is similar to the Apriori algorithm, with GSP as its representative; such algorithms are based on the fact that a sequence is frequent only if all of its subsequences are frequent. The second applies the sequential pattern growth technique based on database projection. Among these methods, PrefixSpan is very efficient because of its focused search. In this thesis, corresponding solutions to these algorithms are offered and realized in the Web log mining system lsMiner.
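
The support and confidence measures mentioned in the abstract, and the support of a sequential pattern, can be illustrated with a minimal Python sketch. This is not the lsMiner implementation: the session data, page paths, and function names below are hypothetical, and the sketch assumes the Web log has already been preprocessed into per-user sessions of visited pages.

```python
from typing import List, Sequence, Set

# Hypothetical sessions, assumed to come out of Web log preprocessing.
# Each session is the ordered list of pages one user visited.
sessions: List[List[str]] = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/home", "/cart"],
]


def itemset_support(itemset: Set[str], sessions: List[List[str]]) -> float:
    """Fraction of sessions containing every page in the itemset (order ignored)."""
    hits = sum(1 for s in sessions if itemset <= set(s))
    return hits / len(sessions)


def rule_confidence(antecedent: Set[str], consequent: Set[str],
                    sessions: List[List[str]]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = itemset_support(antecedent, sessions)
    if base == 0:
        return 0.0
    return itemset_support(antecedent | consequent, sessions) / base


def is_subsequence(pattern: Sequence[str], session: Sequence[str]) -> bool:
    """True if the pattern's pages occur in the session in the same order."""
    remaining = iter(session)
    # Each membership test consumes the iterator up to the match,
    # so later pages must appear after earlier ones.
    return all(page in remaining for page in pattern)


def sequence_support(pattern: Sequence[str], sessions: List[List[str]]) -> float:
    """Fraction of sessions that contain the pattern as an ordered subsequence."""
    hits = sum(1 for s in sessions if is_subsequence(pattern, s))
    return hits / len(sessions)


if __name__ == "__main__":
    print(itemset_support({"/home", "/products"}, sessions))            # 0.75
    print(rule_confidence({"/home", "/products"}, {"/cart"}, sessions))
    print(sequence_support(["/home", "/products"], sessions))           # 0.5
```

In this toy data the itemset {/home, /products} has support 0.75, while the ordered sequence /home then /products has support only 0.5, because one session visits the two pages in the opposite order. That difference is what separates association rule mining from sequential pattern mining.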
Keywords/Search Tags: Web Log Mining, association rule, sequential pattern, pattern analysis