Research On Web Log Mining Technology Based On XML And Association Rules

Posted on: 2004-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X J Jiang
Full Text: PDF
GTID: 2168360092493504
Subject: Computer applications
Abstract/Summary:
Web log mining is a new technology for network information processing and an important application of data mining on the Internet. With the rapid development of the Internet, the use of web log mining in e-commerce and personalized web services is growing quickly. Mining and analyzing web log files can improve the structure of a web site, monitor the working status of the web server, improve the design of web application systems, and support personalized services. At present, research on web log mining follows three trends: (1) analyzing the design of the system; (2) modifying the design of the system; (3) figuring out the intention of the user.

When handling log data, most approaches either store it in a database directly after only simple preprocessing or convert it into a matrix model; few consider how log data should be stored in light of its own characteristics. In addition, most existing algorithms for mining frequent access paths directly apply Apriori, the algorithm for mining association rules between frequent itemsets, and seldom or never take the characteristics of access paths into account to improve the results and efficiency of the algorithm.

This paper is about how to analyze a web server's log file and the technology involved, and it tries to improve on existing work in the following aspects: (1) It presents a simple processing model for web log mining based on XML storage, together with the corresponding solutions for cleaning and transforming log data. (2) Drawing on the advantages of XML and the self-structured nature of log data, it proposes the novel idea of storing log data in XML form, and it discusses the method and implementation of storing XML-compliant log data in a database using a medium-grained storage scheme. (3) It presents an improved algorithm, called UFAPA, for mining user frequent access paths on the basis of Apriori. The paper outlines the idea behind its performance gain, and theoretical analysis shows that the algorithm is effective: it avoids generating large numbers of candidate path sets and remarkably reduces the number of database scans.
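The processing model described in the abstract (clean the log, store it as XML, then mine frequent access paths) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the Common Log Format input, the `weblog`/`entry`/`url` element names, the cleaning rule that drops image and stylesheet requests, and the use of the client host as a crude session key. The mining function is a plain level-wise search over contiguous paths with Apriori-style pruning; it is not UFAPA, whose details the abstract does not give.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Common Log Format: host ident user [time] "METHOD url PROTOCOL" status bytes
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')

# Hypothetical cleaning rule: requests for embedded resources are discarded.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def log_to_xml(lines):
    """Clean raw log lines and return an XML tree with one <entry> per kept request."""
    root = ET.Element("weblog")
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip malformed lines
        host, time, method, url, status = match.groups()
        if method != "GET" or not status.startswith("2"):
            continue  # keep only successful page requests
        if url.lower().endswith(IGNORED_SUFFIXES):
            continue
        entry = ET.SubElement(root, "entry", host=host, time=time)
        ET.SubElement(entry, "url").text = url
    return root


def sessions_from_xml(root):
    """Group URLs by host in log order (a stand-in for real session identification)."""
    sessions = {}
    for entry in root.iter("entry"):
        sessions.setdefault(entry.get("host"), []).append(entry.findtext("url"))
    return list(sessions.values())


def frequent_paths(sessions, min_support):
    """Return {path: support} for contiguous paths found in >= min_support sessions.

    A path of length k is counted only if both of its length k-1 sub-paths
    (prefix and suffix) were frequent at the previous level.
    """
    result = {}
    frequent = None  # frequent paths of the previous level
    k = 1
    while True:
        counts = Counter()
        for session in sessions:
            seen = set()  # count each path at most once per session
            for i in range(len(session) - k + 1):
                path = tuple(session[i:i + k])
                if frequent is not None and (
                    path[:-1] not in frequent or path[1:] not in frequent
                ):
                    continue  # pruned: a sub-path is already infrequent
                seen.add(path)
            counts.update(seen)
        frequent = {p for p, c in counts.items() if c >= min_support}
        if not frequent:
            return result
        for path in frequent:
            result[path] = counts[path]
        k += 1
```

Calling `frequent_paths(sessions_from_xml(log_to_xml(lines)), min_support=2)` on a list of raw log lines returns a dictionary that maps each frequent path (a tuple of URLs) to the number of sessions containing it. Note that this sketch still scans the session list once per path length, which is the cost the abstract says UFAPA reduces.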
Keywords/Search Tags: web log mining, log file, XML, XML database, association rule, frequent access path