Web Log Data Mining Research And Implementation

Posted on: 2011-05-06   Degree: Master   Type: Thesis
Country: China   Candidate: W H Feng   Full Text: PDF
GTID: 2208330332477279   Subject: Software engineering
Abstract/Summary:
As information technology and computer networks have matured, database technology has also matured and become widely used, yet it remains only a basic means of storing and managing information. A great deal of valuable data stays submerged in a sea of data and cannot be put to use. Data mining has emerged as a fast-growing research field in response and is regarded as one of the major information-processing technologies of the future. According to the data source on the web, web mining can be classified into web structure mining, web access information mining, and web content mining. The web log mining process can generally be divided into three stages: log data preprocessing, pattern recognition and analysis, and implementation of the mining algorithm. XML (Extensible Markup Language) defines data structures in an open, self-describing way and can state the structure of data clearly, so it can reflect the relationships between data items. XML therefore provides a unified structural description for heterogeneous data sources and a means of exchanging data between them.
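The preprocessing stage and the use of XML as a unified description can be illustrated with a minimal sketch. It assumes the input is in NCSA Common Log Format, which the abstract does not specify; the names `CLF_PATTERN`, `parse_line`, `to_xml`, and the `entry` element are hypothetical and are not taken from the thesis.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the log is in NCSA Common Log Format. The thesis does not
# say which web log format its preprocessing step handles.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


def to_xml(record):
    """Serialize one parsed record as a self-describing XML element."""
    entry = ET.Element("entry")  # hypothetical element name
    for name, value in record.items():
        ET.SubElement(entry, name).text = value
    return ET.tostring(entry, encoding="unicode")


# Made-up sample line for illustration only.
sample = ('192.0.2.1 - alice [06/May/2011:10:15:32 +0800] '
          '"GET /index.html HTTP/1.1" 200 2326')
record = parse_line(sample)
xml_text = to_xml(record)
```

Because every field is wrapped in a named element, records from differently formatted logs could share one structure once each has its own parsing pattern.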
ID3 is a classical, frequently used algorithm in data mining, and this system uses it to implement the mining step. ID3 builds a decision tree greedily by choosing, at each node, the attribute with the highest information gain; this tends to produce a compact tree, although it does not guarantee the smallest one. Worked instances show that the system designed in this thesis analyzes the transactions in log files effectively. The main innovation of the system is its use of XML rule files: log data is preprocessed and matched against the rules, which gives the system a great deal of flexibility. Users can configure XML log rules according to their own preferences and generate rules suited to the characteristics of their logs, so that the system extracts the log records they are interested in. The system supports six sets of rules for different log files, including rules for universal logs, syslog log files, WebLogic, log4j-xml, and JBoss. Rule files are stored in XML format, and the system processes each log file according to its corresponding rules. By combining the log mining algorithm with XML's self-describing, structure-describing, and universal data description capabilities, the system carries out analysis and statistics of log files.
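The attribute-selection step at the core of ID3 can be sketched as follows. This is a generic illustration of entropy and information gain, not the thesis's implementation; the log attributes (`method`, `status`, `outcome`) and the sample records are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3 splits each node on the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical preprocessed log records; not data from the thesis.
rows = [
    {"method": "GET", "status": "2xx", "outcome": "ok"},
    {"method": "GET", "status": "4xx", "outcome": "error"},
    {"method": "POST", "status": "2xx", "outcome": "ok"},
    {"method": "POST", "status": "5xx", "outcome": "error"},
]
split_on = best_attribute(rows, ["method", "status"], "outcome")
```

In these sample records `status` separates the outcomes completely while `method` does not separate them at all, so the greedy choice falls on `status`. A full ID3 implementation would apply the same selection recursively to each resulting subset.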
Keywords/Search Tags: Data Mining, Log Mining, ID3 algorithm, Web log analysis system, XML language