Research Of Web Data Mining Techniques Based On XML

Posted on: 2010-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: J H Liu
GTID: 2178360278481307
Subject: Computer application technology
Abstract/Summary:
In recent years, many data mining researchers have gradually shifted from traditional areas of data mining to Web mining. With the explosive growth of XML on the Web, XML has become the de facto standard for data exchange and data representation on the Internet, and it is expected to replace HTML as the main data format on the Web. XML-based data mining has therefore become a research hotspot in both Web data mining and XML technology.

Web-oriented data mining differs from traditional database-oriented mining. In XML-based Web mining, HTML data generally has to be converted to XML before it is mined. At present, most XML-based data mining algorithms describe XML data with a semi-structured data model in order to discover frequent patterns, but describing XML this way has some defects, which degrade the performance of the mining algorithm. In response to these problems, this thesis makes the following contributions.

First, it describes a framework for frequent pattern mining in XML-based Web mining. The framework classifies frequent pattern mining algorithms according to the original mining algorithms for the semi-structured data model and the characteristics of the XML data model, and it summarizes the original XML data mining algorithms according to the way the semi-structured data arises, its forms of organization, and its storage structure.

Second, it analyzes the defects of describing XML data with a semi-structured data model and, in view of these defects, studies an XML-oriented extensible markup tree model (ETM) as the data model for XML mining.

Finally, it proposes an algorithm named XMLFPTMiner, based on the ETM ordered tree, for mining frequent patterns in XML, together with a pruning method that improves the algorithm. The pruning method derives some not-yet-discovered frequent patterns directly from frequent patterns that have already been discovered. This reduces the number of candidate subtrees and the time spent counting their frequencies, and thereby improves the efficiency of XMLFPTMiner.
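
The abstract does not give the details of XMLFPTMiner or of the ETM model, so the following Python sketch is only an illustration of the general idea of level-wise frequent pattern mining over XML trees. It is not the thesis algorithm. As a simplifying assumption it mines root-to-node tag paths instead of full subtrees, and it uses the standard downward-closure rule: a path is generated as a candidate only if its one-step-shorter prefix is already frequent, which keeps the candidate set small. The names `label_paths` and `frequent_paths` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text):
    """Return the set of root-to-node tag paths occurring in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def frequent_paths(documents, min_support):
    """Level-wise search for tag paths found in at least min_support documents.

    Simplification (assumption): patterns are paths, not subtrees.
    A path of length k+1 becomes a candidate only if its length-k
    prefix was frequent, so infrequent branches are never counted.
    """
    per_doc = [label_paths(doc) for doc in documents]
    frequent = {}
    level = 1
    candidates = {p for paths in per_doc for p in paths if len(p) == level}
    while candidates:
        # Each document contributes at most 1 to a path's support.
        counts = Counter(p for paths in per_doc for p in paths if p in candidates)
        survivors = {p: c for p, c in counts.items() if c >= min_support}
        frequent.update(survivors)
        level += 1
        candidates = {
            p
            for paths in per_doc
            for p in paths
            if len(p) == level and p[:-1] in survivors
        }
    return frequent


if __name__ == "__main__":
    docs = [
        "<book><title/><author><name/></author></book>",
        "<book><title/><author/></book>",
        "<book><price/></book>",
    ]
    for path, support in sorted(frequent_paths(docs, min_support=2).items()):
        print("/".join(path), support)
```

With the three sample documents and a minimum support of 2, the paths `book`, `book/author`, and `book/title` are reported, while `book/price` and `book/author/name` are discarded because each appears in only one document.
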
Keywords/Search Tags: Web data mining, XML, frequent patterns, semi-structured data model