Research Of Web Data Mining Techniques Based On XML

Posted on:2010-11-20

Degree:Master

Type:Thesis

Country:China

Candidate:J H Liu

Full Text:PDF

GTID:2178360278481307

Subject:Computer application technology

Abstract/Summary:

In recent years, many data mining researchers have gradually shifted from the traditional areas of data mining to Web mining. With the explosive growth of XML on the Web, XML has become the de facto standard for data exchange and data representation on the Internet, and it is expected to replace HTML as the main data format on the Web in the future. XML-based data mining has therefore become a research hotspot in both Web data mining and XML technology.

Web-oriented data mining differs from traditional database-oriented mining. In XML-based Web mining, HTML data generally has to be converted to XML before it is mined. At present, most XML-based data mining algorithms describe XML data with a semi-structured data model in order to discover frequent patterns, but describing XML in this way has some defects, which degrade the performance of the mining algorithms. In response to these problems, this thesis makes the following contributions.

First, it describes a framework for frequent pattern mining on the Web based on XML. The framework classifies frequent pattern mining algorithms according to the original mining algorithms for the semi-structured data model and the characteristics of the XML data model, and it summarizes the existing XML data mining algorithms according to the way the semi-structured data arises, its forms of organization, and its storage structure.

Second, it analyzes the defects of describing XML data with a semi-structured data model and, in view of these defects, studies an XML-oriented extensible markup tree model (ETM) as the data model for XML mining.

Finally, it presents an algorithm named XMLFPTMiner, based on ETM ordered trees, for mining frequent patterns in XML, together with a pruning method that improves the algorithm. The pruning method makes it possible to derive some not-yet-discovered frequent patterns directly from frequent patterns that have already been discovered. This reduces the number of candidate subtrees and the time spent counting their frequencies, and thereby improves the efficiency of XMLFPTMiner.
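
To make the idea of frequent pattern mining with pruning more concrete, the following Python sketch mines a much simpler class of patterns than the thesis does: downward label paths (ancestor to descendant tag sequences) that occur in at least `min_support` documents. It is not XMLFPTMiner and does not use the ETM model. The function names `label_paths` and `frequent_paths`, the sample documents, and the Apriori-style pruning rule (a path is counted only if both its prefix and its suffix were already frequent) are assumptions chosen for illustration, and this pruning rule is a standard substitute for, not a reconstruction of, the pruning method described above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def label_paths(root, k):
    """Return the set of downward tag paths with exactly k tags in one tree."""
    found = set()

    def walk(node, trail):
        # Keep only the last k tags on the way down from the root.
        trail = (trail + (node.tag,))[-k:]
        if len(trail) == k:
            found.add(trail)
        for child in node:
            walk(child, trail)

    walk(root, ())
    return found


def frequent_paths(xml_docs, min_support):
    """Level-wise mining of tag paths contained in >= min_support documents.

    Illustrative only: paths stand in for the subtree patterns of the thesis.
    """
    roots = [ET.fromstring(doc) for doc in xml_docs]
    result = {}
    prev_frequent = None  # frequent paths of length k - 1
    k = 1
    while True:
        counts = defaultdict(int)
        for root in roots:
            for path in label_paths(root, k):
                # Pruning: every sub-path of a frequent path is frequent,
                # so skip candidates whose prefix or suffix was infrequent.
                if prev_frequent is not None and (
                    path[:-1] not in prev_frequent
                    or path[1:] not in prev_frequent
                ):
                    continue
                counts[path] += 1  # one count per document (set semantics)
        frequent = {p: c for p, c in counts.items() if c >= min_support}
        if not frequent:
            break
        result.update(frequent)
        prev_frequent = set(frequent)
        k += 1
    return result


# Hypothetical sample data, not taken from the thesis.
docs = [
    "<lib><book><title/><author/></book></lib>",
    "<lib><book><title/></book><cd><title/></cd></lib>",
    "<shop><book><title/><price/></book></shop>",
]
patterns = frequent_paths(docs, min_support=2)
for path, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print("/".join(path), support)
```

In this sketch, pruning saves work by never counting a candidate such as `lib/cd/title` once `lib/cd` is known to be infrequent. The method described in the abstract goes further by deriving some frequent patterns directly from ones already found, which this example does not attempt.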

Keywords/Search Tags:

Web data mining, XML, frequent patterns, semi-structured data model

Related items

1	Research On Related Technology Of Frequent Pattern Mining For Semi-structured Data
2	Research On A Semi-structured Data Model Based Frequent Patterns Mining
3	Research On The Data Model And The Approaches To Data Mining In The Semi-structured Data
4	Research Of Web Data Mining Techniques Based On XML
5	Study On Semi-structured Data Mining
6	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
7	Study Of Mining Data Streams Based On Semi-Structured Data
8	A Real-time Frequent Pattern Mining Algorithm For Semi Structured Data Streams
9	Research On Frequent Pattern Mining In XML
10	The Techniques Research On Frequent Pattern Mining