Research On Web Text Mining Based On XML And Association Rule Mining Algorithm

Posted on: 2012-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2178330338994854
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of computer technology and the spread of the Internet, the volume of data held on web servers at every level has grown enormously, and the types of data have become increasingly numerous and diverse. How to use this data more effectively and extract valuable information from it has become a research hotspot in many fields.

Although traditional database and data mining technologies have developed rapidly and continue to mature, Web data is semi-structured or unstructured, so these technologies face many difficulties in mining information from it. XML is a semi-structured data model, and as XML has developed, more and more Internet information is represented in XML. XML is extensible, platform independent, and flexible, and it has strong expressive power for data, which gives it an increasingly important role in representing and exchanging information. Given the huge quantity of XML data, extracting valuable information from it effectively has therefore become an urgent problem.

The Apriori algorithm is a classical algorithm for mining association rules and has had great influence in the field. However, because it must scan the database repeatedly and consumes a large amount of space, many improvements to it have been proposed using a variety of methods. Existing Apriori implementations written in XQuery still leave room for improvement. For example, in some circumstances the volume of XML data is so large that related data is stored across many documents that have no necessary relationship to one another, yet current association rule mining algorithms mainly mine a single XML document and must be improved before they can mine several documents.

This thesis combines XQuery, the query language for XML, with association rule mining to implement an XQuery-based Apriori algorithm for mining association rules across several XML documents. It improves the algorithm by introducing the XQuery collection, which can access several XML documents, so that several XML documents can be mined without reducing mining efficiency. The improved algorithm will be applied in an XML-based Web text mining model, where its feasibility and validity will be verified.
Keywords/Search Tags: XQuery, Apriori, XML documents, association rules, data mining
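
The multi-document mining described in the abstract can be illustrated with a minimal sketch. This is not the thesis's XQuery implementation: it uses Python in place of XQuery, and pooling transactions from a list of XML strings only loosely mirrors what an XQuery collection does. The element names (`transactions`, `transaction`, `item`), the sample data, and the function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical sample documents; the element names are assumptions.
DOCS = [
    "<transactions>"
    "<transaction><item>bread</item><item>milk</item></transaction>"
    "<transaction><item>bread</item><item>butter</item><item>milk</item></transaction>"
    "</transactions>",
    "<transactions>"
    "<transaction><item>bread</item><item>butter</item></transaction>"
    "<transaction><item>milk</item></transaction>"
    "</transactions>",
]


def load_transactions(xml_documents):
    """Pool the transactions of several XML documents into one list of item sets."""
    transactions = []
    for text in xml_documents:
        root = ET.fromstring(text)
        for node in root.iter("transaction"):
            transactions.append(frozenset(i.text.strip() for i in node.iter("item")))
    return transactions


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset whose count is >= min_support."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the pooled transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples meeting min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

Calling `association_rules(apriori(load_transactions(DOCS), 2), 0.9)` on the sample data yields the single rule butter → bread. The pooling step is where the multi-document aspect enters: once transactions from all documents are in one list, the rest is the standard Apriori join, prune, and count loop.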