Research And Application Of Optimization Algorithm Of WEB Data Mining Based On XML

Posted on: 2016-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: L D Zhang
GTID: 2308330473952383
Subject: Software engineering
Abstract/Summary:
The Internet has become an effective and generally indispensable way for people to obtain information, but because of the sheer size and variety of its data, extracting the required information is like looking for a needle in the ocean. How to help people find valuable information on the Internet has therefore become a meaningful research direction and a hot topic. XML has become the standard for data exchange on the mobile Internet, where large numbers of XML documents and XML data management needs are emerging. How to mine useful information from them effectively and promptly has become a focus of attention for the mobile Internet industry.

This thesis briefly introduces the theoretical foundations for building an XML data storage and query system for WEB data mining, namely XML technology and data mining algorithms. On this basis, it analyzes the classic APRIORI algorithm, summarizes its main disadvantages, and proposes feasible solutions. The first is to reduce the number of database tuples scanned when computing the support of candidate itemsets, which improves the efficiency with which APRIORI generates frequent itemsets. The second is to optimize the generation of association rules through rule-set compression and a pruning strategy, with the objective of narrowing the range of frequent itemsets that must be examined to generate strong association rules. The third is to improve the efficiency of data query and storage. Since path expressions form the core of XML queries, the thesis presents a method of storing XML documents in a relational database. The method is based on the XPath data model and uses Dietz coding to identify the elements of an XML document. The database stores the Dietz code of each element together with that of its parent element, which maintains the parent-child relationships between elements so that relational data can be converted back into XML documents or document fragments. Using this method, we developed middleware consisting of three modules (storage, conversion, and query), which is used to store the elements, attributes, and text of XML documents.

Finally, the improved APRIORI algorithm is applied to the "XML data storage and query system". The experimental results show that the improved APRIORI algorithm improves the quality of the strong association rules, reduces computation time with a clear advantage in time complexity, and improves query speed and data storage efficiency.
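The abstract does not spell out how the improved algorithm reduces the tuples scanned during support counting. As a minimal sketch, assuming the well-known "transaction reduction" variant of Apriori (which may differ from the thesis's actual method), the idea can be illustrated as follows. The function name `apriori` and the parameter `min_support` (an absolute count) are hypothetical names chosen for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Assumption: min_support is an absolute count, not a fraction.
    """
    active = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in active:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        # Prune step: keep a candidate only if every (k-1)-subset is frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Transaction reduction, part 1: a transaction with fewer than
        # k items cannot contain any k-itemset.
        active = [t for t in active if len(t) >= k]

        counts = {c: 0 for c in candidates}
        survivors = []
        for t in active:
            hit = False
            for c in candidates:
                if c <= t:
                    counts[c] += 1
                    hit = True
            # Transaction reduction, part 2: a transaction containing no
            # candidate k-itemset cannot contain a frequent (k+1)-itemset,
            # so it is dropped from later scans.
            if hit:
                survivors.append(t)
        active = survivors

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

Each pass therefore scans a shrinking list of transactions rather than the full database, which is one plausible reading of "reducing the number of database tuples" in the abstract.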
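The storage method can be sketched in the same hedged spirit. Dietz's numbering scheme labels each tree node with its preorder and postorder ranks, so that x is an ancestor of y exactly when pre(x) < pre(y) and post(x) > post(y). The example below assumes a single hypothetical table `element(pre, post, parent_pre, tag, content)` and covers elements and their text only. The thesis's actual schema and its handling of attributes are not given in the abstract.

```python
import sqlite3
import xml.etree.ElementTree as ET


def dietz_rows(xml_text):
    """Return (pre, post, parent_pre, tag, content) rows, ordered by pre."""
    root = ET.fromstring(xml_text)
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(elem, parent_pre):
        counters["pre"] += 1          # preorder rank: assigned on entry
        pre = counters["pre"]
        for child in elem:
            visit(child, pre)
        counters["post"] += 1         # postorder rank: assigned on exit
        content = (elem.text or "").strip()
        rows.append((pre, counters["post"], parent_pre, elem.tag, content))

    visit(root, None)
    return sorted(rows)


def store(rows):
    """Load the rows into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE element ("
        "pre INTEGER PRIMARY KEY, post INTEGER, "
        "parent_pre INTEGER, tag TEXT, content TEXT)"
    )
    conn.executemany("INSERT INTO element VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def descendants(conn, ancestor_tag, descendant_tag):
    """Answer a path query like //ancestor_tag//descendant_tag.

    Uses only the Dietz condition: d.pre > a.pre AND d.post < a.post.
    """
    sql = (
        "SELECT DISTINCT d.pre, d.content "
        "FROM element a JOIN element d "
        "ON d.pre > a.pre AND d.post < a.post "
        "WHERE a.tag = ? AND d.tag = ? ORDER BY d.pre"
    )
    return conn.execute(sql, (ancestor_tag, descendant_tag)).fetchall()
```

Because `parent_pre` records each element's parent code, the original nesting can be rebuilt from the rows, which corresponds to the conversion of relational data back into XML fragments described in the abstract.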
Keywords/Search Tags: XML, XPath, APRIORI algorithm, association rules, Dietz code