
Research On Methods Of Semi-structured Data Implication Rules Extraction

Posted on: 2019-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2428330545958785
Subject: Computer technology
Abstract/Summary:
Semi-structured data is a type of data, defined relative to structured data, that has emerged with Internet applications and is widespread on major social networking and e-commerce platforms. In the big data environment, the size, growth rate, and breadth of semi-structured data far exceed those of structured data, and it continues to grow rapidly. Implication is not only a form of knowledge representation that describes implication relationships between objects, but also the main form of reasoning in classical logic and approximate reasoning. Extracting implication rules from the semi-structured business and consumer data found on the Internet can provide a reference for analysis and decision-making by businesses and consumers. Research on extracting implication rules from semi-structured data therefore has both theoretical significance and practical application prospects.

For semi-structured static data, two methods are used to extract implication rules. The first method converts the semi-structured data into structured data and then combines the data transformation method with a genetic algorithm, giving the semi-structured static data implication rule extraction algorithm SDIR. A web crawler collects data from the public comment network, the semi-structured static data is converted to structured data and preprocessed, and the implication rules of the data are extracted. Experiments verify the effectiveness of the algorithm. The second method uses the XQuery query language to extract implication rules directly from the semi-structured data in web pages. Based on the SDST concept, an improved Apriori algorithm built on XQuery is proposed to extract implication rules from the semi-structured data of complex and irregular web pages. Simulation experiments on a simulated transaction data set verify the effectiveness of the algorithm.

For semi-structured dynamic data, an implication intensity vector metric, a support vector, and a confidence vector are introduced to describe how implication rules change over time. A partition-based parallel algorithm for extracting implication rules from semi-structured data is proposed. A Hadoop parallel computing environment is set up on three computers, the MapReduce functions are designed, and the algorithm is implemented on the Hadoop platform with MapReduce to improve its running efficiency. Experiments verify the validity of the algorithm.

The semi-structured data implication rule extraction method was applied to the analysis of Taobao customer transaction data. The data, covering June to November 2015, was crawled from Taobao by the development team of the Xiamen University Database Laboratory. The data is first preprocessed, and then its association rules and implication rules are extracted. The association rules are extracted to obtain the frequent item sets, so that the extracted implication rules have a wider scope of application. Finally, the extracted association rules and implication rules are analyzed to provide a reference for business decision-making.
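As an illustration of the support vector and confidence vector idea for dynamic data, the following is a minimal Python sketch. It assumes the standard association-rule definitions of support and confidence, computed separately for each time period; the thesis's own definitions, and its implication intensity metric, are not given in the abstract and are not reproduced here. The function name `support_confidence_vectors` and the sample items are hypothetical.

```python
from collections import defaultdict


def support_confidence_vectors(transactions, antecedent, consequent):
    """Per-period support and confidence for the rule antecedent -> consequent.

    transactions: iterable of (period, items) pairs, where items is a
    collection of item labels. Assumes the standard definitions:
      support    = count(antecedent and consequent) / count(all)
      confidence = count(antecedent and consequent) / count(antecedent)
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    totals = defaultdict(int)           # transactions per period
    with_antecedent = defaultdict(int)  # transactions containing the antecedent
    with_both = defaultdict(int)        # transactions containing both sides

    for period, items in transactions:
        items = set(items)
        totals[period] += 1
        if antecedent <= items:
            with_antecedent[period] += 1
            if consequent <= items:
                with_both[period] += 1

    periods = sorted(totals)
    support = [with_both[p] / totals[p] for p in periods]
    # Confidence is reported as 0.0 when the antecedent never occurs in a period.
    confidence = [
        with_both[p] / with_antecedent[p] if with_antecedent[p] else 0.0
        for p in periods
    ]
    return periods, support, confidence


# Hypothetical sample data: (period, items bought together).
sample = [
    ("2015-06", {"phone", "case"}),
    ("2015-06", {"phone"}),
    ("2015-06", {"case"}),
    ("2015-06", {"phone", "case", "charger"}),
    ("2015-07", {"phone", "case"}),
    ("2015-07", {"charger"}),
]

periods, support, confidence = support_confidence_vectors(
    sample, {"phone"}, {"case"}
)
```

Each position in the returned `support` and `confidence` lists corresponds to one period, so comparing positions shows how a rule strengthens or weakens over time.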
Keywords/Search Tags:semi-structured data, implication rules, rule extraction, parallel computing, data analysis