Study On Processing Uncertain Data In Deep Web

Posted on: 2009-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Gao
GTID: 2178360308977812
Subject: Computer application technology
Abstract/Summary:
With the development of Web technology and the rapid growth of Web databases, querying Web databases has become the main way of obtaining information, and research on the Deep Web is attracting increasing attention. Deep Web databases contain richer and more specialized information, mostly concentrated in particular domains. Data integration is used to meet users' need to obtain information quickly and accurately.

Many steps of data integration involve uncertainty, such as attribute value mapping, schema matching, and keyword queries. First, uncertain data can be produced when information is extracted from text or semi-structured data sources. Second, the mappings are uncertain when the mediator module is matched against the data sources. Third, the relationship between keywords and the content of the structured query is also uncertain.

To address this problem, this thesis proposes a module for processing uncertain data in the Deep Web. First, it analyzes the different situations that produce uncertain data, together with similarity measures based on matching or on semantics. Existential probabilities are then computed using appropriate similarity measures. In addition, the module adopts data mining methods to discover the information users are interested in. Association rule mining is an important topic in data mining, but most studies focus on improving the efficiency of finding frequent patterns in traditional, precise databases.

This thesis improves the traditional data mining algorithms Apriori and FP-growth. The UD-Apriori algorithm uses an iterative, level-wise search in which the frequent k-itemsets are used to generate the (k+1)-itemsets; the anti-monotone property prunes candidates, reducing both time and space complexity. The UD-FP-growth algorithm compresses the entire database into a tree structure called the UD-FP-tree.
Frequent pattern mining is thereby transformed into recursively constructing and mining subtrees. Both algorithms efficiently mine frequent patterns and discover association rules in transaction databases containing uncertain data. The results can supply information missing from domain-specific databases and provide users with more useful information.
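The abstract does not give the details of UD-Apriori or UD-FP-growth, so the following is only a minimal Python sketch under a stated assumption: each item in a transaction carries an independent existential probability, and an itemset's expected support is the sum, over transactions, of the product of its items' probabilities. The names `expected_support`, `ud_apriori`, `ud_pattern_growth`, and the threshold `min_esup` are hypothetical, not taken from the thesis. The second function replaces the UD-FP-tree with plain projected (conditional) databases, so it illustrates the recursive pattern-growth idea rather than the thesis's tree structure.

```python
from itertools import combinations
from math import prod

# Hypothetical representation: an uncertain transaction maps each item to its
# existential probability (the chance that the item really occurs in it).
UncertainDB = list[dict[str, float]]


def expected_support(itemset: frozenset, db: UncertainDB) -> float:
    """Sum, over transactions containing the itemset, of the product of its
    item probabilities (items are assumed to occur independently)."""
    return sum(prod(t[i] for i in itemset) for t in db if itemset.issubset(t))


def ud_apriori(db: UncertainDB, min_esup: float) -> dict[frozenset, float]:
    """Level-wise search: frequent k-itemsets generate the (k+1)-candidates."""
    level: dict[frozenset, float] = {}
    for item in {i for t in db for i in t}:
        esup = expected_support(frozenset([item]), db)
        if esup >= min_esup:
            level[frozenset([item])] = esup
    frequent = dict(level)
    k = 1
    while level:
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune step (anti-monotonicity): probabilities are at most 1, so a
        # superset never has a larger expected support than its subsets; drop
        # any candidate that has an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        level = {}
        for c in candidates:
            esup = expected_support(c, db)
            if esup >= min_esup:
                level[c] = esup
        frequent.update(level)
        k += 1
    return frequent


def ud_pattern_growth(db: UncertainDB, min_esup: float) -> dict[frozenset, float]:
    """Depth-first pattern growth over projected (conditional) databases."""
    frequent: dict[frozenset, float] = {}

    def grow(prefix: frozenset, proj: list[tuple[float, dict[str, float]]]) -> None:
        # Each entry pairs the prefix's probability in a transaction with the
        # items of that transaction that may still extend the prefix.
        esup: dict[str, float] = {}
        for p, t in proj:
            for item, q in t.items():
                esup[item] = esup.get(item, 0.0) + p * q
        for item in sorted(esup):
            if esup[item] < min_esup:
                continue  # anti-monotonicity: no extension can be frequent
            new_prefix = prefix | {item}
            frequent[new_prefix] = esup[item]
            # Conditional database for new_prefix: keep only frequent items
            # ordered after `item`, so each itemset is generated exactly once.
            cond = [
                (p * t[item], {j: q for j, q in t.items()
                               if j > item and esup[j] >= min_esup})
                for p, t in proj if item in t
            ]
            grow(new_prefix, cond)

    grow(frozenset(), [(1.0, t) for t in db])
    return frequent


if __name__ == "__main__":
    db = [
        {"a": 0.9, "b": 0.8, "c": 0.5},
        {"a": 0.7, "b": 0.6},
        {"b": 0.9, "c": 0.4},
        {"a": 0.5, "c": 0.9},
    ]
    for name, miner in [("UD-Apriori sketch", ud_apriori),
                        ("pattern-growth sketch", ud_pattern_growth)]:
        result = miner(db, min_esup=1.0)
        print(name, sorted((sorted(s), round(v, 2)) for s, v in result.items()))
```

On the four-transaction example with a minimum expected support of 1.0, both sketches should report the same frequent itemsets, {a}, {b}, {c}, and {a, b} (expected support 0.72 + 0.42 = 1.14), while {a, c} (0.9) and {b, c} (0.76) fall below the threshold. The level-wise version rescans the database once per level; the pattern-growth version touches only the transactions that contain the current prefix, which is the saving a compressed tree such as the UD-FP-tree is meant to amplify.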
Keywords/Search Tags: Deep Web, uncertain data, similarity measures, frequent patterns, UD-Apriori, UD-FP-growth