Study On Processing Uncertain Data In Deep Web

Posted on:2009-12-08

Degree:Master

Type:Thesis

Country:China

Candidate:C Gao

Full Text:PDF

GTID:2178360308977812

Subject:Computer application technology

Abstract/Summary:

With the development of Web technology and the rapid growth of Web databases, querying Web databases has become a primary way of obtaining information, and research on the Deep Web is attracting increasing attention. Deep Web databases hold richer and more specialized, mostly domain-specific, information. Data integration is used to meet users' need to obtain information quickly and accurately.

Many steps in data integration involve uncertainty, such as attribute value mapping, schema matching, and keyword query. First, uncertain data can be produced when information is extracted from text or semi-structured data sources. Second, the mapping relations are uncertain when the mediator module is matched against the data sources. Third, the relationship between keywords and the structured query content is also uncertain.

This thesis proposes a module for processing uncertain data in the Deep Web to address these problems. First, the different situations that produce uncertain data are analysed, together with similarity measures based on matching or semantics. The existential probability is computed from suitable similarity measures. In addition, the module adopts data mining methods to mine the information that users are interested in. Association rule mining is an important topic in data mining, and most studies focus on improving the efficiency of algorithms that find frequent patterns in traditional precise databases.

The thesis improves the traditional data mining algorithms Apriori and FP-growth. The UD-Apriori algorithm iterates level by level, using the k-itemsets to compute the (k+1)-itemsets. The anti-monotone property is used to prune candidates, reducing running time and space consumption. The UD-FP-growth algorithm compresses the entire database into a tree structure called the UD-FP-tree, turning frequent pattern mining into recursively building and mining subtrees.

The two algorithms mine frequent patterns efficiently and discover association rules in transaction databases containing uncertain data. The results can supplement missing information in professional databases and provide more useful information to users.
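
The level-by-level idea described for UD-Apriori can be illustrated with a minimal sketch. The abstract does not give the thesis's actual definitions, so the following assumes a common formulation for uncertain transactions: each item carries an existential probability, items are treated as independent, and an itemset is frequent when its expected support reaches a threshold. The names `expected_support`, `uncertain_apriori`, and `min_esup` are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def expected_support(itemset, transactions):
    """Expected support of an itemset (assumed definition, not from the thesis).

    Each transaction maps item -> existential probability. Assuming items are
    independent, the itemset's probability in a transaction is the product of
    its items' probabilities; the expected support is the sum over transactions.
    """
    total = 0.0
    for t in transactions:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)
            if p == 0.0:
                break
        total += p
    return total


def uncertain_apriori(transactions, min_esup):
    """Level-wise mining: frequent k-itemsets generate (k+1)-candidates."""
    frequent = {}
    items = sorted({i for t in transactions for i in t})

    # Level 1: frequent single items.
    level = []
    for i in items:
        s = expected_support((i,), transactions)
        if s >= min_esup:
            frequent[(i,)] = s
            level.append((i,))

    k = 1
    while level:
        prev = set(level)
        candidates = set()
        # Join step: merge two sorted k-itemsets sharing the first k-1 items.
        for a in level:
            for b in level:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    c = a + (b[-1],)
                    # Anti-monotone pruning: every k-subset must be frequent.
                    if all(sub in prev for sub in combinations(c, k)):
                        candidates.add(c)
        # Count step: keep candidates whose expected support is high enough.
        level = []
        for c in sorted(candidates):
            s = expected_support(c, transactions)
            if s >= min_esup:
                frequent[c] = s
                level.append(c)
        k += 1
    return frequent


# Hypothetical toy data: three transactions with existential probabilities.
example_transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.5, "b": 0.5},
    {"a": 1.0, "c": 0.6},
]
example_result = uncertain_apriori(example_transactions, min_esup=0.9)
```

Because every probability is at most 1, adding an item to an itemset can never raise its expected support, which is why the anti-monotone pruning step remains valid under this assumed definition.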

Keywords/Search Tags:

Deep Web, uncertain data, similarity measures, frequent patterns, UD-Apriori, UD-FP-growth

Related items

1	Frequent Patterns Mining For Uncertain Data Using Correlation Metric
2	Research On Correlative Algorithms Of Association Rule Mining
3	Research On Uncertain Frequent Graph Data Mining
4	Study Of Fast Algorithms For Frequent Itemset Mining From Uncertain Data
5	Research On The Algorithm Of Mining Frequent Itemsets From Uncertain Data Based On The Tree
6	Research On Frequent Pattern Mining Of Uncertain Data
7	Mining Probability Frequent Patterns To Recover Uncertain Rfid Data Stream
8	Mining Probability Frequent Patterns To Recover Uncertain RFID Data Stream
9	The Research On Frequent Sequential Pattern Mining Algorithms In Uncertain Databases
10	Research And Implementation On Algorithms Of Frequent Subgraph Patterns Mining Upon Uncertain Graph Data