Key Category Mining Algorithm On Massive Datasets

Posted on:2011-12-31

Degree:Master

Type:Thesis

Country:China

Candidate:X F Xu

Full Text:PDF

GTID:2178360305997802

Subject:Computer software and theory

Abstract/Summary:

With the advent of the Internet, we have entered a new information age. There are far more ways to access information than before, and the available content is far less limited. People can access all kinds of information over the network, while new information is generated continuously. This rapidly growing volume of information is convenient, but it also creates a series of problems. Valuable information becomes hidden among masses of useless information, and without effective methods or tools people find it hard to identify, so useful information becomes even harder to obtain. Data mining helps people discover useful knowledge in massive datasets, extract valuable information, and make better decisions.

Meanwhile, massive Web data is becoming an important data source for the development of the Internet. The Web is made up of many forms of data: static HTML pages, interactive information stored in databases, and user log information. Because this data contains a great deal of valuable information, more and more researchers and companies apply data mining techniques to Web data to discover latent knowledge and business rules.

We observe that the Web contains a large amount of categorical data, and that some specific applications need to mine this categorical data to support decision making. Motivated by these requirements, this thesis proposes key category queries. The main contributions are as follows:

1. This thesis discusses queries that are popular in decision support systems, such as Top-k queries, KNN queries, and Skyline queries, together with their related work, and explains the importance of ranking queries in decision support systems.
2. This thesis discusses the categorical data found on the Web and its value for data mining. Considering the data mining requirements of decision support systems, it introduces a new problem, key category queries, analyzes the problem in depth, and gives two definitions of the query together with basic processing methods for each.
3. This thesis analyzes the complexity of the basic processing methods and presents a series of pruning rules for key category queries under the second definition. It also designs an improved algorithm and verifies its effectiveness and efficiency through experiments.
4. This thesis discusses the performance problems that key category queries may face in a massive-data environment and applies the MapReduce framework to improve the original processing method. Experiments verify the effectiveness and efficiency of the resulting distributed algorithm.
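The ranking queries named in contribution 1 can be illustrated with a minimal Python sketch. It is not taken from the thesis: the hotel data, the weights, the "smaller is better" convention, and the function names `top_k`, `knn`, `dominates`, and `skyline` are all hypothetical, and the skyline uses a naive quadratic scan rather than an optimized algorithm.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# Hypothetical convention: each point is a tuple of numeric attributes, smaller is better.
Point = Tuple[float, ...]


def top_k(points: Sequence[Point], weights: Sequence[float], k: int) -> List[Point]:
    """Top-k query: the k points with the lowest weighted-sum cost."""
    return heapq.nsmallest(k, points, key=lambda p: sum(w * x for w, x in zip(weights, p)))


def knn(points: Sequence[Point], query: Point, k: int) -> List[Point]:
    """KNN query: the k points closest to `query` by Euclidean distance."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every attribute and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Skyline query: every point that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels described by (price, distance to the beach in km).
hotels = [(50, 8.0), (80, 2.0), (60, 5.0), (90, 1.0), (70, 6.0), (100, 3.0)]
print(top_k(hotels, weights=(1.0, 4.0), k=2))  # [(60, 5.0), (50, 8.0)]
print(knn(hotels, query=(65, 4.0), k=2))        # [(60, 5.0), (70, 6.0)]
print(skyline(hotels))                          # [(50, 8.0), (80, 2.0), (60, 5.0), (90, 1.0)]
```

Top-k needs user-supplied weights and KNN needs a query point, whereas the skyline needs neither and simply returns every point that is not dominated by another.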
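MapReduce, mentioned in contribution 4, splits a job into a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that aggregates each group. The abstract does not describe the thesis's distributed algorithm or give the definitions of key category queries, so the following Python sketch is only a generic illustration under assumptions: it simulates the three phases in-process for a simple per-category count, and the record layout, the `category` attribute, and all function names are hypothetical. In a real deployment (for example on Hadoop), mappers and reducers run in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: each record maps attribute names to categorical values.
Record = Dict[str, str]


def map_phase(split: Iterable[Record], attribute: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (category, 1) pair for every record in one input split."""
    for record in split:
        yield record[attribute], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the partial counts collected for each category."""
    return {key: sum(values) for key, values in groups.items()}


def category_counts(splits: List[List[Record]], attribute: str) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially over all splits (in-process simulation)."""
    mapped = chain.from_iterable(map_phase(split, attribute) for split in splits)
    return reduce_phase(shuffle(mapped))


# Toy input divided into two splits, standing in for blocks of a large dataset.
splits = [
    [{"category": "books"}, {"category": "music"}],
    [{"category": "books"}, {"category": "video"}, {"category": "books"}],
]
counts = category_counts(splits, "category")
print(counts)  # {'books': 3, 'music': 1, 'video': 1}
```

Because counting is associative, a real job would typically also run a combiner after each mapper to pre-sum counts locally and reduce shuffle traffic.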

Keywords/Search Tags:

Data Mining, Massive Datasets, Categorical Data, Possible World Model

Related items

1	The Research Of High Efficient Data Mining Algorithms For Massive Data Sets
2	The Study Of Clustering Data With Categorical Attributes In Data Mining
3	Research On Categorical Data Clustering Algorithms
4	Research On Cluster Boundary Detecting Technology For Categorical Data
5	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
6	Research On Cluster Validity Indices For Categorical Data Clustering
7	Research And Design Of Data Mining Platform For Massive Medical Treatment Data
8	Perturbation based privacy preserving data mining techniques for real-world data
9	Design And Application Of Customer Value Model Based On Massive Data
10	Research On Internal Validation And Algorithm For Categorical Data Clustering