Key Category Mining Algorithm On Massive Datasets

Posted on: 2011-12-31 | Degree: Master | Type: Thesis
Country: China | Candidate: X F Xu
GTID: 2178360305997802 | Subject: Computer software and theory
Abstract/Summary:
With the appearance of the Internet, we have entered a new information age. There are far more ways to access information than before, and the available content is far less limited. People can access all kinds of information over the network, while new information is generated continuously. This rapidly growing volume of information brings convenience, but it also brings a series of problems. Valuable information becomes hidden within massive amounts of useless information, and without effective methods or tools people find it hard to identify, which makes useful information even more difficult to access. Data mining helps people discover useful knowledge in massive datasets, extract valuable information, and make better decisions.

Meanwhile, massive Web data is becoming an important data source for the development of the Internet. The Web is made up of many forms of data: static HTML pages, interactive information stored in databases, and user log information. Because this data contains a great deal of valuable information, more and more researchers and companies apply data mining techniques to Web data to discover latent knowledge and business rules.

We observe that the Web contains a huge amount of categorical data, and that some specific applications need to mine this categorical data to support decision making. Motivated by these requirements, this thesis proposes key category queries. The main contributions are as follows:

1. This thesis discusses queries that are popular in decision support systems, such as Top-k queries, KNN queries, and Skyline queries, together with their related work. It also explains the importance of ranking queries in decision support systems.
2. This thesis discusses the categorical data that exists on the Web and its value for mining. Considering the data mining requirements of decision support systems, it presents a new problem: key category queries. It analyzes this problem in depth, gives two definitions of the query, and presents a basic processing method for each.
3. This thesis analyzes the complexity of the basic processing methods and presents a series of pruning rules for key category queries under the second definition. In addition, it designs an improved algorithm and verifies its effectiveness and efficiency through experiments.
4. This thesis discusses the performance problems that key category queries may face in a massive data environment, and applies the MapReduce framework to improve the original processing method. Experiments verify the effectiveness and efficiency of the resulting distributed algorithm.
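The Top-k, KNN, and Skyline queries named in contribution 1 have standard textbook meanings. The following is a minimal, hypothetical Python sketch of those meanings on toy two-attribute records (for example price and distance, where smaller is better). The function names, the weighted-sum cost used for Top-k, the Euclidean distance used for KNN, and the naive quadratic skyline scan are all assumptions for illustration. They are not the processing methods developed in this thesis.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# Hypothetical record type: a tuple of numeric attributes, smaller is better.
Point = Tuple[float, ...]


def top_k(points: Sequence[Point], weights: Sequence[float], k: int) -> List[Point]:
    """Top-k query: the k records with the lowest weighted-sum cost (assumed scoring)."""
    def cost(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))
    return heapq.nsmallest(k, points, key=cost)


def knn(points: Sequence[Point], query: Point, k: int) -> List[Point]:
    """KNN query: the k records closest to `query` in Euclidean distance."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every attribute and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Skyline query: records not dominated by any other record (naive O(n^2) scan).

    A record never dominates itself, so the inner scan does not need to skip it.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (55, 7)]  # toy (price, distance) data
    print(top_k(hotels, weights=(1, 10), k=2))  # [(80, 2), (60, 5)]
    print(knn(hotels, query=(70, 4), k=2))      # [(60, 5), (80, 2)]
    print(skyline(hotels))                      # [(50, 8), (80, 2), (60, 5), (55, 7)]
```

Top-k and KNN both need a user-supplied function (weights or a query point), while the skyline needs none: it returns every record that is best under some monotone preference, which is why it is often used when user weights are unknown.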
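Contribution 4 applies MapReduce to key category queries, but the abstract does not define those queries. As a stand-in, the next sketch assumes a skyline query and simulates the generic map, shuffle, and reduce phases in plain Python. It uses no Hadoop or other framework API, and every name in it is hypothetical. Each mapper prunes its own partition to a local skyline, and a single reducer filters the union of the survivors. This is a common partition-and-merge pattern, not the thesis's distributed algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical record type: a tuple of numeric attributes, smaller is better.
Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every attribute and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: Sequence[Point]) -> List[Point]:
    """Records in `points` not dominated by any other record in `points`."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def map_phase(split: Sequence[Point]) -> Iterator[Tuple[int, Point]]:
    # Hypothetical mapper: prune the split to its local skyline.
    # Every survivor is emitted under one key (0) so a single reducer sees all candidates.
    for p in local_skyline(split):
        yield 0, p


def reduce_phase(key: int, candidates: List[Point]) -> List[Point]:
    # A global skyline record is dominated by nothing, so it survived its local pass;
    # the reducer only has to remove candidates dominated across partitions.
    return local_skyline(candidates)


def run_job(splits: Sequence[Sequence[Point]]) -> List[Point]:
    # Simulated shuffle: group all mapper output by key, then reduce each group.
    groups: Dict[int, List[Point]] = defaultdict(list)
    for split in splits:
        for key, value in map_phase(split):
            groups[key].append(value)
    result: List[Point] = []
    for key in sorted(groups):
        result.extend(reduce_phase(key, groups[key]))
    return result


if __name__ == "__main__":
    # (60, 9) survives its own split but is dominated by (50, 8) from the other split.
    print(run_job([[(50, 8), (70, 9)], [(60, 9), (80, 2)]]))  # [(50, 8), (80, 2)]
```

Routing every candidate to one key keeps the merge simple, but the single reducer can become a bottleneck when local skylines are large. The result is still exact because dominance is transitive: any record removed locally is dominated by some global skyline record, which always reaches the reducer.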
Keywords/Search Tags: Data Mining, Massive Datasets, Categorical Data, Possible World Model