
Study On Several Typical Data Mining Methods And Their Applications

Posted on: 2011-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: C L Dong
Full Text: PDF
GTID: 2178360305951060
Subject: Computer application technology
Abstract/Summary:
With the rapid development of science and technology, and especially the widespread use of the Internet, large-scale data now floods our daily life. While these huge datasets deliver information in the form of text, web pages, images, and so on, they also bring the problem of "data explosion but knowledge scarcity": it is very difficult to extract the information or knowledge that users need from such huge datasets. Aiming to extract useful information from large-scale structured and semi-structured datasets, data mining has attracted much attention over the last two decades and has been applied in many fields, such as business decision-making, market analysis, industrial control, and medical diagnosis.

Modern clinical medicine produces large amounts of clinical data. Analyzing and evaluating such data can reveal potentially useful patterns, which help improve the understanding of diseases and, accordingly, the research on and management of their spread. As one of the most effective approaches to extracting useful information from medical databases and supporting scientific decision-making in the diagnosis and treatment of diseases, medical data mining has become an increasingly active topic in the last few years. Unlike traditional data mining applications, medical data mining faces many challenges, for instance the high dimensionality of large datasets, heterogeneous data with privacy concerns, and strict evaluation criteria. Using the challenges of the KDD Cup 2008 competition as a case study, this thesis presents the challenges that arise in medical data mining. By describing how we constructed our final classification model, Modified Boosted Tree, which ranked fourth among all solutions to Task 1, we present our analysis of and solutions to these problems. This case can be regarded as an epitome of medical data mining, and the issues and solutions it covers can provide some guidance for medical data mining applications.

The popularity of the World Wide Web provides people with a more convenient platform for communication and sharing, and it has also boosted the growth of various web-based communities. With the aim of identifying and characterizing the relationships among community members, community mining has become one of the active topics in data mining. In our work, we choose the DBLP (Digital Bibliography & Library Project) data set as a test bed for our experiments. Using techniques from bibliometrics and text mining, we construct local communities around either the main topics or the prominent authors of a specific conference. To further analyze the evolution of local communities, we trace the yearly similarities of some related conference pairs. In addition, we divide the computer science area into 14 research communities by prominent research direction and characterize them in terms of publication growth rate, collaboration trends, and population stability. Such information may be of interest to many audiences who need to make decisions, such as advanced students who are about to choose their specializations and the authorities in charge of funding and investment.
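As an illustration of what tracing the yearly similarity of a conference pair could look like, here is a minimal sketch. The abstract does not state which similarity measure the thesis uses, so the Jaccard overlap of author sets is an assumption made only for this example. The record layout, the function names, and the toy conference and author names are likewise hypothetical and do not come from DBLP or from the thesis.

```python
from collections import defaultdict


def yearly_author_sets(records):
    """Group author names by (conference, year).

    `records` is an iterable of (conference, year, author_list) tuples,
    a hypothetical flattened view of bibliographic entries.
    """
    authors = defaultdict(set)
    for conference, year, names in records:
        authors[(conference, year)].update(names)
    return authors


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def yearly_similarity(records, conf_a, conf_b):
    """Return {year: Jaccard similarity of the two conferences' author sets}.

    Assumption: author overlap stands in for whatever measure the thesis uses.
    """
    authors = yearly_author_sets(records)
    years = sorted({y for (c, y) in authors if c in (conf_a, conf_b)})
    return {
        y: jaccard(authors.get((conf_a, y), set()),
                   authors.get((conf_b, y), set()))
        for y in years
    }


# Toy data with made-up conference and author names.
records = [
    ("CONF_A", 2001, ["alice", "bob"]),
    ("CONF_B", 2001, ["bob", "carol"]),
    ("CONF_A", 2002, ["alice", "bob", "carol"]),
    ("CONF_B", 2002, ["bob", "carol"]),
]

similarity = yearly_similarity(records, "CONF_A", "CONF_B")
for year, score in similarity.items():
    print(year, round(score, 3))
```

In this toy data the overlap grows from one shared author out of three in 2001 to two out of three in 2002, which is the kind of year-over-year trend such a trace is meant to expose. A text-based measure over paper titles could be substituted for `jaccard` without changing the rest of the sketch.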
Keywords/Search Tags:data mining, classification, medical data mining, breast cancer detection, community mining