Machine learning for text mining: Classification, retrieval and recommendation

Posted on: 2010-02-15
Degree: Ph.D
Type: Thesis
University: The Pennsylvania State University
Candidate: Song, Yang
Full Text: PDF
GTID: 2448390002479331
Subject: Applied Mathematics
Abstract/Summary:
The information explosion of the World Wide Web has brought continuous, rapid growth in information and data. As the amount of data grows, the need for efficient and effective management of information has also increased dramatically. As a result, using intelligent computerized algorithms to discover new and useful information from existing data has become an actively pursued topic in recent computer and information science research.

This thesis addresses the discovery of useful information from the textual content of data, as well as the efficient management and organization of that data. These research issues are usually referred to as text mining, a branch of the broad area of information retrieval research that contains many interesting and challenging problems and applications. The thesis focuses on four text mining issues: text classification (Chapters 2 and 3), text retrieval (Chapter 4), text recommendation (Chapter 5) and topic discovery (Chapter 6). Specifically, Chapter 2 proposes dimension reduction and collaborative filtering techniques to improve the scalability of text classification; Chapter 3 further addresses the performance of text classification by introducing a new nearest neighbor classification method; Chapter 4 deals with retrieving the correct name entities from the web and textual documents where the names are ambiguous; Chapter 5 deals with text recommendation for scientific documents and webpages; Chapter 6 aims at discovering dynamic topic trends and correlations in scientific documents; and Chapter 7 concludes the thesis. We also try to answer some difficult research questions based on our study.
Keywords/Search Tags: Text, Chapter, Classification, Information, Retrieval