
The Research Of Text Classification Based On Distance Metric Learning

Posted on: 2014-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: K Peng
Full Text: PDF
GTID: 2248330392960860
Subject: Control theory and control engineering
Abstract/Summary:
As an important branch of modern information technology, text classification has made great progress over the past two decades. However, with the exponential growth in the number of Web pages on the Internet, Internet information is becoming increasingly diverse and complex. How to adapt traditional text classification algorithms to modern Web information, with its many categories and weakly discriminative features, has become one of the main problems to be solved.

Distance metric learning is a class of machine learning algorithms concerned with how the similarity between samples is measured. Because text classification algorithms based on statistics and machine learning are already quite mature, further large improvements in classification accuracy have become much harder to achieve. As a result, changing the sample distance metric to obtain better classification results is a current research focus. This line of research has already been applied successfully in image classification.

This thesis focuses on distance metric learning for text classification. First, based on an extensive review of the literature, we summarize existing work in the field and introduce several common distance metric learning algorithms. Second, we describe the text classification process and analyze some of its key algorithms. Finally, we combine several distance metric learning algorithms with existing text classification algorithms and propose a series of improvements motivated by problems in practical applications.

The main work of the thesis is as follows:
(1) We apply distance metric learning while taking into account the impact of sample density, and introduce an improved scheme. The new scheme designs a density function that is combined with the K-nearest neighbor classifier to offset the adverse effects of the distance metric learning algorithm.
(2) Inspired by the large margin nearest neighbor (LMNN) algorithm, we propose a new learning algorithm based on a cosine distance metric, called CS-LMNN, which is better suited to the classic vector space model.
(3) Finally, based on the theory above, we implement a complete text classification system, including a classification module, a pre-processing module, a feature selection module, distance metric learning modules, and an evaluation module.
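
The abstract does not define the density function or the CS-LMNN objective, so the following is only a minimal sketch of the general idea: a K-nearest neighbor vote over vector space model documents that uses cosine similarity and scales each neighbor's vote by a local density term. The names `local_density` and `density_weighted_knn`, the density definition (mean cosine similarity to the k most similar training samples), and the `1 / (1 + density)` weighting are all assumptions made for illustration. They are not the thesis's method, and no metric is learned here.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_density(index, vectors, k):
    """Hypothetical density: mean cosine similarity between one training
    sample and its k most similar other training samples."""
    sims = sorted(
        (cosine_similarity(vectors[index], v)
         for j, v in enumerate(vectors) if j != index),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def density_weighted_knn(query, vectors, labels, k=3):
    """Classify `query` by a similarity-weighted vote among its k nearest
    training samples, down-weighting neighbors from dense regions."""
    densities = [local_density(i, vectors, k) for i in range(len(vectors))]
    nearest = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for i in nearest:
        # Assumed weighting: similarity divided by (1 + local density).
        sim = cosine_similarity(query, vectors[i])
        votes[labels[i]] += sim / (1.0 + densities[i])
    return max(votes, key=votes.get)


# Toy term-frequency vectors standing in for pre-processed documents.
train_vectors = [
    {"ball": 2, "team": 1},
    {"ball": 1, "goal": 2},
    {"code": 2, "bug": 1},
    {"code": 1, "server": 2},
]
train_labels = ["sports", "sports", "tech", "tech"]

predicted = density_weighted_knn({"ball": 1, "team": 1}, train_vectors, train_labels)
```

In a learned-metric variant such as the CS-LMNN described above, the similarity would be computed after a learned transformation of the document vectors; that step is deliberately left out here because the abstract gives no details of it.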
Keywords/Search Tags: Text Classification, Distance Metric Learning, Density, Cosine, Vector Space Model, Large Margin Nearest Neighbor