
Text Clustering Method And Application Research Based On NMF Algorithm

Posted on: 2016-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Zhang
Full Text: PDF
GTID: 2308330464461747
Subject: Computer application technology
Abstract/Summary:
The Internet grows by the day, and a huge volume of new information, including images, video, and audio, is produced on it every day; text is produced fastest of all. We truly benefit from this ocean of information only when we can extract the useful parts of it. Internet content is not only massive but also topical: a new popular word can become widely known overnight, and trendsetters use such words to liven up the atmosphere. Traditional machine learning cannot keep pace with how quickly these new words change, so its classification results are unsatisfactory. A different learning mechanism, transfer learning, is therefore needed. The main idea of transfer learning is to extend classification results from a known domain to a related domain, so that classification can be carried out in the related domain. Traditional machine learning in this setting is based on binary nonnegative matrix factorization (NMF), and transfer learning extends it: ternary nonnegative matrix factorization was developed to accommodate the transfer process. One representative of these algorithms is the CCI algorithm. Building on CCI, this thesis presents the following research:

1) The thesis first analyzes whether the constraint conditions used in the CCI algorithm are reasonable. By giving the physical meaning of each CCI constraint, it proposes a set of interrelated constraints and an improved iteration scheme, the protected CCI (PCCI) learning algorithm, and reports comprehensive experimental results and comparisons. PCCI achieves markedly higher classification accuracy on the experimental data sets, which demonstrates its effectiveness.

2) In further research on the hidden space, the thesis studies the hidden space in NMF and its semantic role in classification. It abstracts the core of transfer learning into a fast transfer learning algorithm, three-dimensional-hidden-space, which has two forms: Boolean-weights and word-weights. In experiments the algorithm produces its results in about 1 second, whereas the CCI algorithm needs more than 6000 seconds. Moreover, the word-weights form performs better than CCI on average.

3) The thesis gives a comprehensive analysis of the misuse of constraint conditions, a problem found in practice. It first tabulates the normalization options and matches them with a few effective solutions. It then uses 2 artificial datasets to clearly distinguish the role of each variable in the entire transfer learning process. From the candidate schemes it selects the most suitable transfer learning scheme and concludes that the word co-occurrence matrix W should be obtained through the word frequency matrix product rather than by simple counting, a conclusion that differs from the method used in CCI.
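To make the terms in the abstract concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes that "binary" NMF means the usual two-factor form X ≈ UV, and it uses the standard multiplicative updates for the Frobenius-norm objective rather than the CCI or PCCI iteration. The names `nmf`, `U`, `V`, `C_product`, and `C_count`, and the toy matrix, are all illustrative. The last lines show one possible reading of "word frequency matrix product" versus "simple counting" for the word co-occurrence matrix (called W in the abstract); the abstract does not define either construction, so both are assumptions.

```python
import numpy as np


def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X (words x documents) as U @ V.

    Generic two-factor NMF with multiplicative updates for the
    Frobenius-norm objective. This is NOT the CCI or PCCI iteration.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = X.shape
    U = rng.random((n_words, k)) + eps  # word-by-cluster factor
    V = rng.random((k, n_docs)) + eps   # cluster-by-document factor
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative because it only
        # multiplies by ratios of nonnegative quantities.
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V


# Toy word-frequency matrix (hypothetical data): rows are words,
# columns are documents, entries are raw counts.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
], dtype=float)

U, V = nmf(X, k=2)

# Assign each document to the hidden dimension with the largest weight.
labels = V.argmax(axis=0)

# Assumed reading of "word frequency matrix product": co-occurrence
# weighted by how often each word appears in each shared document.
C_product = X @ X.T

# Assumed reading of "simple counting": number of documents in which
# both words appear at least once.
B = (X > 0).astype(float)
C_count = B @ B.T
```

On this toy matrix one would expect documents 0 and 1 to share a label and documents 2 and 3 to share the other, since the two word groups never overlap. Under the assumed readings, the product form gives words 0 and 1 a co-occurrence weight of 3·2 + 2·3 = 12, while counting gives 2, which illustrates how the two constructions can scale the same pair differently.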
Keywords/Search Tags: Nonnegative matrix factorization, transfer learning, word frequency matrix, word co-occurrence matrix, protection-type iteration