
Research On Induction Matrix Completion Algorithm Integrated With Text Information And Its Application

Posted on: 2019-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: J J Geng
Full Text: PDF
GTID: 2428330566995850
Subject: Communication and Information System
Abstract/Summary:
With the development of big data technology, we have entered the information society, and the volume of information, especially electronic text, is growing exponentially. Text is not only an important medium for acquiring knowledge but also the basis of scientific research, and to a certain extent it reflects the level of knowledge at a given historical stage of human society. However, text contains a large amount of implicit or indirect information that is often overlooked in research. Mining and recovering lost literature information from massive data has therefore become very important.

The emerging matrix completion technique offers a solution to these problems. When the data is large-scale and the association matrix is sparse, however, traditional matrix completion methods cannot achieve good performance. Moreover, traditional methods use only the information in the sample space to predict the labels of samples. To overcome these drawbacks, this thesis proposes a method based on inductive matrix completion that leverages text information in both the sample space and the label space: text feature information is extracted with text mining techniques, and the feature vectors from both the sample space and the label space are incorporated. In addition, a new feature representation method is employed to improve performance on massive text information.

The contributions of this thesis are as follows:

(1) A text feature representation method. First, text features are extracted using the Word2vec tool. Then each sample is represented by a real-valued feature vector, obtained by transforming multiple instances into a single instance. The proposed method improves information recovery performance when dealing with ambiguous, unstructured, and incomplete text information. Furthermore, it significantly reduces the dimension of the text feature vector, which helps overcome the curse of dimensionality.

(2) An inductive matrix completion algorithm based on text mining, named WordIMC, which combines the text representation with the inductive matrix completion technique. WordIMC improves on traditional matrix completion algorithms by using the feature information from both the sample space and the label space. Finally, experiments on multiple large-scale biological data sets demonstrate the effectiveness and advantages of WordIMC compared with other state-of-the-art matrix completion methods.
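The abstract does not give WordIMC's objective function or its pooling rule, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes word vectors are already available (for example from a Word2vec model), that a bag of word vectors is collapsed into a single sample vector by simple averaging, and that the completion step uses the standard inductive matrix completion form M ≈ X W Hᵀ Yᵀ fitted by plain gradient descent on the observed entries. The function names `pool_instances`, `fit_imc`, `predict`, and `observed_loss`, and all hyperparameters, are hypothetical and not taken from the thesis.

```python
import numpy as np


def pool_instances(word_vectors):
    """Collapse a bag of word vectors (multiple instances) into one vector.

    Assumption: mean pooling. The thesis may use a different transformation.
    """
    return np.asarray(word_vectors, dtype=float).mean(axis=0)


def predict(X, Y, W, H):
    """Bilinear scores: entry (i, j) is x_i^T W H^T y_j."""
    return X @ W @ H.T @ Y.T


def observed_loss(M, mask, X, Y, W, H):
    """Half the squared error over observed entries only (mask == 1)."""
    return 0.5 * np.sum((mask * (predict(X, Y, W, H) - M)) ** 2)


def fit_imc(M, mask, X, Y, rank=2, lam=1e-3, lr=0.01, iters=2000, seed=0):
    """Fit W (d_x x rank) and H (d_y x rank) so that X W H^T Y^T matches M
    on the observed entries.

    X holds sample-side features (one row per sample) and Y holds
    label-side features (one row per label). The fixed step size assumes
    X and Y are scaled to moderate norm; this is a sketch, not a tuned solver.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(iters):
        # Residual restricted to the observed entries.
        R = mask * (predict(X, Y, W, H) - M)
        # Gradients of the L2-regularized squared loss.
        grad_W = X.T @ R @ Y @ H + lam * W
        grad_H = Y.T @ R.T @ X @ W + lam * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H
```

In this sketch, each row of `X` (and, where labels have text descriptions, each row of `Y`) would come from `pool_instances`, and `predict(X, Y, W, H)` fills in the unobserved entries of the association matrix. Because the model depends on features rather than on row and column indices, it can also score samples or labels that were not seen during fitting, which is what distinguishes inductive matrix completion from the traditional, index-based form.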
Keywords/Search Tags: Induction Matrix Completion, Text Mining, Text Feature Representation, Text Information