
Research and Application of Distributed Text Mining Based on Feature Learning

Posted on: 2016-02-07 | Degree: Master | Type: Thesis
Country: China | Candidate: H Yin | Full Text: PDF
GTID: 2298330467991752 | Subject: Communication and Information System
Abstract/Summary:
With the spread and development of networks, more and more data mining tasks must be carried out in distributed environments. At the same time, blogs, microblogs, and WeChat generate a large and growing demand for text mining. Together these make text data mining in distributed environments an active research field. However, text mining in distributed networks must take network constraints into account, and it is difficult to represent text as semantic features that suit data mining algorithms. The key points of distributed text mining are therefore to make data mining algorithms distributed and to learn text features suited to those algorithms.

To cluster data in a distributed environment, the expectation-maximization (EM) algorithm for learning the mixture of probabilistic principal component analyzers (MPPCA) model is first extended to a variant form. Distributed subspace clustering is then achieved by learning MPPCA with distributed EM.

Next, the semantic retrieval problem in structured peer-to-peer (P2P) networks is studied. We propose using the standard alpha-stable distribution to build a semantic similarity hashing function. This function replaces consistent hashing in P2P networks, and we construct a distributed network that supports semantic retrieval.

To improve the performance of the algorithms proposed above on text data, neural-network-based feature learning for text is studied. For sentence feature learning, we propose using unfolding recursive autoencoders with dynamic average pooling to represent sentences. Experiments on a paraphrase task show that the features learned by the proposed algorithms contain more semantic information.

Finally, a distributed text retrieval system that can retrieve text data in a distributed environment is described. The system supports sentence-based retrieval, which demonstrates the practical value of this work.
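
The abstract does not spell out how the semantic similarity hashing function is built from the standard alpha-stable distribution. As a rough illustration only, the sketch below follows the general pattern of locality-sensitive hashing with stable projections, h(v) = floor((a·v + b) / w), where the entries of a are drawn from a symmetric alpha-stable law. The class name `StableHash`, the parameters `k`, `width`, and `seed`, and the use of the Chambers-Mallows-Stuck sampler are all assumptions made for this example, not details taken from the thesis.

```python
import math
import random


def sample_symmetric_stable(alpha, rng):
    """Draw one sample from a standard symmetric alpha-stable law (0 < alpha <= 2).

    Uses the Chambers-Mallows-Stuck construction; alpha=1 gives a Cauchy
    sample and alpha=2 gives a Gaussian sample with variance 2.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = max(rng.expovariate(1.0), 1e-12)  # guard against a zero draw
    left = math.sin(alpha * u) / (math.cos(u) ** (1.0 / alpha))
    right = (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha)
    return left * right


class StableHash:
    """Hypothetical similarity-preserving hash: nearby vectors tend to share keys."""

    def __init__(self, dim, k, width, alpha, seed):
        # A shared seed lets every peer rebuild the same projections,
        # so all nodes map a given vector to the same key.
        rng = random.Random(seed)
        self.width = width
        self.projections = [
            [sample_symmetric_stable(alpha, rng) for _ in range(dim)]
            for _ in range(k)
        ]
        self.offsets = [rng.uniform(0.0, width) for _ in range(k)]

    def key(self, vec):
        """Return a tuple of k bucket indices for a feature vector."""
        buckets = []
        for proj, offset in zip(self.projections, self.offsets):
            dot = sum(a * x for a, x in zip(proj, vec))
            buckets.append(math.floor((dot + offset) / self.width))
        return tuple(buckets)


hasher = StableHash(dim=4, k=3, width=4.0, alpha=1.0, seed=0)
doc_vector = [0.2, 0.1, 0.9, 0.4]
print(hasher.key(doc_vector))
```

In a structured P2P overlay, a key of this kind could stand in for the output of consistent hashing, so that vectors that are close together are more likely to be routed to the same or neighbouring nodes. How the thesis maps keys onto the overlay is not stated in the abstract.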
Keywords/Search Tags:data mining, distributed data mining, text mining, feature learning, distributed clustering