
Research And Implementation Of Text Representation In Continuous Space

Posted on: 2017-03-22  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zeng  Full Text: PDF
GTID: 2348330518494003  Subject: Computer Science and Technology
Abstract/Summary:
Text representation plays a foundational role in natural language processing applications such as text classification and information retrieval. Now that the volume of web text is growing exponentially, obtaining an effective low-dimensional text representation has become key to the practicality of many applications, so text representation techniques have been widely studied. In recent years, with the development of neural networks, distributed word representations have emerged. They offer low dimensionality and a high degree of compression, and they have prompted discussion of extending distributed representations to larger language units such as sentences and whole texts. Research in this direction is still in its infancy, and very little of it has been carried out in China, so this thesis takes up the problem, building on the work of predecessors. First, the thesis implements two complementary neural-network-based distributed text representations: one learns by predicting the current word from its context, and the other learns by predicting the context from the current word. The thesis analyzes the respective advantages of each. The representations are then applied to sentiment analysis and to similar-text retrieval. The experiments show good performance on sentiment analysis, while the results on similar-text retrieval still need improvement. In addition, the thesis proposes applying the text representation to text classification and evaluates it on several text classification datasets to verify its validity. It also analyzes how several factors, including vector dimension, context window, learning rate, and pre-training, affect classification performance. The experiments indicate that, at low vector dimensionality, the distributed text representation performs better than traditional text representations. Finally, the thesis implements a text classification demonstration system based on the classification algorithm and parameters obtained from these experiments. The system accumulates user data and improves its classification performance as the amount of data grows.
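The two prediction directions mentioned in the abstract can be illustrated with a minimal sketch of how training examples might be formed from a token sequence. The abstract does not name specific models or give implementation details, so the function names, the symmetric window, and the sample sentence below are all hypothetical. The sketch builds training pairs only and does not implement the neural network itself.

```python
from typing import List, Tuple


def context_to_word_pairs(
    tokens: List[str], window: int
) -> List[Tuple[List[str], str]]:
    """Direction 1 (hypothetical helper): the context predicts the current word.

    Returns one (context_words, current_word) example per position.
    """
    pairs = []
    for i, current in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip positions with no neighbors at all
            pairs.append((context, current))
    return pairs


def word_to_context_pairs(
    tokens: List[str], window: int
) -> List[Tuple[str, str]]:
    """Direction 2 (hypothetical helper): the current word predicts its context.

    Returns one (current_word, context_word) example per neighbor.
    """
    pairs = []
    for context, current in context_to_word_pairs(tokens, window):
        pairs.extend((current, neighbor) for neighbor in context)
    return pairs


if __name__ == "__main__":
    # Assumed sample text, chosen only for illustration.
    sample = "text representation supports text classification".split()
    for example in context_to_word_pairs(sample, window=1):
        print("context -> word:", example)
    for example in word_to_context_pairs(sample, window=1):
        print("word -> context:", example)
```

The `window` argument corresponds to the context window that the abstract lists among the factors affecting classification performance. A larger window yields more context words per example in the first direction and more pairs per position in the second.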
Keywords/Search Tags:text representation, distributed text representation, text classification, support vector machine