Research On Neural Network Based Text Representation And Application

Posted on: 2019-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: S Gao
GTID: 2518306473453714
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the volume of web text is growing explosively, which creates demand for natural language processing and generation. Deep neural networks, loosely modeled on human learning, can learn representations of their input data. By mapping the original high-dimensional, sparse, and discrete lexical representation into a distributed representation, deep learning based algorithms can overcome some shortcomings of traditional methods. Building on natural language processing and the corresponding algorithms, this thesis studies neural network based text representation and its applications. The research has both scientific and practical value. The main research works are as follows:

· This thesis presents a lexical analysis method based on a Bidirectional Recurrent Neural Network and a Conditional Random Field. The Bidirectional Recurrent Neural Network learns the representation of the text from both directions, and the Conditional Random Field generates the final tag sequence. Experiments on the Bakeoff 2005 datasets verify the effectiveness of the proposed algorithm, which outperforms the baselines by 1% to 2%.

· This thesis presents a self-attention based method for text vectorization that focuses on different parts of the text. Different self-attention mechanisms can be used for different tasks and semantic parts to generate the corresponding vector representation, which lets the model attend to different semantic parts in specific tasks. Experimental results on the Yelp comments dataset, the Amazon dataset, and our own news dataset show the feasibility of the proposed approach.

· This thesis presents a method for modeling and ordering unordered word sets. A Memory Encoder is used to model these unordered text sequences, and, building on the self-attention approach, the proposed model can capture different aspects of the corresponding text. Using a Pointer Network and the copy mechanism, a proper natural language sentence can be recovered from the unordered sequence, restoring the normal word order. Experimental results on English and Chinese corpora, under several metrics, verify the feasibility of the proposed approach.

The experimental results show that the neural network based approach is feasible for lexical analysis, text representation, and text sequence ordering. The application to text classification in a big data search and mining platform also shows the effectiveness of the above methods. Finally, the thesis discusses the remaining problems and plans for further research.
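To illustrate how a Conditional Random Field layer can turn per-character scores into a final tag sequence for word segmentation, here is a minimal sketch of Viterbi decoding. The abstract does not state the tag scheme or the decoding procedure, so the B/M/E/S tags, the function names `viterbi_decode` and `tags_to_words`, and the score layout are all assumptions made for illustration. The emission scores stand in for the output of a bidirectional recurrent network, which is not implemented here.

```python
from typing import List, Sequence

# Assumed tag scheme (not taken from the thesis):
# B = begin of word, M = middle, E = end, S = single-character word.
TAGS = ["B", "M", "E", "S"]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiRNN)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    # best[j] is the best score of any path that ends in tag j so far.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def tags_to_words(chars: str, tag_ids: Sequence[int]) -> List[str]:
    """Group characters into words; a word closes after an E or S tag."""
    words: List[str] = []
    current = ""
    for ch, tag_id in zip(chars, tag_ids):
        current += ch
        if TAGS[tag_id] in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep any trailing characters left without a closing tag
        words.append(current)
    return words
```

In a trained model the transition scores are learned jointly with the network. In this sketch they are supplied by the caller, for example with a large negative score for transitions that the tag scheme forbids, such as B followed by S.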
Keywords/Search Tags: Deep Learning, Neural Network, Chinese Word Segmentation, Text Representation, Word Set Ordering