Construction and Automatic Filtering Method of a Large-Scale Short Text Summary Data Set

Posted on: 2016-05-15  Degree: Master  Type: Thesis
Country: China  Candidate: F Z Zhu
GTID: 2348330566453738  Subject: Computer Science and Technology
Abstract/Summary:
Short text summarization involves both computing the semantic similarity between two short texts and natural language generation, and it is a class of problems with great research value. Deep learning is now widely applied in natural language processing, but because large-scale data sets have been lacking, deep learning models have not been well suited to the short text summarization problem. I participated in the construction of a large-scale short text summarization data set, which to some extent makes up for this lack of data. However, because the data set was built with automatic data collection methods, the proportion of noisy data is relatively high, which may interfere with research carried out on it. Since the data set contains many abstractive short summaries, the noise filtering task is bound to involve short text semantic similarity matching, so it is important to study noise filtering methods that capture deeper semantic information.

In this thesis we study the problem of short text semantic matching, whose main difficulty is modeling the short text itself. Because the model must retain enough of the information in the original short text, a semantic similarity matching model based on the LSTM model is proposed. The LSTM model is suited to modeling sequence data and can retain the information in a sequence, so it is feasible to predict semantic similarity with it. Based on the characteristics of short summaries and short texts, a new method that improves the standard LSTM unit is also proposed.

In the experiments, we randomly sample from the short text summarization data set of the Intelligent Computing Research Center at Harbin Institute of Technology Shenzhen Graduate School and manually annotate the samples to build a data set for the noisy data filtering task. On the short text semantic similarity matching problem, the LSTM model is compared with the traditional vector space model, the latent semantic analysis model, and the convolutional neural network model. Although the LSTM model performs worse than the latent semantic analysis model, the improved LSTM model performs much better than the standard LSTM model and comes close to the latent semantic analysis model.
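As a rough illustration of the vector space model baseline mentioned in the abstract, the sketch below scores a (short text, summary) pair by the cosine similarity of their term-frequency vectors and keeps only pairs above a cutoff. It is not the thesis's implementation: the function names, the whitespace tokenization, and the threshold value of 0.2 are all assumptions made for the example, and text written without spaces between words would need a word segmenter first.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors.

    Tokenization by whitespace is an assumption of this sketch.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the terms of text_a; a Counter returns 0 for missing terms.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as matching nothing
    return dot / (norm_a * norm_b)


def filter_noisy_pairs(pairs, threshold=0.2):
    """Keep (text, summary) pairs whose similarity reaches the threshold.

    The threshold is a hypothetical value, not one taken from the thesis.
    """
    return [(text, summary) for text, summary in pairs
            if cosine_similarity(text, summary) >= threshold]
```

A purely lexical score like this cannot recognize an abstractive summary that shares few words with its source text, which is the limitation that motivates the semantic models compared in the abstract.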
Keywords/Search Tags:short text semantic matching, short text summarization, LSTM model