Construction And Automatic Filtering Method Of Large Scale Short Text Summary Data Set

Posted on:2016-05-15

Degree:Master

Type:Thesis

Country:China

Candidate:F Z Zhu

GTID:2348330566453738

Subject:Computer Science and Technology

Abstract/Summary:

Short text summarization involves both computing the semantic similarity between two short texts and natural language generation, and it is a class of problems with great research value. Deep learning is now widely used in natural language processing, but for short text summarization the lack of a large-scale data set has made deep learning models difficult to apply. I participated in the construction of a large-scale short text summarization data set, which to some extent makes up for this lack of data. However, because the data set was built with automatic data collection methods, the proportion of noisy data is relatively high, which may interfere with research conducted on it. Since the data set contains many abstractive short summaries, noise filtering inevitably involves short text semantic similarity matching, so it is important to study how to perform noise filtering in a way that mines deeper semantic information.

This thesis studies short text semantic matching, where the difficulty lies in modeling the short text. Because the model must retain sufficient information from the original short text, a semantic similarity matching model based on LSTM is proposed. LSTM is suited to modeling sequence data and can retain information from the sequence, so predicting semantic similarity with an LSTM model is feasible. Based on the characteristics of short summaries and short texts, a new method of improving the standard LSTM unit is also proposed.

In the experiments, we randomly sampled from the short text summarization data set of the Intelligent Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School, and manually labeled the samples to build a data set for the noise filtering task. For the short text semantic similarity matching problem, the LSTM model is compared with the traditional vector space model, the latent semantic analysis model, and a convolutional neural network model. Although the LSTM model performs worse than the latent semantic analysis model, the improved LSTM model improves greatly on the standard LSTM model and comes close to the latent semantic analysis model.
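
As a rough illustration of the vector space baseline named in the abstract, the sketch below scores each (text, summary) pair by the cosine similarity of raw term counts and drops pairs that fall below a threshold. It is not the thesis's implementation: the function names, the whitespace tokenizer, and the threshold value of 0.2 are all assumptions made for this example, and Chinese text would need a real word segmenter passed in as `tokenize`.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term counts."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


def filter_noisy_pairs(pairs, threshold=0.2, tokenize=str.split):
    """Keep (text, summary) pairs whose similarity reaches the threshold.

    `threshold` and `tokenize` are hypothetical defaults for illustration.
    """
    kept = []
    for text, summary in pairs:
        score = cosine_similarity(tokenize(text.lower()), tokenize(summary.lower()))
        if score >= threshold:
            kept.append((text, summary, score))
    return kept


pairs = [
    ("the city opened a new subway line today", "new subway line opened"),
    ("the city opened a new subway line today", "stock prices fall sharply"),
]
kept = filter_noisy_pairs(pairs)
```

A purely lexical score like this rejects any summary that paraphrases its source without sharing words, which is the limitation the abstract points to when it argues that filtering abstractive summaries needs deeper semantic information.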

Keywords/Search Tags:

short text semantic matching, short text summarization, LSTM model

Related items

1	Automatic Summarization Algorithm For Chinese Short Text
2	A Short Text Matching Method Using Multi-level Features
3	Research On Short Text Automatic Summary Model For Web Reviews
4	Research And Application Of Short Text Semantic Similarity Model Based On Deep Learning
5	Bi-LSTM Short Text Emotion Analysis Combining Semantic And Self-attention Mechanism
6	Research And Application Of Topic-based Automatic Summarization Of Short Text
7	Research On The Method Of Semantic Similarity Calculation Of Short Texts Based On HowNet
8	Research On Short Text Similarity Based On Deep Learning
9	Research On Methods Of Improving Semantic Coherence Of Text Summarization
10	Research On Filtration And Classification Methods Of Large-Scale Short Text