Research Of Text Quality Analysis Algorithm Based On Deep Learning

Posted on:2020-07-02

Degree:Master

Type:Thesis

Country:China

Candidate:H W Huang

GTID:2428330596476774

Subject:Engineering

Abstract/Summary:

Nowadays, with the rapid development of computer networks, Internet users are drowning in vast amounts of information. Text is one of the most widespread forms of information on the Internet, and the quality of text data seriously influences both the speed at which users obtain information and the choices they make. It is unrealistic to rely on manpower to evaluate the textual data on the Internet, since few organizations can sustain such a huge human resource cost. It is therefore very meaningful to use computer algorithms to analyze text quality automatically. At the same time, deep learning has developed rapidly, and deep learning solutions have achieved good results on a large number of natural language processing tasks. Based on these considerations, this thesis studies the text quality analysis task and uses deep learning methods to solve it, designing two solutions from two different points of view.

First, text quality analysis is treated as a text classification problem over quality attributes. This thesis proposes classifying text quality attributes using class-oriented improved word vectors and a capsule memory network. The class-oriented improved word vector model effectively incorporates the category information of the text corpus: the word embeddings trained by this method contain not only shallow semantic information but also text category information, which is useful for the final classification. To address the characteristics of the text quality task, such as long texts and blurred features, this thesis designs a capsule memory network to perform text classification. Building on the memory network, the network redesigns and builds an external memory module, an input module, a feature extraction module, a feature preservation module and an output module. The external memory module and the processing performed by the input module allow the network to process longer text, and the capsule network design of the feature extraction module uses vectors to extract features more finely. The multi-round computing process of the network further enhances the feature extraction capability of the model.

Second, this thesis proposes analyzing the discourse relations in a text and judging the quality of the text from the logic of its context. Here the thesis focuses on the more difficult task of implicit discourse relation analysis, and designs a bi-directional long short-term memory network based on fusion word embeddings and multi-task learning to solve it. The fusion word vector effectively incorporates additional statistics-based prior knowledge, which increases the amount of information contained in the word embedding. Then, based on the relationship between implicit and explicit discourse relation recognition, a bi-directional long short-term memory network with multi-task learning is designed. The multi-task learning mechanism introduces a corpus of explicit sentence-to-sentence relations to enhance the model's feature extraction capability, which effectively improves the overall performance of the model.

Finally, experiments on a couple of corpora are designed, and the above methods are compared with other methods. The experimental results show that the proposed methods have certain performance advantages over the other methods.
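
The abstract does not state which capsule formulation the feature extraction module uses. As a minimal sketch, assuming the standard "squash" nonlinearity from the capsule network literature (an assumption, not confirmed by the thesis), the idea of representing a feature as a vector rather than a scalar can be illustrated as follows. The function names `squash` and `length` are hypothetical and chosen only for this example.

```python
import math


def squash(vector, eps=1e-9):
    """Rescale a capsule vector so its length lies in [0, 1).

    Assumed formulation: v * (|v|^2 / (1 + |v|^2)) / |v|.
    The direction is preserved; only the length is compressed.
    """
    squared_norm = sum(x * x for x in vector)
    norm = math.sqrt(squared_norm)
    # eps avoids division by zero for an all-zero input vector.
    scale = squared_norm / (1.0 + squared_norm) / (norm + eps)
    return [scale * x for x in vector]


def length(vector):
    """Euclidean length, read as the strength with which a feature is present."""
    return math.sqrt(sum(x * x for x in vector))


if __name__ == "__main__":
    capsule = squash([3.0, 4.0])
    print(capsule, length(capsule))
```

Under this assumption, a long input vector maps to a length close to 1 and a short one to a length close to 0, while the vector's orientation remains available to encode finer properties of the feature.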

Keywords/Search Tags:

text quality analysis, improved word embedding, capsule memory network, fusion word embedding, multi-task learning

Related items

1	Research On The Representation Of Word Embedding Based On Knowledge Fusion
2	Research On Word Spotting Technology In Handwritten Historical Document Images
3	Design And Realization Of Text Abstract System Based On Word Embedding
4	Dynamic Weighting Of Word Embedding And Distributed Learning Strategies
5	Research On Text Sentiment Analysis Based On Bert And Multi-granularity Convolutional Capsule Network
6	Research And Improvement On Text Classification Based On Word Embedding
7	Research On Long Text Classification Based On Word Embedding Technology
8	Research On Text Classification Algorithm Based On Word Embedding Model
9	Research On Network Representation Learning Method Based On Word Embedding
10	Combining Topic Model And Word Embedding For Short-Text Classification