Research On Correcting English Article Error

Posted on: 2016-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Jin
Full Text: PDF
GTID: 2308330479990084
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of natural language processing, machine learning, and big data, English grammatical error correction (GEC) has attracted increasing attention from researchers. A GEC system can benefit millions of English writers and can also assist other natural language processing tasks, such as machine translation and natural language generation. This thesis focuses on English article error correction. A common approach treats article error correction as a multi-class classification problem with three labels: a/an, the, and null. Classifier-based methods for article error correction have two drawbacks. First, the features are extracted empirically, which may introduce noise into the feature set and produce redundant features. Second, features are usually represented with one-hot encoding, which suffers from data sparsity and high dimensionality. This thesis addresses these drawbacks and also explores how to apply a convolutional neural network to article error correction. The work is divided into the following three parts:

1. A logistic regression model is applied to article error correction. After all positions in which an article error may occur are identified, five categories of features are collected empirically. A feature selection method, Sequential Forward Selection (SFS), is then applied to the collected features to reduce noisy and redundant features. After feature selection, the effects of different features are compared, especially that of the source word.

2. After the drawbacks of one-hot encoding are analysed, different kinds of word embeddings are used to obtain a better feature representation, in two ways. First, word embeddings are used directly as word features for correcting article errors. Second, a cluster-based method is used to reduce the dimensionality of the features: for the word feature, both K-means clustering and Brown clustering are used, and for the part-of-speech (POS) feature, a rule is employed.

3. A deep learning method based on a convolutional neural network is applied to article error correction in order to learn features more effectively. The model takes the word tokens in the context surrounding the article as features. Through convolution and pooling, features are learned from the embeddings of these words. To address the low accuracy of the model, an effective post-processing step based on a language model is proposed, and the thesis also sheds some light on why the convolutional neural network achieves low precision.
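
The Sequential Forward Selection step mentioned in part 1 can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract names only the source word as a feature, so the other feature-group names are hypothetical, and `toy_score` with its gain values is an assumed stand-in for whatever development-set metric a trained logistic regression classifier would report.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Greedily add the feature group that most improves `score`.

    Stops when no remaining group improves the score by more than `min_gain`.
    """
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(candidates)
    while remaining:
        # Score every one-group extension of the current subset.
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # nothing left that helps: treat the rest as noisy or redundant
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical scorer; the integer gains below are made-up illustration values."""
    gains = {
        "source_word": 30,      # named in the abstract
        "head_noun": 20,        # hypothetical feature group
        "head_noun_lemma": 18,  # hypothetical, redundant with head_noun
        "pos_window": 10,       # hypothetical feature group
        "noisy_ngram": -5,      # hypothetical noisy feature group
    }
    total = sum(gains[f] for f in subset)
    if "head_noun" in subset and "head_noun_lemma" in subset:
        total -= gains["head_noun_lemma"]  # the redundant group adds nothing
    return float(total)


candidates = ["pos_window", "noisy_ngram", "head_noun_lemma", "head_noun", "source_word"]
selected = sequential_forward_selection(candidates, toy_score)
print(selected)  # ['source_word', 'head_noun', 'pos_window']
```

In this toy setting the noisy group and the redundant group are never selected, which is the effect the abstract attributes to SFS. In practice `score` would retrain and evaluate the classifier for each candidate subset, so the cost grows with the number of feature groups.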
Keywords/Search Tags: article error correction, convolutional neural network, word embeddings, feature selection