The Key Technology Research On Automated Essay Scoring

Posted on: 2016-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: J W Chen
Full Text: PDF
GTID: 2308330479990034
Subject: Computer Science and Technology
Abstract/Summary:
Writing is an important means of evaluating a candidate's language ability and ability to organize vocabulary in large-scale language tests. However, manual marking has two drawbacks: it consumes enormous resources and manpower, and different raters apply different scoring criteria, which may result in subjectivity and inconsistency. With the development of Natural Language Processing (NLP) technology, breakthroughs have been made in part-of-speech (POS) tagging and parsing, and Automated Essay Scoring (AES) algorithms based on statistical and NLP techniques have frequently been proposed. Traditional AES algorithms extract features at the lexical, thesis, and organization levels to construct the feature set, and then rely on a training strategy such as a Linear Regression model. The features can be classified as non-textual or textual according to whether they consider the meaning of words. This approach has several weaknesses: simply combining all kinds of features may not yield the best performance, the commonly used training strategies cannot capture non-linear relationships between features, and the features used rarely consider the semantic information of essays. In this thesis we present our research on model selection and the extraction of semantic features, which mainly comprises the following three aspects. First, we construct non-textual features based on lexical and syntactic information, and compare and analyze the performance of the Random Forest regression model against other models frequently used in AES. We then examine the performance of each kind of feature and obtain the best non-textual feature set by combining features.
To address the weakness that non-textual features do not consider the content of an essay, which makes the marking mechanism easy to deceive and exploit, we adopt the LDA topic model to extract textual information, and the experiments show that it achieves good performance. Second, to measure the word diversity of essays, the semantic information of essays and words should be mined as precisely as possible. We choose word embeddings to represent words, cluster the word embeddings, and extract the semantic features of each cluster to represent the semantic features of an essay. Because ordinary word embeddings cannot discriminate between the senses of polysemous words, we adopt topical word embeddings, which merge the LDA topic model with word embeddings, to represent words. In the experiments we compare the word-embedding clustering algorithm with the classical Brown word clustering algorithm; the results show that topical word embeddings represent words more precisely than ordinary word embeddings and achieve the best performance on textual features. Third, we implement an AES system. Its core function is essay scoring, which is accomplished by the training strategy and the topical word embedding clustering method, with corresponding auxiliary functions added to help users understand and improve their writing skills. We decompose the scoring process into word quality, syntax quality, and content correlation so that users have a clear view of their essays, and the recommendation of excellent essays and essay retrieval help them improve their writing skills.
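
The idea of clustering word embeddings and describing an essay by its clusters can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the clustering algorithm or the exact per-cluster features, so this sketch assumes plain k-means and a normalized cluster histogram. The 2-D vectors in EMBEDDINGS are hypothetical toy values, and the function names (kmeans, cluster_words, essay_features) are made up for illustration. Ordinary rather than topical word embeddings are used here.

```python
from collections import Counter

# Hypothetical 2-D embeddings; real ones would come from a trained model.
EMBEDDINGS = {
    "good": (0.9, 0.1), "school": (0.1, 0.9),
    "great": (0.8, 0.2), "teacher": (0.2, 0.8),
    "excellent": (1.0, 0.0), "student": (0.0, 1.0),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means. Assumption: the first k vectors seed the centroids,
    which keeps the toy example deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_words(embeddings, k):
    """Map every word in the vocabulary to a cluster id."""
    words = list(embeddings)
    labels = kmeans([embeddings[w] for w in words], k)
    return dict(zip(words, labels))


def essay_features(essay, word_to_cluster, k):
    """Fraction of known tokens falling into each cluster (one assumed
    choice of per-cluster feature); unknown tokens are skipped."""
    tokens = [t for t in essay.lower().split() if t in word_to_cluster]
    counts = Counter(word_to_cluster[t] for t in tokens)
    total = len(tokens)
    return [counts[c] / total if total else 0.0 for c in range(k)]


WORD_TO_CLUSTER = cluster_words(EMBEDDINGS, k=2)
print(essay_features("good teacher great student excellent school", WORD_TO_CLUSTER, 2))
```

A feature vector of this kind could then be passed, together with the non-textual features, to a regression model such as a Random Forest. An essay that draws on only one cluster yields a more concentrated histogram, which is one simple way to reflect word diversity.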
Keywords/Search Tags: Automated Essay Scoring, Random Forest, Topic Model, Word Embedding, Clustering