In recent years, as artificial intelligence technology has spread across many fields, automated scoring of English essays has attracted considerable attention and developed rapidly. The representation of text content, however, has seen no major breakthrough. Traditional representations rely mostly on latent semantic analysis, which extracts only topic-level information and ignores the information carried by individual words. This paper therefore proposes two text content representation methods, one based on word vector clustering and one based on the vector space model. Together they characterize the meaning of the words in a text and also account for how well an essay conforms to its assigned topic. On this basis, the paper develops an automated essay scoring algorithm based on word vectors and multi-model fusion. To better characterize text content, the word vector clustering method first trains a word2vec model on the English Wikipedia corpus. The trained model then generates word vectors for the words of the essay to be scored, and these vectors are clustered. Statistics of the words that fall into each cluster serve as the content features of the text. The vector space model method is used to judge how well a student's essay conforms to the topic: it extracts the keywords of the text and generates topic-relevance features from them. In addition, lexical and syntactic features are used as non-text features to evaluate essay quality at the word and sentence level. Using the extracted text and non-text features, the predictions of three machine learning models (Random Forest, GBDT, and XGBoost) are linearly fused to produce the final prediction. Finally, the model is validated on the Automated Essay Scoring data set from Kaggle, an international data mining competition platform. On the test set, the quadratic weighted kappa of the proposed algorithm exceeds that of the first-place entry in the Kaggle Automated Essay Scoring competition, which verifies the effectiveness of the algorithm.
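
The final two steps described above, linear fusion of the three models' predictions and evaluation by quadratic weighted kappa, can be illustrated with a minimal sketch in plain Python. The abstract does not state the fusion weights or the rounding rule, so the weights, the function names `fuse_linear` and `quadratic_weighted_kappa`, and the round-and-clip step are assumptions made for illustration only, not the authors' implementation.

```python
import math


def fuse_linear(predictions, weights, min_rating, max_rating):
    """Linearly combine per-model score predictions into one integer score.

    predictions: one list of scores per model (e.g. Random Forest, GBDT, XGBoost).
    weights: one weight per model; the values are hypothetical here, since
    the abstract does not report them.
    """
    fused = []
    for per_essay in zip(*predictions):
        score = sum(w * p for w, p in zip(weights, per_essay))
        # Assumed post-processing: round half up, then clip to the score range.
        rounded = math.floor(score + 0.5)
        fused.append(max(min_rating, min(max_rating, rounded)))
    return fused


def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer ratings.

    Assumes max_rating > min_rating and that all ratings lie in that range.
    """
    n_ratings = max_rating - min_rating + 1
    n = len(y_true)

    # Observed agreement matrix: rows index true ratings, columns predictions.
    observed = [[0] * n_ratings for _ in range(n_ratings)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_ratings))
                 for j in range(n_ratings)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n_ratings):
        for j in range(n_ratings):
            # Disagreement is penalized by the squared rating distance.
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # A zero denominator means both raters gave every essay the same score.
    return 1.0 - numerator / denominator if denominator else 1.0


if __name__ == "__main__":
    # Toy scores for three essays from three hypothetical models.
    model_preds = [[1, 2, 3], [2, 2, 4], [3, 2, 5]]
    fused = fuse_linear(model_preds, [0.5, 0.25, 0.25], 1, 5)
    print(fused, quadratic_weighted_kappa([2, 2, 4], fused, 1, 5))
```

In practice the fusion weights would be tuned on a held-out validation set, and a library routine such as scikit-learn's `cohen_kappa_score` with `weights="quadratic"` can serve as a cross-check for the metric.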