With the accelerating digital transformation of education, teaching resources such as questions, test papers, and courseware have been widely digitized, and intelligent education platforms continue to be launched. These platforms give teachers and students efficient, convenient applications such as automatic test paper assembly, question recommendation, and adaptive testing, which improve teaching quality and learning efficiency. All of these applications depend on the difficulty labels of the questions stored in the question bank. Data-driven difficulty prediction uses natural language processing to model question text and estimate question attributes. Because it is highly automated and objective, it has become a focus of research in intelligent education in recent years.

Question difficulty prediction has benefited from advances in natural language processing. Text complexity indicators, such as word composition and syntactic dependency distance, or deep learning models, such as convolutional and recurrent neural networks, are used to predict the difficulty of new questions automatically. Training these models requires a large amount of data, including question texts and difficulty values derived from test logs. However, these data are highly unbalanced across subjects and student groups (such as schools), and the lack of test logs for some subjects and groups makes it hard to train models for them. To address the shortcomings of existing methods in such cross-domain scenarios, this dissertation studies cross-subject difficulty prediction based on domain adaptation and group-aware relative difficulty prediction based on domain generalization.

Because text corpora differ across subjects and the semantic gap between subjects is large, a difficulty prediction model is hard to reuse in a multi-subject scenario. Cross-subject difficulty prediction trains the model on resource-rich subjects and reuses it on other subjects, integrating the readability and semantic features of the questions. An adversarial network based on stimulus difficulty and task difficulty is proposed. It extracts a difficulty representation of each question and applies domain alignment based on data sampling, which alleviates the mismatch between domain data in the initial stage of adversarial training. Experiments on multi-subject datasets evaluate the effectiveness of the proposed method, and performance comparisons and enhancement-effect experiments verify that it significantly improves prediction accuracy in the cross-subject scenario.

For group-aware prediction, a relative difficulty prediction framework based on domain generalization is proposed, which provides a separate difficulty estimate for each student group. Modeling student groups efficiently is non-trivial, however, because interactions between groups and questions are often noisy and group profiles (e.g., socioeconomic status) are usually impossible to obtain. Each student group is therefore represented by the marginal distribution of its question responses, which is used to generate personalized predictions. With marginal transfer learning, the model can be adapted to new groups without additional training. In addition, influence functions are used to identify harmful training samples in each group's response data, and a regularization strategy is adopted to optimize the training set. Experiments on datasets partitioned by region and by school show that the proposed method performs well for student groups with noisy data and for questions that do not appear in the training set. Finally, a perspective based on average difficulty and polarization is provided to measure question quality.
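The idea of representing a student group by the marginal distribution of its responses, so that an unseen group can be handled without retraining, can be illustrated with a minimal sketch. The abstract does not specify the kernel, the features, or the predictor, so everything here is an assumption: each group is summarized by an empirical kernel mean embedding of its answer records, the RBF kernel is approximated with random Fourier features, and a hypothetical linear-logistic head maps the concatenation of group embedding and question features to a difficulty score. The function names (`make_rff_map`, `group_embedding`, `predict_relative_difficulty`), the toy records, `gamma`, and the random "trained" weights are all illustrative, not the dissertation's method.

```python
import math
import random


def make_rff_map(dim_in, dim_out, gamma, seed=0):
    """Random Fourier feature map approximating the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Assumed setup: frequencies drawn from N(0, 2 * gamma) per coordinate, phases from U(0, 2*pi).
    freqs = [[rng.gauss(0.0, math.sqrt(2.0 * gamma)) for _ in range(dim_in)]
             for _ in range(dim_out)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim_out)]
    scale = math.sqrt(2.0 / dim_out)

    def phi(x):
        return [scale * math.cos(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(freqs, phases)]

    return phi


def group_embedding(records, phi):
    """Empirical kernel mean embedding: average feature map over a group's answer records."""
    if not records:
        raise ValueError("a group needs at least one answer record")
    feats = [phi(r) for r in records]
    return [sum(col) / len(feats) for col in zip(*feats)]


def predict_relative_difficulty(weights, bias, group_emb, question_feats):
    """Hypothetical linear-logistic head on [group embedding ; question features]."""
    z = sum(w * v for w, v in zip(weights, list(group_emb) + list(question_feats))) + bias
    return 1.0 / (1.0 + math.exp(-z))  # difficulty score in (0, 1)


# Toy answer records (hypothetical): (question feature 1, question feature 2, correct 0/1).
group_a = [(0.2, 0.5, 1.0), (0.8, 0.1, 0.0), (0.4, 0.4, 1.0)]
group_b = [(0.2, 0.5, 0.0), (0.8, 0.1, 0.0), (0.4, 0.4, 0.0)]

DIM = 16
phi = make_rff_map(dim_in=3, dim_out=DIM, gamma=1.0, seed=42)
rng = random.Random(7)
weights = [rng.uniform(-1.0, 1.0) for _ in range(DIM + 2)]  # stand-in for trained weights

question = [0.5, 0.3]
emb_a = group_embedding(group_a, phi)
emb_b = group_embedding(group_b, phi)  # a "new" group: only its embedding is computed
pred_a = predict_relative_difficulty(weights, 0.0, emb_a, question)
pred_b = predict_relative_difficulty(weights, 0.0, emb_b, question)
print(f"group A: {pred_a:.3f}  group B: {pred_b:.3f}")
```

In this sketch the embedding is an average over records, so it does not depend on record order and can be computed from a new group's answer logs alone. Adapting to that group requires no gradient updates, which is the property the framework attributes to marginal transfer learning. The actual group representation and predictor in the dissertation may differ.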