Research And Implementation Of Job Duplication Checking System Based On C-LSTM

Posted on: 2024-03-26; Degree: Master; Type: Thesis
Country: China; Candidate: C Y Zhang; Full Text: PDF
GTID: 2568307085492934; Subject: Software engineering
Abstract/Summary:
With the development of fifth-generation mobile communication (5G) in China, access to information is easier than ever, but problems such as plagiarism in dissertations and similarity between assignments have emerged at the same time. Universities have taken measures against these problems, and one of the most widely used is thesis duplication checking. However, existing checking systems mostly compare submissions against literature databases and cannot check thesis-type assignments submitted by students against one another. The proposed system aims to improve internal checking among such thesis-type assignments and to extend it to daily teaching assignments, so as to curb academic misconduct at the source.

Text similarity calculation based on deep learning networks is a current research hotspot in the field of checking systems: the two texts to be compared are encoded, a neural network extracts their semantic vectors, and a similarity calculation method then yields the similarity between the two texts. However, existing deep learning models suffer from problems such as missing semantics in long texts and short-text dependency. To address these problems, this thesis proposes a C-LSTM (Convolutional Long Short-Term Memory) model that combines LSTM (Long Short-Term Memory) networks and CNNs (Convolutional Neural Networks) and is fused with an LDA (Latent Dirichlet Allocation) topic model. The model calculates the similarity between assignments and belongs to the field of text similarity matching. In this model, BERT is first used to perform word segmentation and vector transformation on the multiple text segments fed into the model. Next, a bidirectional LSTM network extracts contextual association information, a CNN incorporates word embedding information into the contextual association information, global max pooling retains key information, and Gibbs sampling is used for topic extraction on each text segment. These features are fused by a fully connected neural network to obtain a high-dimensional, multi-feature semantic vector. Finally, a weighted text semantic similarity calculation method is used to obtain the similarity between two segments of text. In comparison experiments on real data sets, the C-LSTM model shows better overall performance in terms of recognition accuracy and time consumption.

The system is written in Java, using the SSM framework and MySQL as the database, and mainly serves three types of users: teachers, students, and administrators. Teachers can check the essay-based assignments that students submit to the system, obtain the similarity between each assignment and the other assignments, manage their own student assignment database, and review student assignments. Students can upload their assignments to the system following the teacher's prompt and obtain the teacher's review results for each assignment. Administrators can configure and manage the whole system.
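The final step described in the abstract, a weighted similarity over semantic vectors, can be illustrated with a minimal sketch. The abstract does not state the weighting scheme or the similarity measure, so everything here is an assumption for illustration: cosine similarity as the per-segment measure, a weighted mean over aligned segments, and the hypothetical names `cosine` and `weighted_similarity`. The vectors stand in for the fused semantic vectors the model would produce.

```python
import math
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_similarity(
    segments_a: Sequence[Sequence[float]],
    segments_b: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of per-segment cosine similarities.

    Assumption (not from the thesis): the two texts are split into the same
    number of aligned segments, and each segment pair contributes to the
    overall score in proportion to its weight.
    """
    if not segments_a or len(segments_a) != len(segments_b):
        raise ValueError("both texts need the same non-zero number of segments")
    if weights is None:
        weights = [1.0] * len(segments_a)  # default: plain average
    if len(weights) != len(segments_a):
        raise ValueError("one weight per segment pair is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = (w * cosine(a, b) for w, a, b in zip(weights, segments_a, segments_b))
    return sum(scores) / total


# Toy vectors: the first segment pair is identical, the second is orthogonal.
text_a = [[1.0, 0.0], [1.0, 0.0]]
text_b = [[1.0, 0.0], [0.0, 1.0]]
print(weighted_similarity(text_a, text_b, weights=[3.0, 1.0]))  # 0.75
```

In this toy case the first segment pair carries three times the weight of the second, so the overall score is (3 × 1.0 + 1 × 0.0) / 4 = 0.75. A real system would replace the toy vectors with model outputs and choose the weights according to its own criteria.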
Keywords/Search Tags:Assignment checking, C-LSTM model, LSTM model, CNN model, LDA topic model