
The Research On Distinguish Measure Of Repetitive C Language Test Questions In Database Based-on The Tree Structure Of Domain Ontology

Posted on: 2016-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: X D Yan
Full Text: PDF
GTID: 2308330461977075
Subject: Computer Science and Technology
Abstract/Summary:
The online test system for the C programming language is built on an examination database. Because the original system lacks a duplicate-checking module, similar questions are hard to keep out of the database, which lowers the quality of test papers and the effectiveness of examinations. This thesis therefore addresses how to find such similar test questions quickly and accurately.

Duplicate checking of C test questions is a similarity-calculation problem in natural language processing (NLP). After surveying a large body of research on similarity calculation, this thesis solves the problem in three steps: word segmentation, word similarity calculation, and sentence similarity calculation.

For segmentation, the thesis chooses the ICTCLAS tool, which is highly practical and reliable and makes it easy to extend the original dictionary and part-of-speech set. For word similarity calculation, the thesis first studies several knowledge systems, namely "Chinese Thesaurus", "HowNet", and domain ontology. It then constructs a domain ontology of the C programming language. Finally, the domain ontology and HowNet are used to calculate the similarity of concepts. For sentence similarity calculation, research in China offers many methods based on word sense, word order, and syntactic features. Because similar C test questions differ in only a few words and keep a fixed word order, the thesis selects the Levenshtein distance algorithm to calculate sentence similarity.

In summary, ICTCLAS is first used to segment words and tag parts of speech. Second, the C domain ontology is used to calculate the similarity of domain words. Last, the Levenshtein distance algorithm is used to calculate sentence similarity, with operation costs that differ according to part of speech. Experiments show that these methods are effective and accurate in identifying similar C test questions, so the problem is largely solved.
Keywords/Search Tags: Domain Ontology, Levenshtein Distance, Examination Database, Duplication Checking
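The sentence-similarity step in the abstract, a Levenshtein distance over segmented words with edit costs that depend on part of speech, might look roughly like the Python sketch below. The abstract does not give the thesis's cost values or formulas, so everything specific here is an assumption made for illustration: the `POS_WEIGHT` table and its tags, the substitution cost `(w(a) + w(b)) * (1 - word_sim)`, the normalisation by total sentence weight, and the `word_sim` hook that stands in for an ontology-based word similarity.

```python
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag) as a segmenter might emit

# Hypothetical weights: editing a content word costs more than a function word.
POS_WEIGHT = {"n": 1.0, "v": 1.0, "a": 0.6, "d": 0.4, "u": 0.2, "w": 0.1}
DEFAULT_WEIGHT = 0.5


def weight(token: Token) -> float:
    """Cost of inserting or deleting this token, chosen by its POS tag."""
    return POS_WEIGHT.get(token[1], DEFAULT_WEIGHT)


def exact_match(a: str, b: str) -> float:
    """Fallback word similarity: 1.0 for identical words, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def weighted_distance(s: Sequence[Token], t: Sequence[Token],
                      word_sim: Callable[[str, str], float] = exact_match) -> float:
    """Token-level Levenshtein distance with POS-dependent operation costs."""
    m, n = len(s), len(t)
    dist: List[List[float]] = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # delete every token of s
        dist[i][0] = dist[i - 1][0] + weight(s[i - 1])
    for j in range(1, n + 1):          # insert every token of t
        dist[0][j] = dist[0][j - 1] + weight(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = s[i - 1], t[j - 1]
            # Assumed rule: substitution is discounted by word similarity.
            substitute = (weight(a) + weight(b)) * (1.0 - word_sim(a[0], b[0]))
            dist[i][j] = min(dist[i - 1][j] + weight(a),      # delete a
                             dist[i][j - 1] + weight(b),      # insert b
                             dist[i - 1][j - 1] + substitute)
    return dist[m][n]


def sentence_similarity(s: Sequence[Token], t: Sequence[Token],
                        word_sim: Callable[[str, str], float] = exact_match) -> float:
    """Map the distance to [0, 1]; 1.0 means the token sequences are identical."""
    worst = sum(weight(x) for x in s) + sum(weight(x) for x in t)
    if worst == 0:
        return 1.0
    return 1.0 - weighted_distance(s, t, word_sim) / worst


q1 = [("define", "v"), ("an", "u"), ("integer", "n"), ("array", "n")]
q2 = [("define", "v"), ("the", "u"), ("integer", "n"), ("array", "n")]
q3 = [("define", "v"), ("an", "u"), ("integer", "n"), ("pointer", "n")]
print(sentence_similarity(q1, q2) > sentence_similarity(q1, q3))
```

With these assumed weights, replacing a function word (`q2`) lowers the score less than replacing a noun (`q3`), which is the effect the part-of-speech-dependent costs are meant to produce. Passing a `word_sim` function backed by a domain ontology would further reduce the penalty for swapping closely related terms.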