Research On And Realization Of Classified Document Identification System Based On Improved Support Vector Machine Algorithm

Posted on: 2019-03-31  Degree: Master  Type: Thesis
Country: China  Candidate: S Qu  Full Text: PDF
GTID: 2428330575975434  Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of computer science and network technologies, new technologies such as Big Data and Cloud Computing have increasingly influenced every aspect of people's lives and work, which poses unprecedented challenges to national secrecy administrations. Technical inspection is one of the essential measures that national secrecy administrations use to discover potential threats promptly and to investigate and stop irregular or illegal activity. One of its principal methods is to check whether any classified document is stored or processed on an unclassified computer. However, neither existing technical measures nor existing software applications satisfy the efficiency, accuracy and objectivity that a technical inspection demands, nor can they handle unconventional cases such as circumvention, mis-stamping and inadequate classification. A new technology or methodology is therefore urgently needed to solve these problems.

With the development of the Internet and electronic technologies, the Support Vector Machine (SVM), a text classification technique from the field of machine learning, has shown broad prospects. As a statistical learning method, SVM is considered one of the most convenient, efficient and widely applied algorithms for text classification, image recognition and other classification problems. Because it has a solid theoretical foundation and consistently performs well, it is widely applied and plays a leading role in this field.

On this basis, this thesis presents improved methods for SVM, drawing on research into and analysis of the techniques and principles of Chinese character segmentation, text classification and related topics. A classified document identification system (CDIS) based on the improved SVM algorithm is researched and implemented, in order to develop a new way of carrying out technical inspections for national secrecy, mainly addressing how computer technologies can be applied to identify classified documents.

The work proceeds as follows. Related theories and the current state of research are summarized, and SVM is chosen as the basis of the system. The principles of SVM are analyzed. Improvements to text feature extraction using a Third-Order Hidden Markov Model (HMM) are proposed, based on a comparison of several major Chinese character segmentation methods. Improvements to the classical TF-IDF formula for text vectorization, concerning the distribution and positions of feature words, are proposed based on a study of the classical formula, and a comparative test is carried out to verify them. Requirement analysis and functional design of the CDIS are performed, each designed module is implemented in Python, and the system is finally tested.

Based on this work, the thesis achieves the goal of identifying classified documents in the form of electronic files. Compared with traditional identification methods based on keyword searching, the system focuses on the text content of the document being inspected and draws a conclusion on whether it is classified, which yields greater efficiency. In addition, the trained SVM does not store the contents of classified documents, so using such a system in a technical inspection reduces human involvement in the process, avoids unnecessarily widening the circle of people with knowledge of classified information, and thus enhances the security of classified documents. Practical problems such as circumvention, mis-stamping and inadequate classification can also be solved. To improve efficiency, accuracy, speed and user convenience, a complete system for classified document identification is implemented and made more suitable for practical national secrecy administration tasks under new circumstances.
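
For orientation, the classical TF-IDF weighting that the thesis takes as its starting point can be sketched as below. This is a minimal illustration only: it does not reproduce the thesis's improved formula, whose distribution and position terms are not given in this abstract. The function name `tf_idf`, the toy token lists and the choice of an unsmoothed natural-log IDF are assumptions made for the example, and the input is assumed to be already segmented into words.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classical TF-IDF over pre-segmented documents (lists of tokens).

    Hypothetical helper for illustration; not the thesis's improved formula.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Weight = (term count / document length) * ln(N / document frequency).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy corpus (assumed data): each document is a list of already segmented words.
docs = [
    ["budget", "report", "public"],
    ["deployment", "plan", "restricted", "plan"],
    ["public", "notice", "report"],
]
weights = tf_idf(docs)
```

In a pipeline like the one the abstract describes, weight vectors of this kind would be the input to the SVM classifier. A term that appears in every document receives weight zero here, which is one reason refinements of the classical formula are commonly studied.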
Keywords/Search Tags:Support Vector Machine (SVM), Classified Documents, Text Classification, Artificial Intelligence