Chinese Text Categorization On Weapon Corpus

Posted on: 2019-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2428330572450660
Subject: Computer application technology
Abstract/Summary:
With the extensive application of modern science and technology in the military field, and with the intensive research into and heavy use of high-tech weaponry, military information resources are growing in both scale and quantity day by day. To prevail in future wars, every country gives very high priority to obtaining military intelligence rapidly and accurately. The traditional approach of gathering military intelligence purely by manual means clearly no longer meets the demands of modern warfare, especially given the rapid development of internet technology and the continuing expansion of information resources. How to extract useful military intelligence from disorganized information in a timely, accurate and stable way has therefore become a problem that deserves research and attention. Against the background of the construction and development of military informatization in our country, and with the improvement of information retrieval in the military field as both starting point and objective, this paper builds a weaponry corpus independently and, on that basis, studies in depth the classification of Chinese text for the weaponry corpus, with the aim of helping to improve our country's military information retrieval capability. The paper focuses on the following two aspects.

The first is the research and construction of a weaponry corpus. The paper explains the significance of constructing such a corpus: corpora are already widely researched and used, but among the existing mature Chinese corpora none targets information resources in the military field. Based on an in-depth analysis of the characteristics of a weaponry corpus, this paper selects three representative categories in weaponry research, namely UAVs, guided missiles and airships, and constructs the corpus by collecting and sorting a large body of weaponry literature, taking into account the development status and construction priorities of modern weaponry. The weaponry corpus constructed in this paper fills a gap in corpus construction in the military field. In the future, the corpus will be continually extended and improved through manual sorting as weaponry develops, and more categories of military resources will be brought in. The eventual goal is a corpus that covers the military field as a whole, has considerable scale and is suitable for a variety of electronic text processing applications, which is expected to meet our army's demand for military intelligence retrieval.

The second is the classification of Chinese text for the weaponry corpus. The paper surveys text classification techniques at home and abroad in depth, with particular emphasis on the definition of text classification, the classification process, Chinese word segmentation, text representation models, feature selection methods and four common classification algorithms, and it provides the test procedures and results. In the experiments, the influence of feature selection at different word frequencies on the vector space model representation is tested, and three feature selection methods (document frequency, information gain and mutual information) are each used to test the performance of four classifiers: Linear SVM Classifier, Naïve Bayesian Classifier, Decision Tree Classifier and K Nearest Neighbor Classifier. The experiments show that, given the highly specialized character of the weaponry corpus, the Linear SVM Classifier gives the best Chinese text classification results on the constructed corpus. The paper also explains the SVM Classifier in detail and analyzes the cases that are misclassified. Because modern warfare is characterized by cooperative combat across multiple dimensions of space, the literature of different categories overlaps in content, and this is the most direct cause of the misclassifications.
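As a rough illustration of two of the feature selection scores named in the abstract, document frequency and information gain, the following minimal Python sketch scores terms on a toy corpus. Everything in it is an assumption made for illustration: the six documents, the English stand-in tokens (the thesis works on segmented Chinese text), the three labels, and the function names `document_frequency`, `entropy` and `information_gain`. It is not the thesis's implementation and does not reproduce its data or results.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) * H(C | t) + P(not t) * H(C | not t)]."""
    with_term, without_term = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        if term in tokens:
            with_term[label] += 1
        else:
            without_term[label] += 1
    n = len(docs)
    conditional = 0.0
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:
            conditional += (size / n) * entropy(list(part.values()))
    return entropy(list(Counter(labels).values())) - conditional


# Hypothetical, already-segmented toy documents (English stand-ins for tokens).
docs = [
    ["uav", "rotor", "flight"],
    ["uav", "sensor", "flight"],
    ["missile", "warhead", "guidance"],
    ["missile", "guidance", "flight"],
    ["airship", "helium", "envelope"],
    ["airship", "helium", "flight"],
]
labels = ["UAV", "UAV", "missile", "missile", "airship", "airship"]

df = document_frequency(docs)
ig_helium = information_gain(docs, labels, "helium")
ig_flight = information_gain(docs, labels, "flight")
```

In this toy corpus "flight" has the highest document frequency yet a lower information gain than "helium", because it appears in every category. A term shared across categories in this way is the kind of overlap the abstract identifies as the main source of misclassification.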
Keywords/Search Tags: weaponry, corpus, Chinese text classification, SVM algorithm, classifier