The Text Classification System Towards Maritime Area

Posted on: 2012-05-21  Degree: Master  Type: Thesis
Country: China  Candidate: X Wang
GTID: 2178330335955410  Subject: Computer technology
Abstract/Summary:
Many countries have begun organizing and permanently preserving digital documents. They are also studying how to build digital resources, including open access to and sharing of those resources, domain-specific knowledge bases, and the digitization of library collections. In China, building and maintaining maritime information resources still faces serious problems such as low classification accuracy, poor timeliness, and high cost. Automatic classification technology is therefore needed to address these problems.

This thesis reviews common techniques in automatic text classification and studies the related algorithms in depth, covering word segmentation, feature selection, training, and performance evaluation. Based on the characteristics of maritime literature, it develops the requirements and overall design of a text classification system for the maritime domain. A corpus of maritime professional literature is built together with a classification hierarchy, and the texts in the corpus are preprocessed. Five classification methods are implemented in the system, achieving automatic classification of large volumes of real maritime literature: K Nearest Neighbors (KNN), Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Clustering Center (CC). Experiments and an analysis of their results are presented, and a large body of experimental data is used to analyze how the classifiers perform differently across datasets.
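The pipeline described above (feature selection, then training and applying a classifier) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not name its feature-selection criterion, so chi-square (CHI), a common choice for text, is assumed. The Clustering Center (CC) method is interpreted here as a nearest-centroid classifier using cosine similarity, and KNN uses the same cosine similarity with majority voting. Word segmentation is assumed to have already turned each document into a token list, with English toy tokens standing in for segmented Chinese text. All function names, class names, and the four training documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def chi_square(docs, labels, term, cls):
    """CHI statistic for one (term, class) pair from a 2x2 document-count table."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)      # has term, in class
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)      # has term, other class
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)  # lacks term, in class
    d_ = n - a - b - c                                                      # lacks term, other class
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    return 0.0 if denom == 0 else n * (a * d_ - b * c) ** 2 / denom

def select_features(docs, labels, k):
    """Keep the k terms whose best per-class CHI score is highest (ties broken alphabetically)."""
    vocab = {t for d in docs for t in d}
    classes = set(labels)
    scores = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return set(sorted(vocab, key=lambda t: (-scores[t], t))[:k])

def vectorize(doc, features):
    """Term-frequency vector (as a dict) restricted to the selected features."""
    return Counter(t for t in doc if t in features)

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

def train_centroids(docs, labels, features):
    """Assumed CC classifier: one mean term-frequency vector (center) per class."""
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for d, y in zip(docs, labels):
        sums[y].update(vectorize(d, features))
    return {y: {t: v / counts[y] for t, v in s.items()} for y, s in sums.items()}

def predict_centroid(centroids, doc, features):
    """Assign the class whose center is most cosine-similar to the document."""
    v = vectorize(doc, features)
    return max(centroids, key=lambda y: cosine(v, centroids[y]))

def predict_knn(docs, labels, doc, features, k=3):
    """Majority vote among the k training documents most cosine-similar to doc."""
    v = vectorize(doc, features)
    sims = sorted(((cosine(v, vectorize(d, features)), y) for d, y in zip(docs, labels)),
                  reverse=True)
    return Counter(y for _, y in sims[:k]).most_common(1)[0][0]

# Hypothetical, already-segmented toy corpus with two hypothetical classes.
train_docs = [
    ["ship", "hull", "engine", "propeller"],
    ["engine", "diesel", "propeller", "ship"],
    ["port", "cargo", "container", "berth"],
    ["cargo", "port", "crane", "container"],
]
train_labels = ["marine_engineering", "marine_engineering", "port_logistics", "port_logistics"]

features = select_features(train_docs, train_labels, k=6)
centroids = train_centroids(train_docs, train_labels, features)
print(predict_centroid(centroids, ["diesel", "engine", "ship"], features))  # marine_engineering
print(predict_knn(train_docs, train_labels, ["container", "berth", "cargo"], features))  # port_logistics
```

On a real corpus the feature count `k` and the KNN neighbor count are exactly the kind of parameters whose working values have to be found experimentally; the values above are arbitrary choices for the toy data.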
Establishing a text classification system for the maritime domain can substantially advance the construction of China's maritime information resources and drive the development of information resources in related fields. The work therefore has important social significance and scientific research value.

Through experiments, the above algorithms are evaluated and compared, and empirical values for the related parameters are obtained. The experimental results can be applied to information retrieval, information filtering, and document classification in maritime libraries.
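The abstract refers to evaluating and comparing the classifiers but does not name the metrics used. Per-class precision, recall, and F1, averaged either per class (macro) or over all decisions (micro), are standard in text classification, so the sketch below assumes them. The function names and label lists are hypothetical placeholders, not results from the thesis.

```python
from collections import Counter

def per_class_prf(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # document wrongly assigned to class p
            fn[t] += 1  # document of class t that was missed
    scores = {}
    for c in set(y_true) | set(y_pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)

def micro_f1(y_true, y_pred):
    """Pooled F1; for single-label classification this equals accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical gold labels and classifier outputs, for illustration only.
y_true = ["engineering", "engineering", "logistics", "logistics", "navigation"]
predictions = {
    "classifier_a": ["engineering", "logistics", "logistics", "logistics", "navigation"],
    "classifier_b": ["engineering", "engineering", "logistics", "navigation", "navigation"],
}
for name, y_pred in predictions.items():
    print(name, round(macro_f1(y_true, y_pred), 3), round(micro_f1(y_true, y_pred), 3))
```

Reporting both averages is useful when category sizes are uneven, as they typically are in a domain corpus: micro-F1 is dominated by large categories, while macro-F1 exposes weak performance on small ones.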
Keywords/Search Tags: Text Classification, Maritime Literature, Corpus, Feature Selection, Classification System