
Design and Implementation of an Information Classification System Based on Multiple Classifiers

Posted on: 2017-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Chen
Full Text: PDF
GTID: 2308330488477156
Subject: Software engineering
Abstract/Summary:
With the continuous development of the Internet, and especially the growing prosperity of the mobile Internet, social interaction between people has deepened further through virtual social software. Self-media keeps gaining popularity in social settings, and this trend is being reinforced by the arrival of the Internet celebrity economy. Sources of information on the Internet are multiplying, and information is aggregated by community or by interest. Text-and-image information spreads through social networks efficiently and in close to real time. Multi-source news and rapid news production are no longer the major problems; given the Internet's efficient distribution capability, gathering and classifying news from different sources will be the biggest challenge for news media. Against this background, this thesis presents a news classification system that speeds up the release of news and reduces the amount of information users have to process.

The main contents are as follows:

First, the development status of news classification systems in China and abroad is studied and analyzed in detail, which gives direction to the system architecture and technology selection in this thesis.

Second, a detailed requirements analysis and design of the system is carried out. The overall design covers the system framework and the functional modules of the system. Detailed designs are then given for the four modules: the text preprocessing module, the Chinese automatic word segmentation module, the feature extraction module, and the multi-classifier module.

Finally, the four modules of the news classification system are implemented and tested. The text preprocessing module handles Web text: Jsoup is used to filter out markup tags and to extract news topics and content. Chinese word segmentation uses the Jcseg word segmenter. Feature extraction combines mutual information and the chi-square test to reduce the dimensionality of the feature vector and obtain the best feature vector. The multi-classifier module finally computes the Naive Bayes prior probabilities, which are stored in MySQL.
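The feature extraction and prior-probability steps can be illustrated with a small sketch. It is not the thesis implementation, which is built on Java tools (Jsoup, Jcseg) and MySQL; it is a minimal Python illustration under stated assumptions. The abstract does not say how mutual information and the chi-square statistic are combined, so the rule used here (rank terms by chi-square, counting only classes where the term's pointwise mutual information is positive) is an assumption, and all function names are hypothetical. Documents are assumed to be already segmented into sets of terms.

```python
import math
from collections import Counter


def term_class_counts(docs):
    """docs: list of (set_of_terms, label). Returns document-frequency counts."""
    class_count = Counter(label for _, label in docs)
    term_count = Counter()
    joint = Counter()
    for terms, label in docs:
        for t in set(terms):
            term_count[t] += 1
            joint[(t, label)] += 1
    return len(docs), class_count, term_count, joint


def chi_square(n, a, b, c, d):
    # a: docs in the class containing the term, b: docs outside the class containing it
    # c: docs in the class without the term,    d: docs outside the class without it
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def mutual_information(n, a, b, c):
    # Pointwise mutual information log(P(t, c) / (P(t) * P(c))) from the same counts.
    if a == 0:
        return 0.0
    return math.log(a * n / ((a + b) * (a + c)))


def score_terms(docs):
    """Assumed combination: best chi-square over classes with positive MI."""
    n, class_count, term_count, joint = term_class_counts(docs)
    scores = {}
    for t in term_count:
        best = 0.0
        for label, n_c in class_count.items():
            a = joint[(t, label)]
            b = term_count[t] - a
            c = n_c - a
            d = n - a - b - c
            if mutual_information(n, a, b, c) > 0:
                best = max(best, chi_square(n, a, b, c, d))
        scores[t] = best
    return scores


def select_features(docs, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = score_terms(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def class_priors(docs):
    """Naive Bayes prior P(c) estimated as the share of documents in class c."""
    n, class_count, _, _ = term_class_counts(docs)
    return {label: count / n for label, count in class_count.items()}
```

The positive-MI filter is one way to exploit both statistics: chi-square is symmetric and scores a term highly whether it is strongly present in or strongly absent from a class, while the sign of the mutual information indicates which of the two is the case. In a system like the one described, the values returned by class_priors would be the ones persisted to the database.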
Keywords/Search Tags: Jcseg, classifier, news classification, Naive Bayes, KNN algorithm, Chinese word segmentation