News Selection And Classification Based On Triple-play Service

Posted on: 2016-10-21  Degree: Master  Type: Thesis
Country: China  Candidate: X R Tao  Full Text: PDF
GTID: 2348330479954365  Subject: Software engineering
Abstract/Summary:
In recent years, with the development of technology and society, triple-play, the convergence of three networks, has become the general trend. However, while it delivers more comprehensive, faster, and broader services, this convergence also brings significant security risks. The Internet is real-time, convenient, and comprehensive, but the volume of information is huge and its quality is uneven. Information supervision has therefore become an important issue. News is the main carrier of information, and news supervision is an important part of content supervision. Text classification and text clustering are important techniques in the field of text mining. Once classified, news is easier to manage and to tell apart. Classification alleviates the problem of cluttered information to a certain extent, and it is the foundation of applications such as information filtering, targeted marketing, performance prediction, and medical diagnosis. Research on text classification is therefore of real significance.

The HUSTRIM system is a content-supervision system based on triple-play. Monitoring and managing the safety of content helps safeguard the whole network, and selecting and classifying news supports the supervision and management of news content. News selection uses a network spider and a text extraction method to obtain the useful information. News classification uses naïve Bayes classification and k-means clustering.

The HUSTRIM system includes a news acquisition module, a text extraction module, a classification module, and a clustering module. It collects nearly 700 news web pages from the Internet. With the help of the text categorization corpus from Sogou Labs, these 700 news pages are assigned to different clusters.

Finally, experiments are used to determine the best parameters for text extraction, Bayes classification, and k-means clustering.
Keywords/Search Tags: Network spider, Text extraction, Text classification, Naïve Bayesian classification
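
The abstract names naïve Bayes as the classification method but gives no details of how HUSTRIM implements it. As a rough illustration only, the following is a minimal sketch of a multinomial naïve Bayes text classifier with Laplace smoothing. The function names, the toy English training data, and the category labels are all hypothetical and are not taken from the thesis or from the Sogou Labs corpus.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Collect the counts needed by a multinomial naive Bayes classifier.

    docs is a list of (tokens, label) pairs; tokens is a list of words.
    """
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()                       # all words seen during training
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def classify(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training documents carrying this label.
        score = math.log(n_docs / total_docs)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data standing in for a labelled news corpus.
training = [
    (["match", "goal", "team"], "sports"),
    (["team", "coach", "season"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "loan", "rate"], "finance"),
]
model = train_naive_bayes(training)
print(classify(model, ["team", "goal"]))
print(classify(model, ["bank", "rate"]))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied. For Chinese news text, a word segmentation step would be needed before tokens can be counted this way.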