
Research And Implementation Of The Internet Hot Topic Clustering

Posted on: 2017-05-22  Degree: Master  Type: Thesis
Country: China  Candidate: A K Yang  Full Text: PDF
GTID: 2308330482480994  Subject: Communications and IT
Abstract/Summary:
With the continuous development of Internet technology, computer networks have brought us a wide variety of information resources, yet obtaining the information we actually need keeps getting harder. This difficulty stems from two features of network information: (1) its ever-increasing scale, and (2) its disordered structure. Techniques that give fast, convenient access to relevant information on the Internet would therefore help people out of this dilemma. At present, search engines are one of the main sources of information, but they return results largely on the basis of keyword matching: any page containing the query keyword may be returned, so the results may well be unrelated to the topic the user is interested in. Moreover, the results are presented only as a flat list that does not show how the items relate to one another, which makes it hard to get a complete picture of a news event. News media do publish special reports that let readers clearly understand the causes and outcomes of an entire news event, and these meet readers' needs. However, such reports are produced by professional news staff, who process the material and group it together through manual classification.
The purpose of hot news event detection is to draw on text mining, text classification, and clustering techniques to summarize and classify news reports automatically, and to generate special reports automatically to meet people's need to search for news information. To solve the above problems, this paper designs a search model based on Internet hot topics, together with an implementation plan. The design is as follows. First, a focused crawler is designed to crawl web resources. Then a series of preprocessing steps, including Chinese word segmentation, feature extraction, and weight calculation, is carried out to construct a vector space model of the documents. Finally, topic detection and tracking technology is applied, with the detection and tracking design adapted to the characteristics of Internet information. Various text similarity and text clustering algorithms are tested and compared, the appropriate ones are selected, and the hot topics on the Internet are then obtained. Verification tests show that the design can be implemented and demonstrated experimentally, and that it can automatically discover hot topics on the Internet.
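The preprocessing and clustering steps described above can be illustrated with a minimal sketch. The abstract does not say which weighting scheme, similarity measure, or clustering algorithm the thesis finally selects, so the choices here (TF-IDF weights, cosine similarity, and single-pass incremental clustering, a method commonly used in topic detection) are assumptions made for illustration only. The function names, the threshold value, and the sample documents are hypothetical, and the input is assumed to be already segmented into tokens (for Chinese text a word segmenter would run first).

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors ({term: weight})."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        # Assumed weighting: relative term frequency times (log(n/df) + 1).
        vectors.append({
            term: (count / len(doc)) * (math.log(n / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(vectors, threshold=0.3):
    """Assign each document to the most similar existing cluster,
    or open a new cluster when no similarity reaches the threshold."""
    clusters = []   # lists of document indices
    centroids = []  # summed member vectors (cosine ignores scale)
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            for term, w in vec.items():
                centroids[best][term] = centroids[best].get(term, 0.0) + w
        else:
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters


# Hypothetical, already-tokenized sample documents.
docs = [
    ["earthquake", "rescue", "team", "city"],
    ["earthquake", "rescue", "aid", "city"],
    ["football", "final", "goal", "match"],
    ["football", "match", "goal", "team"],
]
clusters = single_pass_cluster(tf_idf_vectors(docs), threshold=0.3)
print(clusters)
```

In a sketch like this, the largest clusters over a time window would be the candidate hot topics. The threshold controls how readily a new topic is opened, and the result depends on the order in which documents arrive, which is one reason comparing several similarity and clustering algorithms, as the abstract describes, is worthwhile.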
Keywords/Search Tags: Topic detection and tracking (TDT), Document clustering analysis, Natural language processing, Web crawler