
Design And Implementation Of Chinese Science And Technology Literature Keywords Clustering System Under The Background Of Mass Information

Posted on: 2013-08-20; Degree: Master; Type: Thesis
Country: China; Candidate: F Gao; Full Text: PDF
GTID: 2248330371466586; Subject: Computer Science and Technology
Abstract/Summary:
Information has always been important to human production and daily life, and it is especially so in the information age. With the rapid development of Internet technology, the Internet has become the main channel through which we obtain information. However, the amount of information online grows exponentially every day and information of all kinds is intertwined, so how to obtain useful information accurately has become a research focus.
Clustering analysis is an important technique in natural language processing and an important method for mining useful information hidden in massive data. In scientific research, on the one hand, the number of papers is too large for anyone to read them all; on the other hand, the widespread use of search engine technology provides a large number of words that help us find information. How to find useful information by clustering these existing words has therefore become a meaningful topic.
First, this paper analyzes the dilemma that literature search technology for scientific research cannot meet practical needs under the information explosion. It then introduces literature search technology, with particular attention to the crawler, which is the core module of literature search, and presents a comprehensive and detailed study of the focused web crawler, a current research focus in web crawling, including its system structure and key technologies. Second, this paper briefly introduces clustering techniques in natural language processing and, on this basis, introduces word clustering and conceptual clustering techniques.
Building on a careful analysis of current word clustering techniques, and because an excessive number of dimensions in the clustering space often leads to high clustering complexity, this paper uses a clustering method based on atomic concepts to reduce that complexity. The ultimate aim is to combine web crawler technology with word clustering based on atomic concepts to address the difficulty of identifying scientific research hotspots amid massive information. Finally, building on an in-depth study of the theory, this paper designs and implements a web crawler that retrieves specified data from a specified website, and applies Chinese word clustering techniques from natural language processing to implement a Chinese word clustering system based on atomic concepts using the FCM algorithm in MATLAB. The experimental results are then analyzed, and they largely meet expectations.
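For readers unfamiliar with FCM (fuzzy c-means), the following is a minimal illustrative sketch of the standard algorithm. It is not the thesis's MATLAB implementation: it is written in plain Python, and the function name `fuzzy_c_means`, its parameters, and the idea of representing each keyword as a short numeric vector over atomic concepts are assumptions made only for this example.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric vectors.

    Hypothetical helper for illustration; returns (centers, memberships),
    where memberships[i][j] is the degree to which point i belongs to
    cluster j and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Start from random memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Update each center as the mean of all points weighted by u^m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )

        # Update memberships from the distances to the new centers.
        shift = 0.0
        for i, p in enumerate(points):
            dists = [
                sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5
                for ctr in centers
            ]
            zeros = [j for j, d in enumerate(dists) if d == 0.0]
            if zeros:
                # A point lying exactly on a center belongs fully to it.
                new_row = [1.0 / len(zeros) if j in zeros else 0.0
                           for j in range(c)]
            else:
                new_row = [
                    1.0 / sum((dists[j] / dists[k]) ** exponent
                              for k in range(c))
                    for j in range(c)
                ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once memberships have effectively stopped changing.
        if shift < tol:
            break

    return centers, u


if __name__ == "__main__":
    # Made-up 2-D vectors standing in for keyword features.
    sample = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    centers, memberships = fuzzy_c_means(sample, c=2)
    for point, row in zip(sample, memberships):
        print(point, [round(v, 3) for v in row])
```

Unlike hard clustering, each keyword receives a membership degree in every cluster, which suits words that relate to more than one research topic. Keeping the vectors short, as the atomic-concept approach aims to do, reduces the cost of the distance computations in each iteration.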
Keywords/Search Tags: Crawler, Atomic words, Atomic concepts, Word clustering, FCM algorithm