Study And Implementation On Clustering Analysis Of Ocean Documents Based On Self-Organizing Feature Map

Posted on: 2010-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: W P Zhao
GTID: 2178360275485967
Subject: Software engineering
Abstract/Summary:
With the implementation of the national ocean strategy, the volume of ocean-related Web literature is growing rapidly. Clustering analysis of this literature supports ocean information mining, which is of great significance to ocean science and technology.

Clustering analysis of Chinese documents involves several steps: extracting documents from the database, segmenting the Chinese text into words, constructing a representation model for the document set, and clustering on that representation. Unlike English document processing, Chinese document processing must begin with word segmentation. Common segmentation methods are string-matching-based, understanding-based, and statistics-based. Many segmentation methods now meet practical requirements, so the main question is how to select appropriate segmentation software. In information retrieval, the vector space model is generally used to represent a document set; because the degree of correlation between documents can easily be calculated from it, the model can also be adopted for document clustering analysis. There are many clustering algorithms, such as partition-based, hierarchical, and density-based methods, and the choice of algorithm depends on the application goal.

To construct an ocean literature clustering system based on the self-organizing feature map (SOM) neural network, this thesis analyzes the commonly used Chinese word segmentation methods, studies the document-set representation model and various clustering algorithms, and designs and implements OCA, a SOM-based document clustering analysis system. The main work and innovations are as follows:

1. After analyzing and comparing various clustering algorithms, the SOM neural network was chosen as the clustering algorithm for ocean literature. The SOM network uses a chef-hat-style winner neighborhood, within which the neurons adjust their weights.
2. Chinese word segmentation technologies were studied and various segmentation methods were compared; MMSEG, which has a high segmentation accuracy, was chosen to segment the Chinese ocean literature.
3. The vector space model was used to represent the document set, and the widely accepted TF-IDF weighting was used to denote the contribution of each term to document semantics.
4. A SOM-based ocean literature clustering system, OCA, was developed in Java under the Eclipse environment. Ocean literature downloaded from CNKI was processed with the OCA system, and the experiments indicated that the system can cluster ocean literature effectively.
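
The pipeline summarized in the abstract (TF-IDF vectors fed to a SOM whose winner neighborhood is chef-hat shaped, i.e. rectangular) can be illustrated with a minimal sketch. This is not the OCA implementation, which was written in Java and is not reproduced here. The sketch is in Python and rests on several assumptions the abstract does not state: documents arrive already segmented into non-empty token lists (the thesis uses MMSEG for that step), the TF-IDF variant is relative term frequency times ln(N/df), the map is one-dimensional, and the learning rate and radius decay linearly. All function and parameter names are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Turn non-empty tokenized documents into L2-normalized TF-IDF vectors."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: relative term frequency times ln(N / df).
        vec = [(tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(vectors, n_neurons=2, epochs=50, lr0=0.5, radius0=1, seed=0):
    """Train a 1-D SOM with a rectangular ("chef hat") winner neighborhood."""
    rng = random.Random(seed)
    # Initialize each neuron from a randomly chosen input vector.
    weights = [list(rng.choice(vectors)) for _ in range(n_neurons)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # assumed linear decay
        radius = round(radius0 * (1.0 - frac))  # neighborhood shrinks over time
        for x in rng.sample(vectors, len(vectors)):
            winner = best_matching_unit(weights, x)
            for j in range(n_neurons):
                # Chef hat: every neuron inside the radius gets the same
                # full-strength update; neurons outside are left unchanged.
                if abs(j - winner) <= radius:
                    weights[j] = [w + lr * (v - w)
                                  for w, v in zip(weights[j], x)]
    return weights


def assign_clusters(weights, vectors):
    """Label each document with the index of its best-matching neuron."""
    return [best_matching_unit(weights, x) for x in vectors]


if __name__ == "__main__":
    # Hypothetical, already-segmented toy documents.
    docs = [["tide", "current"], ["current", "tide"],
            ["ship", "port"], ["port", "ship"]]
    vocab, vectors = tfidf_vectors(docs)
    weights = train_som(vectors, n_neurons=2)
    print(assign_clusters(weights, vectors))
```

Documents that share a neuron index are treated as one cluster. The rectangular neighborhood differs from Gaussian or Mexican-hat neighborhoods in that the update strength does not taper with distance from the winner: it is either full or zero.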
Keywords/Search Tags: Ocean Documents, Clustering Analysis, Vector Space Model, Self-Organizing Feature Map