Word relatedness measures the strength of the relationship between two terms, and it is a basic research topic in natural language processing (NLP). Determining the relatedness of an arbitrary pair of words is a fundamental problem in many NLP applications, such as document clustering, word sense disambiguation, the Semantic Web, and information retrieval. However, existing methods for computing semantic relatedness rely on either a dictionary or a corpus and do not combine the two. In this paper, we propose a new method, based on a concept network, for computing the relatedness of terms in a news corpus.

There are two main approaches to computing word relatedness. The first counts word co-occurrences in a corpus; its drawback is that co-occurrence statistics alone cannot reveal the inherent relationship between words. The second uses a dictionary or a domain ontology. Widely used ontologies are carefully built by experts, so the relationships they encode are subjective, reflecting personal understanding, and static: they do not evolve over time and are slow to absorb new words. In addition, current research on word relatedness focuses on isolated word pairs and ignores the wider relations among words.

In this paper, we propose solutions to these problems. First, we construct a news corpus and exploit the characteristics of news text to count word co-occurrences. Second, we introduce Wikipedia to avoid the aforementioned drawbacks of purely corpus-based statistics. We then construct a concept network from the results of SWRN-W (a single word relatedness computation algorithm for news corpora based on Wikipedia), and compute word relatedness from the weights of paths in the network, which overcomes the drawback of treating words in isolation.

The experimental results demonstrate the advantages and validity of the proposed methods.
The proposed method also offers a useful exploration for further research on word relatedness.
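
To make the pipeline described above more concrete, the following is a minimal Python sketch of its two generic ingredients: counting word co-occurrences in a corpus and scoring relatedness by the weight of a path in a word network. It is not the SWRN-W algorithm and it omits the Wikipedia step entirely. The function names, the normalization of edge weights by the largest count, and the choice to score a path by the product of its edge weights are all assumptions made for illustration; the abstract does not specify how path weights are combined.

```python
import heapq
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, the documents containing both words."""
    counts = Counter()
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def build_network(counts):
    """Build an undirected graph whose edge weights lie in (0, 1].

    Assumption: weights are raw counts divided by the largest count.
    """
    top = max(counts.values())
    graph = {}
    for (a, b), n in counts.items():
        weight = n / top
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def path_relatedness(graph, source, target):
    """Return the largest product of edge weights over any path.

    Assumption: a path's score is the product of its edge weights.
    Maximizing that product equals minimizing the sum of -log(weight),
    so Dijkstra's algorithm applies. Returns 0.0 if no path exists.
    """
    if source not in graph or target not in graph:
        return 0.0
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            new_cost = cost - math.log(weight)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return 0.0


# Hypothetical toy corpus: each inner list stands for one news item.
docs = [
    ["bank", "loan", "interest"],
    ["bank", "loan"],
    ["loan", "mortgage"],
    ["river", "bank"],
]
network = build_network(cooccurrence_counts(docs))
score = path_relatedness(network, "river", "mortgage")
```

In this toy corpus "river" and "mortgage" never co-occur, so a purely pairwise count gives them no score, whereas the path through "bank" and "loan" yields a nonzero relatedness. This is the kind of indirect relation that a network-based score can capture and an isolated word-pair statistic cannot.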