Research On Keywords Acquisition Based On Semantic Distance From Web Pages

Posted on: 2012-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: A P Shi
GTID: 2178330338494861
Subject: Computer software and theory
Abstract/Summary:
With the rapid progress of information technology and the spread of the Internet, the amount of information on the Web is growing exponentially. How to find and manage information effectively has become an important research issue. Keywords, as a brief summary of a document, help organize, manage, and retrieve documents, and are widely used in information retrieval and digital libraries.

However, most articles on the Web do not have human-assigned keywords. Because manual assignment is time-consuming, an accurate and easy method for keyword acquisition is needed. This work addresses the problem through the semantic relevancy between words: by calculating semantic distance, it realizes an approach that generates keywords automatically, so that large amounts of text can be processed and the corresponding keywords obtained easily and quickly.

The study focuses on how to obtain keywords from English news reports, and we build a Keywords Acquisition System based on Semantic Distance. The system is implemented in two parts. The first is the calculation of the semantic distance between words. We measure semantic relevancy in two ways: from the context of the given documents and from word senses. The second is semantic clustering. We partition the content of the given document according to the semantic distance between words, then summarize each cluster to generate the keywords for the document. We use English news reports as the text corpus to demonstrate the acquisition process and report the corresponding experimental data. The experiments show that, as an unsupervised method, keyword acquisition based on semantic distance has a simple model, is easy to implement, and is a convenient and effective way to obtain keywords from a given text.
Keywords/Search Tags: Semantic Distance, Keywords Acquisition, word co-occurrence, NGD (Normalized Google Distance), clustering
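
The two-part pipeline described in the abstract (a word-to-word semantic distance, then clustering on that distance) can be illustrated with a small sketch. It is not the thesis's implementation. It assumes that the NGD formula is applied to sentence-level co-occurrence counts inside a single document, with sentences standing in for web pages, and that words are grouped by simple single-link merging under a fixed threshold. The function names `ngd`, `sentence_counts`, and `cluster_words` and the `threshold` parameter are hypothetical, and the word-sense component mentioned in the abstract is not modeled.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance computed from occurrence counts.

    fx, fy: number of units (here: sentences) containing each word
    fxy:    number of units containing both words
    n:      total number of units
    """
    if fxy == 0:
        return float("inf")  # the words never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0:
        return 0.0  # both words occur in every unit, so they always co-occur
    return (max(lx, ly) - lxy) / denom


def sentence_counts(sentences):
    """Count the sentences containing each word and each unordered word pair."""
    single, pair = {}, {}
    for sent in sentences:
        words = sorted(set(sent))
        for w in words:
            single[w] = single.get(w, 0) + 1
        for a, b in combinations(words, 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair


def cluster_words(sentences, threshold=0.4):
    """Group words whose pairwise NGD is at most `threshold` (single-link).

    Assumption: the threshold value and the single-link rule are illustrative
    choices, not taken from the thesis.
    """
    single, pair = sentence_counts(sentences)
    n = len(sentences)
    parent = {w: w for w in single}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), fab in pair.items():
        if ngd(single[a], single[b], fab, n) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in single:
        clusters.setdefault(find(w), []).append(w)
    return sorted(sorted(c) for c in clusters.values())
```

Each input sentence is a list of already tokenized words, for example `cluster_words([["bank", "loan"], ["storm", "flood"]])`. A keyword per cluster could then be chosen by any summarizing rule, such as the cluster's most frequent word. That selection step is left out because the abstract does not specify it.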