Research On A Compound Keywords Abstraction Based On Small World Network Theory

Posted on: 2007-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: L B Dong
GTID: 2178360182477846
Subject: Computer software and theory
Abstract/Summary:
A small-world network, characterized by a short characteristic path length and a high clustering coefficient, is observed in many real-world networks, especially in human language. In this paper, we propose a new algorithm that extracts compound keywords from a Chinese document by treating the document as a small-world network. First, the Chinese document is represented as a network: the nodes represent terms and the edges represent the co-occurrence of terms, which describes the semantic association between the single words of the document. Through the necessary calculations, this network is shown to have the characteristics of a small-world network. Second, two variables, the characteristic path length increment and the clustering coefficient increment, are introduced so that we can obtain a candidate keyword set by quantifying each term's importance to the semantics of the document. Finally, taking into account several factors, such as the semantic association between single words in the candidate set, the relation between the numerical values of the two increments, and the requirements of specific fields, we combine related words in the candidate set to obtain compound keywords. In addition, we briefly analyze the space and time complexity of the algorithm. Experimental results show that the algorithm is effective and accurate: its accuracy is as high as 91% when compared with keywords extracted manually from the same document. The semantics conveyed by the compound keywords produced by this algorithm are far clearer and more accurate than those of a set of single keywords, which helps us better understand the document's semantics at a high level.
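
The abstract does not give the formulas for the two increments, so the following is only a minimal sketch of one plausible reading, not the thesis's actual method. It assumes that terms co-occur when they appear in the same sentence, that a term's increment is the change in characteristic path length (or clustering coefficient) when that term is removed from the network, and that a disconnected pair of terms is assigned a distance equal to the number of nodes. All function names and the toy, pre-segmented document are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_graph(sentences):
    """Undirected co-occurrence graph: terms sharing a sentence are linked (assumption)."""
    graph = {}
    for terms in sentences:
        for term in terms:
            graph.setdefault(term, set())
        for a, b in combinations(set(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def path_length(graph):
    """Mean shortest-path length over all ordered node pairs.

    Unreachable pairs are given distance n = number of nodes (assumed penalty).
    """
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values()) + n * (n - len(dist))
    return total / (n * (n - 1))


def clustering(graph):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def without(graph, node):
    """Copy of the graph with one node and its edges removed."""
    return {u: nbrs - {node} for u, nbrs in graph.items() if u != node}


def increments(graph):
    """Map each term to (path length increment, clustering increment) on its removal."""
    base_l, base_c = path_length(graph), clustering(graph)
    result = {}
    for node in graph:
        reduced = without(graph, node)
        result[node] = (path_length(reduced) - base_l, clustering(reduced) - base_c)
    return result


# Toy document, already segmented into terms (hypothetical data).
doc = [["small", "world", "network"], ["network", "keyword"], ["keyword", "document"]]
graph = build_graph(doc)
scores = increments(graph)
# Terms whose removal lengthens paths the most are ranked first.
ranked = sorted(scores, key=lambda term: scores[term][0], reverse=True)
print(ranked)
```

Under these assumptions, a term that bridges otherwise separate clusters gets a large path length increment, which is one way such a term could enter the candidate keyword set. How the two increments are weighed against each other, and how candidates are merged into compound keywords, is not stated in the abstract and is left out of the sketch.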
Keywords/Search Tags: SWN (Small World Network), Document's semantic structure figure, The characteristic path length incremental, The cluster coefficient incremental, Compound Keywords