
Study on Text Clustering Based on Concept Semantic Tree

Posted on: 2009-12-08 | Degree: Master | Type: Thesis
Country: China | Candidate: S Dai
GTID: 2178360245953593 | Subject: Computer application technology
Abstract/Summary:
With the development of the network, the demand for text search has grown. Intelligent search, a technology that improves on classic keyword matching, has become a research hotspot. It is expected to be a core technology of the next-generation network, and one of its important directions is applying semantic computation to text search. Lexical semantic computation based on WordNet reveals the semantic relations among words, and the relatedness of two documents can then be calculated from their representations in a text vector space.

Clustering analysis is an important means of data mining and plays an important role in text mining. Text clustering is essentially the clustering of text contents (for example, the multi-document summarization system of Biya University). In the classical vector space model (VSM) based on text keywords, each document of the document set is represented by a vector Di = (d1i, d2i, …, dmi) built from m keywords. This method has two problems. First, it treats words as independent elements with no relationships among them when the similarity of two texts is computed as the inner product of their vectors, so it cannot clearly express the semantic meaning of a text. Second, the VSM matches only the words that appear explicitly in the texts, ignoring polysemy (one word with several meanings) and the many ways the same meaning can be expressed. A set of vocabulary entries therefore cannot accurately reflect the semantics of texts; texts can, however, be clustered by their semantics if the clustering method is changed. A semantic tree built from HowNet concepts is used to disambiguate words and to group semantically similar documents through concept-based clustering.

In this thesis, semantic trees that support node insertion and deletion are built from HowNet to implement concept matching at different granularities, and a concept semantic tree matching algorithm is presented.
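To make the contrast between keyword matching and concept matching concrete, here is a minimal Python sketch. It is not the thesis's algorithm: the concept tree `PARENT`, the lexicon `LEXICON`, the distance-based similarity alpha / (alpha + d) with an arbitrary alpha = 1.0, and the single-pass `leader_cluster` step are all hypothetical stand-ins, and no real HowNet data is used. Ambiguous words are resolved only crudely, by taking the best-matching pair of senses, and the toy tree has no insertion or deletion operations.

```python
import math
from collections import Counter

# HYPOTHETICAL toy concept tree (child -> parent). It only imitates the shape
# of a HowNet-style hierarchy; every node here is invented for illustration.
PARENT = {
    "entity": None,
    "artifact": "entity",
    "vehicle": "artifact",
    "car": "vehicle",
    "truck": "vehicle",
    "organization": "entity",
    "bank_institution": "organization",
    "geography": "entity",
    "river": "geography",
    "riverbank": "geography",
}

# HYPOTHETICAL lexicon: a word may map to several concepts (its senses).
LEXICON = {
    "car": ["car"],
    "automobile": ["car"],
    "truck": ["truck"],
    "river": ["river"],
    "bank": ["bank_institution", "riverbank"],
}


def ancestors(concept):
    """Path from a concept up to the root, starting with the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def path_distance(a, b):
    """Number of tree edges between two concepts via their lowest common ancestor."""
    path_a, path_b = ancestors(a), ancestors(b)
    for i, node in enumerate(path_a):
        if node in path_b:
            return i + path_b.index(node)
    raise ValueError("concepts are not in the same tree")


def concept_sim(a, b, alpha=1.0):
    """Distance-based similarity alpha / (alpha + d); alpha = 1.0 is an arbitrary choice."""
    return alpha / (alpha + path_distance(a, b))


def word_sim(w1, w2):
    """Best similarity over all sense pairs (a crude form of disambiguation)."""
    if w1 not in LEXICON or w2 not in LEXICON:
        return 1.0 if w1 == w2 else 0.0  # fall back to exact keyword matching
    return max(concept_sim(a, b) for a in LEXICON[w1] for b in LEXICON[w2])


def doc_sim(d1, d2):
    """Symmetric average of best word-to-word matches between two token lists."""
    def one_way(x, y):
        return sum(max(word_sim(w, v) for v in y) for w in x) / len(x)
    return (one_way(d1, d2) + one_way(d2, d1)) / 2


def keyword_cosine(d1, d2):
    """Classic keyword VSM: cosine of raw term-frequency vectors."""
    v1, v2 = Counter(d1), Counter(d2)
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0


def leader_cluster(docs, threshold=0.5):
    """Single-pass clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of document indices; index 0 is its leader
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if doc_sim(doc, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


doc_a = ["car", "river"]
doc_b = ["automobile", "bank"]
doc_c = ["truck"]
print(keyword_cosine(doc_a, doc_b))           # 0.0: no shared keywords
print(round(doc_sim(doc_a, doc_b), 3))        # 0.667
print(leader_cluster([doc_a, doc_b, doc_c]))  # [[0, 1], [2]]
```

The keyword VSM scores the two documents as unrelated because "car" and "automobile" are different strings, while concept matching maps both to the same node; likewise "bank" is matched to its riverbank sense when compared with "river", since that sense lies closer in the tree.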
Experimental results show that the concept semantic tree matching algorithm is effective: it handles the "keyword obstacle" problem and semantic ambiguity considerably better, and it improves the recall ratio.
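The abstract reports an improved recall ratio without defining it. For reference, recall in the standard information-retrieval sense is given below, together with a per-cluster form often used to evaluate text clustering; which form the thesis actually used is not stated here, so the second formula is an assumption. In it, n_ij is the number of documents of class i placed in cluster j, and n_i is the total number of documents in class i.

```latex
\text{Recall} = \frac{|\,\text{relevant} \cap \text{retrieved}\,|}{|\,\text{relevant}\,|},
\qquad
R(i, j) = \frac{n_{ij}}{n_i}
```

For example, if a class contains 10 documents and 8 of them fall into the matching cluster, then R = 8 / 10 = 0.8.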
Keywords/Search Tags: Semantic relevancy, Concept matching, Semantic tree, HowNet, Semantic similarity