A Study of Text Similarity Based on HowNet

Posted on: 2013-06-03  Degree: Master  Type: Thesis
Country: China  Candidate: P Li
GTID: 2268330392965637  Subject: Computer Application Technology
Abstract/Summary:
This thesis studies text similarity computation based on HowNet. HowNet itself is examined in depth: its concepts, its key data files, and its structure. First, the thesis describes the concepts of HowNet and identifies its two key data files, whole and glossary. It then analyzes the sememe, the basic unit that is unique to HowNet and on which the rest of the study is built. Second, it analyzes the procedure of text similarity computation, which proceeds from small units to large ones: from words, to sentences, to whole texts. Third, it identifies the shortcomings of existing algorithms and, on that basis, proposes a set of improvements. Finally, targeted experiments are used to evaluate the proposed method, and the results confirm its effectiveness. Specifically, the main work and results of this thesis are as follows:
(1) The structure of HowNet is a central object of this research, in particular the sememe, HowNet's distinctive unit. The thesis describes how the sememes in HowNet are organized as a forest of trees. This part is fundamental, since almost all of the later work builds on it.
(2) The depth of a sememe in its tree is incorporated into sememe similarity computation. The thesis also accounts for the suppressing effect that the primary sememe has on the subordinate sememes. Building on word similarity, it then studies sentence similarity and analyzes the soundness of the approach. This step relies on HowNet to segment sentences into words and groups the resulting words by part of speech; sentence-to-sentence similarity is computed on that basis, and the partial results are finally combined. Experiments on this part show gains in both recall and precision, which supports the effectiveness of the proposed methods.
(3) At the paragraph or text level, the text is split into sentences at punctuation marks.
Sentence similarity is then computed with the methods developed in the earlier stages, which again reflects the central idea of this thesis. Finally, comparisons across a range of experiments confirm that the method is valid: compared with other classic algorithms, it improves both recall and precision by between 1% and 20%.
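Items (1) and (2) rest on the sememe forest and on sememe depth. The sketch below is not the thesis's code. It builds a small hypothetical sememe forest; the sememe names and the `PARENT` map are illustrative only and are not taken from HowNet's whole file. It computes depth and path distance and applies the path-distance formula commonly used in HowNet-based work, sim = α / (d + α), with α = 1.6 as an assumed default. `depth_aware_sim` is a hypothetical illustration of how depth could be folded in. It is not the formula proposed in the thesis.

```python
# Hypothetical toy sememe forest (child -> parent; None marks a root).
# The names are illustrative and NOT copied from HowNet's whole file.
PARENT = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "human": "animate",
    "animal": "animate",
    "inanimate": "physical",
    "tool": "inanimate",
    "event": None,  # a second tree: the hierarchy is a forest
    "act": "event",
}

def ancestors(s):
    """Return the path from sememe s up to its root, starting with s."""
    path = []
    while s is not None:
        path.append(s)
        s = PARENT[s]
    return path

def depth(s):
    """Depth of s in its tree; roots have depth 0."""
    return len(ancestors(s)) - 1

def path_distance(a, b):
    """Edge count between a and b via their lowest common ancestor (LCA).

    Returns (distance, lca), or (None, None) if a and b lie in different trees.
    """
    pa, pb = ancestors(a), ancestors(b)
    pos_in_b = {s: i for i, s in enumerate(pb)}
    for i, s in enumerate(pa):  # first shared node walking up from a is the LCA
        if s in pos_in_b:
            return i + pos_in_b[s], s
    return None, None

def sememe_sim(a, b, alpha=1.6):
    """Path-distance similarity alpha / (d + alpha); alpha = 1.6 is an assumed default."""
    d, _ = path_distance(a, b)
    if d is None:
        return 0.0  # assumption: sememes in different trees are unrelated
    return alpha / (d + alpha)

def depth_aware_sim(a, b, alpha=1.6):
    """HYPOTHETICAL depth-weighted variant, not the thesis's formula.

    Scales alpha by (1 + depth of the LCA), so pairs that meet deeper in the
    tree score higher; it equals sememe_sim when the LCA is a root.
    """
    d, lca = path_distance(a, b)
    if d is None:
        return 0.0
    w = alpha * (1 + depth(lca))
    return w / (w + d)
```

The hypothetical variant reduces to the baseline when the lowest common ancestor is a root. It therefore only raises similarity for pairs that meet deeper in the tree, such as `human` and `animal` under `animate`.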
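The suppressing effect of the primary sememe in item (2) can be illustrated with the product-form combination often used in HowNet word-similarity work. Each concept is split into several parts (four in the common formulation), with the primary sememe first, and term i is weighted by the product of the first i part similarities. The abstract does not give the thesis's exact formula. The weights in `BETAS` are assumed values. `word_sim` follows the common approach of scoring a word pair by its best-matching pair of senses.

```python
from itertools import product
from typing import Callable, Iterable, Sequence

# Assumed weights: non-increasing and summing to 1 (primary part first).
BETAS = (0.5, 0.2, 0.17, 0.13)

def concept_sim(part_sims: Sequence[float], betas: Sequence[float] = BETAS) -> float:
    """Combine per-part similarities of two concepts, primary sememe first.

    Term i is weighted by the product of the first i part similarities, so a
    low primary-sememe similarity suppresses every later (subordinate) term.
    """
    if len(part_sims) != len(betas):
        raise ValueError("need exactly one similarity per part")
    total, running = 0.0, 1.0
    for beta, s in zip(betas, part_sims):
        running *= s
        total += beta * running
    return total

def word_sim(senses_a: Iterable[str], senses_b: Iterable[str],
             concept_pair_sim: Callable[[str, str], float]) -> float:
    """Word similarity as the maximum over all pairs of senses (concepts)."""
    return max(concept_pair_sim(a, b) for a, b in product(senses_a, senses_b))
```

With part similarities `[1, 0.1, 0.1, 0.1]` the result is about 0.52, while `[0.1, 1, 1, 1]` gives 0.1. A weak match on the primary sememe caps the whole score, however well the subordinate parts match.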
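Item (2) also relies on HowNet's vocabulary to segment sentences into words and then groups the words by part of speech. The abstract does not name the segmentation algorithm. This sketch assumes forward maximum matching, a standard dictionary-based method. A tiny hypothetical `LEXICON` with made-up POS tags stands in for HowNet's glossary.

```python
from collections import defaultdict

# Hypothetical toy lexicon standing in for HowNet's vocabulary (word -> POS tag).
LEXICON = {"我": "PRON", "喜欢": "V", "吃": "V", "苹果": "N", "果": "N", "水果": "N"}

def fmm_segment(text, lexicon=LEXICON):
    """Forward maximum matching: at each position take the longest lexicon word.

    Characters not covered by the lexicon become single-character tokens tagged UNK.
    """
    max_len = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon or size == 1:
                tokens.append((piece, lexicon.get(piece, "UNK")))
                i += size
                break
    return tokens

def group_by_pos(tokens):
    """Group segmented words by their part-of-speech tag, preserving order."""
    groups = defaultdict(list)
    for word, pos in tokens:
        groups[pos].append(word)
    return dict(groups)
```

For example, `fmm_segment("我喜欢吃苹果")` yields 我/PRON, 喜欢/V, 吃/V, 苹果/N. A character missing from the lexicon, such as 他, falls back to a single-character token tagged `UNK`.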
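Sentence-to-sentence similarity from POS-grouped words can be sketched as a best-match average. Each word is compared only with words of the same part of speech in the other sentence. The two directions are then averaged to keep the score symmetric. This aggregation scheme is an illustrative assumption, not the thesis's documented method. `word_sim` can be any word-similarity function, such as a HowNet-based one. `toy_word_sim` and its score table are hypothetical.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)
WordSim = Callable[[str, str], float]

def directed_sim(a: List[Token], b: List[Token], word_sim: WordSim) -> float:
    """Average over words in a of the best same-POS match in b (0 if no match)."""
    if not a:
        return 0.0
    total = 0.0
    for word, pos in a:
        same_pos = [w for w, p in b if p == pos]
        total += max((word_sim(word, w) for w in same_pos), default=0.0)
    return total / len(a)

def sentence_sim(a: List[Token], b: List[Token], word_sim: WordSim) -> float:
    """Symmetric sentence similarity: mean of the two directed scores."""
    return (directed_sim(a, b, word_sim) + directed_sim(b, a, word_sim)) / 2

# Hypothetical word-similarity table; exact matches score 1.0.
TABLE = {frozenset(("苹果", "水果")): 0.8}

def toy_word_sim(x: str, y: str) -> float:
    return 1.0 if x == y else TABLE.get(frozenset((x, y)), 0.0)

s1 = [("我", "PRON"), ("喜欢", "V"), ("苹果", "N")]
s2 = [("我", "PRON"), ("喜欢", "V"), ("水果", "N")]
print(sentence_sim(s1, s2, toy_word_sim))
```

Restricting matches to the same part of speech keeps, for example, a noun from being paired with a verb that happens to share sememes.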
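Item (3) moves to the paragraph or text level. The text is split into sentences at punctuation marks, and the sentence-level measure is reused. In this sketch the punctuation set in `SPLIT_RE` is an assumption. The aggregation (best match per sentence, averaged in both directions) is one plausible choice rather than the thesis's own. `char_jaccard` is only a stand-in so that the block runs on its own; a HowNet-based sentence similarity would be passed as `sent_sim` instead.

```python
import re
from typing import Callable, List

# Assumed split points: common Chinese and ASCII punctuation.
SPLIT_RE = re.compile(r"[。！？；，,.!?;]+")

def split_sentences(text: str) -> List[str]:
    """Split text at punctuation, dropping empty pieces."""
    return [s.strip() for s in SPLIT_RE.split(text) if s.strip()]

def char_jaccard(a: str, b: str) -> float:
    """Stand-in sentence similarity (character-set Jaccard), NOT HowNet-based."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def text_sim(t1: str, t2: str, sent_sim: Callable[[str, str], float] = char_jaccard) -> float:
    """Best-match sentence similarity averaged over both texts and both directions."""
    s1, s2 = split_sentences(t1), split_sentences(t2)
    if not s1 or not s2:
        return 0.0
    forward = sum(max(sent_sim(x, y) for y in s2) for x in s1) / len(s1)
    backward = sum(max(sent_sim(y, x) for x in s1) for y in s2) / len(s2)
    return (forward + backward) / 2
```

Because each sentence is scored against its best counterpart, two texts containing the same sentences in a different order still score as identical.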
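The reported gains are stated in terms of recall and precision. For reference, the standard definitions over a set of retrieved items and a set of relevant items are precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. The abstract does not describe the evaluation protocol. The document IDs in the usage note are hypothetical.

```python
from typing import Iterable, Tuple

def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Standard set-based precision and recall (0.0 when a denominator is empty)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Take retrieved items {d1, d2, d3, d4} and relevant items {d1, d2, d5}. Two of the four retrieved items are relevant, so precision is 0.5. Two of the three relevant items are found, so recall is 2/3.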
Keywords/Search Tags: HowNet, Sememe, Tree Structure, Similarity, Text