Research on Similar Pages on the Internet Based on Semantics

Posted on: 2012-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: R J Li
Full Text: PDF
GTID: 2248330374980961
Subject: Computer application technology
Abstract/Summary:
Computer network technology is developing rapidly, and the number of its users keeps multiplying. Users typically search for information by entering keywords into a search engine, which automatically returns pages that contain those keywords. However, search results contain a large amount of reprinted and duplicated information. To find the material they actually need, users often have to look through these pages one by one, wasting a great deal of time and energy because the reprinted and duplicated pages bury the useful ones. This thesis studies how to find the information a user needs among web pages quickly and accurately. The proposal is to incorporate the semantic meaning of words into the page similarity judgment, so that the system better reflects what the user needs and serves the user better. The author proposes a method that starts from the semantic dictionary "HowNet", combines it with word specificity (IDF), and finally makes a similarity judgment on the text of the pages.
Building on the study of web page similarity calculation based on word semantics, the thesis completes the following work:
Firstly, the page text is pre-processed: noise is removed, the Chinese text is segmented into words, and stop words are removed. The pages are then classified to narrow the scope of the similarity judgment, since similar pages generally fall into the same category. Finally, synonyms are looked up in a synonym dictionary and used to replace keywords within the same category, which improves the accuracy of the judgment.
Secondly, the author adopts the semantic resource HowNet to calculate the similarity of words. In HowNet, the semantic knowledge of a word is expressed as concepts, and the units of the description language used to define a concept are called sememes. A sememe is the smallest unit of meaning used to describe a concept. The author first looks up the concepts corresponding to each word in HowNet. Because a word can have several concepts, the similarity between two words is taken as the maximum similarity over their concepts. From this the author concludes that word similarity reduces to sememe similarity. On this basis, word specificity (IDF) is also introduced: in document retrieval, IDF reflects how much weight a word should carry. Combining the semantic resource with IDF is therefore expected to achieve a better result.
Finally, the author tested the method that combines the semantic resource with word specificity in experiments. The results are relatively satisfactory, particularly in terms of precision and recall.
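The combination described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the author's implementation: the tiny hierarchy `SEMEME_PARENT`, the lexicon `WORD_SENSES`, the distance-based formula `ALPHA / (d + ALPHA)`, the smoothed IDF, and the symmetric averaging in `text_similarity` are all hypothetical stand-ins. The abstract confirms only two things: word similarity is the maximum over the senses of the two words, and IDF is used to weight words. Each sense is simplified to a single smallest meaning unit (a sememe), and real HowNet data is not used.

```python
import math

# Hypothetical toy hierarchy of smallest meaning units (child -> parent).
SEMEME_PARENT = {
    "entity": None,
    "animal": "entity",
    "human": "animal",
    "beast": "animal",
    "artifact": "entity",
    "computer": "artifact",
}

# Hypothetical lexicon: each word lists one sememe per sense.
WORD_SENSES = {
    "man": ["human"],
    "person": ["human"],
    "dog": ["beast"],
    "pc": ["computer"],
    "mouse": ["beast", "computer"],  # two senses
}

ALPHA = 1.6  # assumed tuning parameter of the distance-based formula


def ancestors(sememe):
    """Return the path from a sememe up to the root, starting with itself."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = SEMEME_PARENT[sememe]
    return path


def sememe_similarity(a, b):
    """Assumed formula: ALPHA / (d + ALPHA), d = path length between a and b."""
    path_a, path_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(path_a):
        if node in path_b:
            distance = steps_a + path_b.index(node)
            return ALPHA / (distance + ALPHA)
    return 0.0  # no common ancestor


def word_similarity(w1, w2):
    """Maximum similarity over all sense pairs, as the abstract describes."""
    if w1 == w2:
        return 1.0
    pairs = ((a, b) for a in WORD_SENSES.get(w1, []) for b in WORD_SENSES.get(w2, []))
    return max((sememe_similarity(a, b) for a, b in pairs), default=0.0)


def idf(word, docs):
    """Smoothed inverse document frequency over a list of tokenized documents."""
    df = sum(1 for doc in docs if word in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def directed_similarity(doc_a, doc_b, docs):
    """IDF-weighted average of each word's best semantic match in doc_b."""
    words = set(doc_a)
    if not words or not doc_b:
        return 0.0
    total_weight = sum(idf(w, docs) for w in words)
    score = sum(idf(w, docs) * max(word_similarity(w, v) for v in doc_b) for w in words)
    return score / total_weight


def text_similarity(doc_a, doc_b, docs):
    """Symmetric score: mean of the two directed similarities (an assumption)."""
    return 0.5 * (directed_similarity(doc_a, doc_b, docs)
                  + directed_similarity(doc_b, doc_a, docs))


if __name__ == "__main__":
    corpus = [["man", "dog"], ["person", "dog"], ["pc", "mouse"]]
    print(text_similarity(corpus[0], corpus[1], corpus))
    print(text_similarity(corpus[0], corpus[2], corpus))
```

In this sketch, two pages that use different but synonymous words ("man" and "person") still score as highly similar, which pure keyword matching would miss. The IDF weights let rarer words count for more than common ones.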
Keywords/Search Tags: Semantic, HowNet, Similarity of Pages, Word Specificity