The Application And Research Of Semantic Graph Based On Text Similarity

Posted on: 2016-08-18  Degree: Master  Type: Thesis
Country: China  Candidate: M Li  Full Text: PDF
GTID: 2308330461484239  Subject: Software engineering
Abstract/Summary:
With the advent of big data and the continuing rise in education levels, the number of duplicate documents keeps growing, and similarity checking of documents and papers has become more urgent. Document similarity is usually measured by converting documents into a representation in which a distance, an angle, or a similar quantity can be computed, in order to obtain better measurement results. Cosine similarity, a commonly used measure of document similarity, reflects the degree of similarity between documents fairly well. However, it is insensitive to absolute counts and proportions, which causes considerable trouble.

According to incomplete statistics, in 2008 about 40% of the resources on the Internet duplicated other resources. Duplicate and near-duplicate resources inflate search engine indexes and have a considerable impact on search results. Near-duplicate detection is a well-known problem in the field of information retrieval. This thesis aims to improve similarity discrimination algorithms so that crawlers can avoid fetching duplicate resources.

Web resources and Semantic Web graphs are in some respects much more complex than plain text. In a text file, the sequence of statements contributes to the meaning; in a Semantic Web resource it does not, so resources with the same semantics may consist of different sequences of statements. Likewise, in text-based near-duplicate detection the meaning of the content is not a major concern, but for Semantic Web documents the issue is prominent: even Semantic Web documents that look different may turn out to be the same once their deductive closures are computed. In Semantic Web graphs, blank nodes must be checked in addition to the statement sequence.
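The insensitivity of cosine similarity to absolute counts, noted above, can be illustrated with a minimal sketch. It assumes a plain bag-of-words model with whitespace tokenization; the function name `cosine_similarity` and the sample strings are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    # Assumption: lowercase whitespace tokens stand in for real tokenization.
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product over the terms the two texts share.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


doc = "semantic web graph similarity"
doubled = doc + " " + doc  # same terms, every count doubled

print(cosine_similarity(doc, doubled))   # 1.0: scaling the counts changes nothing
print(cosine_similarity("a b", "c d"))   # 0.0: no shared terms
```

Because only the direction of the count vectors matters, a document and a copy that repeats it twice are scored as identical, which is the kind of behavior the abstract describes as troublesome.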
Blank nodes are anonymous resources that have no URI and carry no lexical meaning.

In 1998, World Wide Web expert Tim Berners-Lee proposed the concept of the Semantic Web. The semantic network has developed into a concept in natural language understanding and cognitive science research, where it is used to express complex concepts and the relationships among them. A semantic network is represented as a directed graph in which nodes represent concepts and edges represent the semantic relationships between those concepts, so a semantic network is described by a diagram consisting of nodes and arcs. The Semantic Web proposed by Tim Berners-Lee is a different concept, built on the now familiar World Wide Web, but it is related to the basic theory of semantic networks.

A measure that describes how similar two Semantic Web documents or graphs are, and how they differ, plays a significant role in retrieval, updating, version control, and other tasks. This thesis describes a series of text similarity measures that express the relationship between such documents and quantify their differences.

In addition, to distinguish the similarity between two Semantic Web graphs, a variable is defined to represent the version difference between them. The variable records the tuples that were added and deleted, keeping a balance between the two, and it is determined from the serialization of the RDF graphs rather than from the document URIs alone.

Finally, experiments verify that the similarity measure for Semantic Web graphs gives good results. However, because time was limited, some problems remain, such as stability, which should be improved in later work.
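The idea of describing the difference between two graph versions by their added and deleted tuples can be sketched as follows. This is an illustration under stated assumptions, not the measure defined in the thesis: graphs are modeled as sets of (subject, predicate, object) string triples, blank nodes are ignored (handling them would need some canonical labeling step), and the Jaccard ratio is a swapped-in stand-in for the similarity score. The names `graph_delta`, `graph_similarity`, and the `ex:` identifiers are hypothetical.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Delta(NamedTuple):
    added: FrozenSet[Triple]    # triples only in the new version
    deleted: FrozenSet[Triple]  # triples only in the old version


def graph_delta(old: Iterable[Triple], new: Iterable[Triple]) -> Delta:
    """Triples added to and deleted from `old` to obtain `new`."""
    old_set, new_set = set(old), set(new)
    return Delta(frozenset(new_set - old_set), frozenset(old_set - new_set))


def graph_similarity(old: Iterable[Triple], new: Iterable[Triple]) -> float:
    """Jaccard ratio of shared triples (an assumed, illustrative score)."""
    old_set, new_set = set(old), set(new)
    union = old_set | new_set
    if not union:
        return 1.0  # two empty graphs are treated as identical
    return len(old_set & new_set) / len(union)


# Two versions of a small graph; statement order differs on purpose.
v1 = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
]
v2 = [
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:knows", "ex:carol"),
]

delta = graph_delta(v1, v2)
print(sorted(delta.added))       # the triple about ex:carol
print(sorted(delta.deleted))     # the triple about ex:bob
print(graph_similarity(v1, v2))  # one shared triple out of three distinct ones
```

Comparing sets of triples rather than serialized text means that two files listing the same statements in a different order yield an empty delta, which matches the abstract's point that statement order carries no meaning in such graphs.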
Keywords/Search Tags: Similarity, cosine similarity, semantic network, Semantic Web Graph, RDF