
A Similarity Detection System Of Network News

Posted on: 2012-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Liang
Full Text: PDF
GTID: 2178330332999667
Subject: Software engineering
Abstract/Summary:
In the era of information explosion, similarity detection has become a highly important issue: both plagiarism detection for papers and the retrieval of documents with similar content depend on it. In recent years, with advances in document segmentation, document modeling, and similarity calculation methods, research on similarity detection has progressed considerably and produced many results. So far, however, none of these algorithms fully meets our requirements for both efficiency and quality of results.

The scope of this study is similarity detection for network news, a problem of considerable practical significance. As the knowledge economy and the Internet sweep the globe, nothing in human history has affected people's work and lives as deeply as the Internet does now. China has been caught up in this tide as well: in recent years both the growth rate and the absolute number of its Internet users have been the highest in the world. Compared with developed countries, however, China's network development still shows an obvious gap, most directly in low bandwidth and speed. We found that this is not only a matter of hardware but is also directly related to the similarity of network news. According to surveys, a large share of Internet users simply browse network news, yet the similarity rate of network news is unusually high, and the same story often appears under hundreds of different links. Because today's network news carries more and more information, this duplication consumes a large amount of network resources. At the same time, people's information needs have become much more specific, and meaningless links waste limited network resources. Retrieving the useful information from this huge repository is therefore essential for reducing both network resource consumption and the time people spend online.

Similarity detection for network news has long been a difficult problem, and it differs clearly from similarity detection for single words or sentences. For a single word or sentence, a fixed algorithm can perform the detection with good efficiency and good results. Network news, however, contains very rich information, which makes detection complex, and the combination of these complex objects introduces a number of uncertain factors. If these are not taken into account, the detection results cannot be satisfactory. The similarity of single words and sentences can serve as the basis for calculating the similarity of network news, but it cannot be the only factor if that similarity is to be calculated accurately.

This thesis first reviews the relevant similarity detection technologies and gives a general introduction to the current state of the field. Network news has the typical characteristics of Chinese documents, and these technologies form the basis of similarity detection for network news. I then analyze the characteristics of network news documents in detail, select similarity detection methods according to those characteristics, and finally complete the network news similarity detection system. The design process is described in detail in terms of system requirements, system design, system implementation, system testing, and system use, so that readers gain a comprehensive understanding of the network news similarity detection system.
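
As a rough illustration of how sentence-level similarity can serve as a basis for a document-level score, the following is a minimal sketch and not the method used in the thesis, which the abstract does not specify. It assumes character bigram shingles with Jaccard similarity (a common choice for Chinese text because it needs no word segmentation) and a symmetric best-match average over sentences. All function names are hypothetical.

```python
from typing import List, Set


def shingles(text: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams in text, ignoring whitespace."""
    chars = "".join(text.split())
    if len(chars) < n:
        # Text shorter than n: treat the whole text as a single shingle.
        return {chars} if chars else set()
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sentence_similarity(s1: str, s2: str, n: int = 2) -> float:
    """Sentence-level score (assumed measure: Jaccard over character n-grams)."""
    return jaccard(shingles(s1, n), shingles(s2, n))


def document_similarity(doc1: List[str], doc2: List[str], n: int = 2) -> float:
    """Document-level score built from sentence scores.

    Each sentence is matched to its most similar sentence in the other
    document; the best-match scores are averaged in both directions.
    """
    def directed(src: List[str], dst: List[str]) -> float:
        if not src or not dst:
            return 0.0
        best = [max(sentence_similarity(s, t, n) for t in dst) for s in src]
        return sum(best) / len(best)

    return (directed(doc1, doc2) + directed(doc2, doc1)) / 2
```

A score built this way uses only sentence overlap. In line with the abstract's point that sentence similarity cannot be the only factor, a fuller system would presumably combine it with other signals, which are not modeled in this sketch.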
Keywords/Search Tags: Network News, Detection, Similarity