A Similarity Detection System Of Network News

Posted on:2012-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:H Liang

Full Text:PDF

GTID:2178330332999667

Subject:Software engineering

Abstract/Summary:


In the era of information explosion, similarity detection has become a highly important issue: both plagiarism detection in academic papers and the retrieval of documents with similar content depend on this technology. In recent years, advances in document segmentation, document modeling, and similarity calculation have driven considerable progress in similarity detection research. So far, however, no algorithm has fully met our requirements for both efficiency and quality of results.

The scope of this study is similarity detection for network news, a problem of considerable practical significance. Today, as the knowledge economy and the Internet sweep the globe, nothing in human history has affected people's work and lives as greatly as the Internet does now. China has also been caught up in this tide in recent years: both the growth rate and the absolute number of Chinese Internet users rank first in the world. At the same time, China's network development lags noticeably behind that of developed countries, and the most direct manifestation of this gap is low bandwidth and speed. We found that this is not only a result of hardware limitations but is also directly related to the similarity of network news. According to the survey, a large proportion of Internet users simply browse network news, yet the similarity rate of network news is unusually high: the same news story often appears under hundreds of different links. As network news carries more and more information, this duplication consumes a great deal of network resources.
Meanwhile, people's information needs have become much more specific, and these meaningless duplicate links consume our limited network resources. Retrieving useful information from this huge repository is therefore essential for reducing both network resource costs and the time people spend online.

Similarity detection for network news has long been a difficult problem, and it differs clearly from similarity detection for single words or sentences. For a single word or sentence, a fixed algorithm can perform similarity detection with good efficiency and good results. Network news, however, contains very rich information, which makes detection complex, and combining these complex elements introduces a number of uncertain factors. If these factors are ignored, the similarity detection results will be unsatisfactory. The similarity of single words and sentences can serve as a basis for calculating the similarity of network news, but it cannot be the only factor in an accurate calculation.

This thesis first reviews the relevant similarity detection technologies and gives a general overview of the current state of similarity detection. Network news has the typical characteristics of Chinese documents, and these technologies form the basis of network news similarity detection. I then analyze the characteristics of network news documents in detail, select similarity detection methods based on those properties, and finally implement a similarity detection system for network news. The design process is described in detail in terms of system requirements, system design, system implementation, system testing, and system usage, giving readers a comprehensive understanding of the network news similarity detection system.
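The abstract does not say which segmentation, modeling, or similarity methods the system actually uses. As a rough illustration of the point that word- and sentence-level similarity can serve as a basis but not the only factor, the following Python sketch makes several assumptions: character bigrams stand in for a real Chinese word segmenter, term-frequency vectors with cosine similarity serve as the document model, and a hypothetical weight `alpha` blends a sentence-level score with a whole-document score. All function names and the weight are illustrative and are not taken from the thesis.

```python
import math
import re
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Overlapping character bigrams: a crude stand-in for Chinese word segmentation."""
    chars = "".join(text.split())  # ignore whitespace differences
    return Counter(chars[i:i + 2] for i in range(len(chars) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (vector space model)."""
    # Counter returns 0 for missing keys, so absent bigrams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_sentences(text: str) -> list[str]:
    """Split on Chinese and Western sentence-ending punctuation, dropping empty parts."""
    parts = re.split(r"[。！？!?；;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def _best_match_average(src: list[str], dst: list[str]) -> float:
    """For each sentence in src, take its best cosine match in dst, then average."""
    dst_vecs = [char_bigrams(s) for s in dst]
    best = [max(cosine(char_bigrams(s), v) for v in dst_vecs) for s in src]
    return sum(best) / len(best)


def sentence_level_similarity(doc_a: str, doc_b: str) -> float:
    """Symmetric sentence-level score; insensitive to sentence order."""
    sents_a, sents_b = split_sentences(doc_a), split_sentences(doc_b)
    if not sents_a or not sents_b:
        return 0.0
    return (_best_match_average(sents_a, sents_b)
            + _best_match_average(sents_b, sents_a)) / 2


def news_similarity(doc_a: str, doc_b: str, alpha: float = 0.5) -> float:
    """Hypothetical blend of sentence-level and whole-document similarity.

    alpha is an illustrative weight, not a value from the thesis.
    """
    doc_level = cosine(char_bigrams(doc_a), char_bigrams(doc_b))
    return alpha * sentence_level_similarity(doc_a, doc_b) + (1 - alpha) * doc_level
```

For two copies of a story with their sentences swapped, such as `"今天北京下雨。明天天气转晴。"` and `"明天天气转晴。今天北京下雨。"`, the sentence-level term stays at about 1.0. The whole-document bigram score drops to 12/13, because the bigrams spanning the sentence boundary change. A real system would probably replace the bigram step with a proper segmenter and stopword filtering.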

Keywords/Search Tags:

Network News, Detection, Similarity


Related items

1	Design And Implement Of Duplicate Document Detection Based On Similarity Estimation
2	Research On Code Similarity Detection Technology Based On Local Sensitivity Hash
3	Research On Similarity Judgment And Detection On Face Images Based On Lightweight Network
4	Research On SQL Code Similarity Detection Algorithm
5	Research On DDoS Attack Detection Based On Traffic Similarity
6	An Automatic Similarity Detection Engine Between Sacred Texts Using Text Mining and Similarity Measure
7	Research And Application On Document Similarity Detection Based On Minwise Hashing
8	Design And Implementation Of IDS Running Anomaly Detection System Based On Log Similarity
9	A Similarity Evaluation Algorithm Of C Source Program Based On Code Fingerprint
10	A Study On "Red Chili Pepper Comments" of "Red Networks"