Research on Text Copy Detection Based on a Meta-search Engine

Posted on: 2015-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: P J Wang
Full Text: PDF
GTID: 2298330431492663
Subject: Computer application technology
Abstract/Summary:
In the era of big data, and with the rapid development of computer, communication, and network technologies, the Internet has become an important way for people to obtain information. The quantity and value of information on the Internet are growing exponentially, and copying of online text has become increasingly common, so a large amount of duplicate text now exists on the Internet. This duplicate text not only wastes storage space and reduces retrieval efficiency, but also undermines the protection of electronic intellectual property. How to quickly detect whether a text has been copied from text on the Internet has therefore become an urgent problem.

Based on an analysis of existing text copy detection methods, we propose a text copy detection method based on a meta-search engine. The work in this paper includes the following.

First, we study the existing text copy detection methods, including their basic principles, typical text copy detection systems, and the general process of text copy detection. We also analyze the key technologies and key issues, including Chinese text preprocessing, Chinese automatic word segmentation, POS tagging, text block selection strategies, and text features. This study provides ideas for solving the key problems in the design of a new text copy detection system.

Then, taking into account the current characteristics of how text is copied from the Internet, a text copy detection method based on a meta-search engine is proposed.
To improve efficiency, we use a TF-ISF algorithm that incorporates part-of-speech information to extract the core sentences from the text, and we use the keywords of each core sentence to query the meta-search engine. Because the traditional LCS-based sentence similarity calculation ignores the effect of the number of common subsequences, we propose an LCS sentence similarity method that also takes the number of longest common subsequences into account. Lastly, we use the SOGOU-T data set to verify the accuracy and efficiency of the improved TF-ISF method and of the copy detection method based on the meta-search engine.
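
The two building blocks named above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes sentences are already segmented into word lists, scores a sentence by the sum of plain TF-ISF (term frequency times inverse sentence frequency) over its distinct words, and uses the common baseline similarity 2 * |LCS| / (|a| + |b|). The abstract does not give the details of the part-of-speech weighting or of the improved LCS measure, so neither is reproduced here, and all function names are hypothetical.

```python
import math
from collections import Counter


def tf_isf_scores(sentences):
    """Score each tokenized sentence by the sum of TF-ISF over its distinct words.

    sentences: list of token lists (assumed already word-segmented).
    Plain TF-ISF only; no part-of-speech weighting is applied.
    """
    n = len(sentences)
    # Sentence frequency: number of sentences that contain each word.
    sf = Counter(word for sent in sentences for word in set(sent))
    scores = []
    for sent in sentences:
        if not sent:
            scores.append(0.0)
            continue
        tf = Counter(sent)
        scores.append(sum((count / len(sent)) * math.log(n / sf[word])
                          for word, count in tf.items()))
    return scores


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Baseline LCS similarity in [0, 1]: 2 * |LCS| / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))
```

In such a sketch, the highest-scoring sentences from `tf_isf_scores` would serve as core sentences whose keywords are sent to the search engines, and `lcs_similarity` would compare each core sentence with sentences from the returned pages.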
Keywords/Search Tags: text copy detection, text preprocessing techniques, Chinese automatic word segmentation techniques, meta-search engine, TF-ISF, LCS