Research And Application Of Maximum Likelihood Estimator Of Connected Bit Minwise Hash

Posted on: 2017-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: X H Sheng
Full Text: PDF
GTID: 2348330488475034
Subject: Computer technology
Abstract/Summary:
The openness of the network and the ease of copying text make it convenient to share academic resources, but they also create opportunities for academic misconduct such as repeated or multiple submission of project applications. In 2014 the State Council issued document (2014)[11], which requires stronger duplicate checking during project review in order to avoid overstated topics and repeated funding. Taking similarity checking of funded project applications as its application background, this thesis studies similarity estimation, the key technology in a project similarity examination system. With massive document collections, precision and efficiency determine whether a similarity detection system is usable, so a similarity estimation algorithm must both shorten running time and improve detection accuracy. The urgent problems are therefore insufficient estimation precision and the efficiency bottleneck. The main research content is as follows:
1) A maximum likelihood estimator for connected bit Minwise Hash is proposed. The analysis concludes that, for two sets, maximum likelihood estimation offers the best trade-off and the best average accuracy among the estimation methods considered. Experimental results show that the method requires only 50% of the CPU running time of b-bit Minwise Hash with no loss of precision.
2) A maximum likelihood estimator for three-way resemblance is provided. Experimental results show that the algorithm improves accuracy by 10% over Minwise Hash for three-way resemblance, and that connected bit Minwise Hashing for estimating three-way similarities requires only 50% of the CPU running time of b-bit Minwise Hash for three-way estimation.
3) To further improve system performance, the maximum likelihood estimator for connected bit Minwise Hash is applied to the similarity examination system, saving 60% of the computing time. Three key problems are solved: a smart professional thesaurus and a stop-word list are established to enhance detection accuracy; an elastic, fine-grained detection mechanism is proposed; and a cross-sectoral similarity detection module is proposed to enhance the detection effect.
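
The abstract does not describe the connected bit construction or its maximum likelihood estimator, so the following is only a background sketch of the two baselines it compares against: plain Minwise Hash and b-bit Minwise Hash for two sets. The helper names (`h`, `signature`, `resemblance_full`, `resemblance_b_bit`), the use of BLAKE2b as the hash family, and the sparse-data approximation that two unrelated minima agree on their lowest b bits with probability about 1/2^b are all assumptions made for illustration, not details taken from the thesis.

```python
import hashlib


def h(seed: int, x: int) -> int:
    """Hypothetical seeded 64-bit hash; stands in for one random permutation."""
    data = f"{seed}:{x}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def signature(items, k):
    """Keep the minimum hash value of the set under each of k seeded hashes."""
    return [min(h(seed, x) for x in items) for seed in range(k)]


def resemblance_full(sig_a, sig_b):
    """Plain Minwise Hash: the fraction of matching minima estimates resemblance."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def resemblance_b_bit(sig_a, sig_b, b):
    """b-bit Minwise Hash: compare only the lowest b bits of each minimum.

    Assumes sparse sets, so accidental b-bit matches occur with
    probability c = 1 / 2**b and P(match) = c + (1 - c) * R.
    """
    mask = (1 << b) - 1
    matches = sum((a & mask) == (x & mask) for a, x in zip(sig_a, sig_b))
    p_hat = matches / len(sig_a)
    c = 1.0 / (1 << b)
    return (p_hat - c) / (1.0 - c)


if __name__ == "__main__":
    set_a = set(range(0, 200))
    set_b = set(range(100, 300))  # true resemblance is 100 / 300
    sig_a = signature(set_a, 512)
    sig_b = signature(set_b, 512)
    print("full :", resemblance_full(sig_a, sig_b))
    print("2-bit:", resemblance_b_bit(sig_a, sig_b, 2))
```

Storing only b bits per hash value shrinks the signature and speeds up comparison, at the cost of a noisier estimate for the same number of hashes; this is the kind of precision and efficiency trade-off the abstract is concerned with.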
Keywords/Search Tags: Minwise Hash, Maximum likelihood estimator, Three-way resemblance, Two-way resemblance