
Research On Information Gain Based Software Birthmark

Posted on: 2015-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: F F Li
GTID: 2428330491952723
Subject: Computer software and theory
Abstract/Summary:
Software is a digital product that is easy to copy, tamper with, and distribute, which makes software piracy highly profitable. Because piracy seriously impairs the development of the software industry, software copyright protection has become an active area for both software manufacturers and academia. Software birthmarking is one of the most important copyright protection technologies: it extracts inherent features from a program that can uniquely identify it. This thesis first reviews related work on software birthmarks. Building on existing techniques, it then borrows feature selection from text categorization in information retrieval and applies Information Gain as the feature selection method, in order to address the poor resilience of k-gram based static software birthmarks. Finally, it addresses problems inherent in Information Gain itself. The main contents are as follows:

(1) A set of software programs is constructed, consisting of pirated software and legitimate software. First, the k-gram algorithm is used to extract features from every program by static analysis, yielding a set of feature fragments. Second, Information Gain is used to select the more useful feature fragments, which removes redundancy and improves performance.

(2) The frequencies of the fragments selected by Information Gain differ greatly. For a given fragment, a higher Information Gain value is taken to mean that the fragment is more representative. However, Information Gain overlooks how often the fragment occurs in each program, which could influence the results, and the selected features may be unevenly distributed across the two program sets. To address these problems, an impact factor for the frequency of a feature within a class is added.

(3) The proposed birthmark is evaluated experimentally for both credibility and resilience by comparing it with two classical software birthmarks, WPP and TaNaMM. The results show that the Information Gain based software birthmark not only maintains high credibility but also clearly improves resilience, so it can detect piracy efficiently.
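The pipeline in item (1) can be illustrated with a minimal Python sketch. It assumes each program has already been reduced to a sequence of opcode names by static analysis, and it uses the presence-based form of Information Gain common in text categorization. The function names (`kgrams`, `information_gain`, `select_features`) and the class labels are hypothetical and do not come from the thesis. The frequency impact factor from item (2) is not included, because the abstract does not give its form.

```python
import math
from collections import Counter


def kgrams(opcodes, k):
    """Return the set of contiguous k-length opcode windows of one program."""
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(feature, programs, labels):
    """IG(feature) = H(class) - H(class | feature present or absent).

    programs: list of k-gram sets, one per program
    labels:   class label of each program (e.g. "pirated" / "legit")
    """
    n = len(programs)
    h_class = entropy(list(Counter(labels).values()))
    h_cond = 0.0
    for present in (True, False):
        subset = [lab for grams, lab in zip(programs, labels)
                  if (feature in grams) == present]
        if subset:
            h_cond += len(subset) / n * entropy(list(Counter(subset).values()))
    return h_class - h_cond


def select_features(programs, labels, top_n):
    """Keep the top_n k-grams ranked by Information Gain (ties broken by value)."""
    vocabulary = set().union(*programs)
    ranked = sorted(vocabulary,
                    key=lambda f: (-information_gain(f, programs, labels), f))
    return ranked[:top_n]
```

A k-gram that appears in every program of one class and in no program of the other receives the maximum score, while a k-gram that appears in only a single program scores lower and is more likely to be discarded as redundant.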
Keywords/Search Tags: software copyright protection, feature extraction, k-gram algorithm, feature selection, information gain, feature frequency