This thesis describes the design and implementation of PERF, a new framework for deduplicating protein mentions using a wide range of protein attributes. A mention is any recorded information about a protein, whether derived from a database, a high-throughput study, literature text mining, or another source. The framework can easily be extended to the deduplication of protein-protein interactions (PPIs). PERF translates mentions into instances of a Framework XML schema to facilitate mention comparisons. It also uses "virtual attribute dependencies" to "enhance" mentions with additional attribute values. PERF computes a likelihood measure based on the textual similarity of mention attributes and on the overlap between the protein classes implied by each mention's non-sequence attributes. A prototype of the framework was implemented, and preliminary tests indicate that the framework can clearly separate duplicate mentions from non-duplicate mentions.