Proteins of the same class often share a secondary structure packing arrangement but differ in how the secondary structure units are ordered in the sequence. We find that proteins that share a common core also share local sequence-structure similarities, and these can be exploited to align structures with different topologies. In this study, segments from a library of local sequence-structure alignments were assembled hierarchically, enforcing compactness and conserved inter-residue contacts but not sequential ordering. Previous structure-based alignment methods often ignore sequence similarity, local structural equivalence, and compactness. The new program, SCALI (Structural Core ALIgnment, at http://www.bioinfo.rpi.edu/~yuanx2/scali.html), can efficiently find conserved packing arrangements, even if they are non-sequentially ordered in space. SCALI alignments optimize both sequence and structure similarity; they conserve remote sequence similarity and contain fewer alignment errors. Clustering of our pairwise non-sequential alignments shows that recurrent packing arrangements exist in topologically different structures. These clustered common cores represent a new level of structure classification, which is more general than topology but more specific than architecture. Based on the identified conserved core structures, self-avoiding hidden Markov models (SCALI-HMM) are being developed for use in protein structure prediction, protein design, and "new fold" discovery. The modified self-avoiding HMM algorithms might also be useful as modeling tools in other fields.
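
The idea of assembling aligned segments without enforcing sequential ordering can be illustrated with a minimal sketch. This is not the SCALI algorithm: it is a simple greedy selection, assuming a hypothetical `SegmentPair` record with a precomputed `score`, and it omits the hierarchical assembly, compactness, and contact-conservation criteria described above. It only shows that segment pairs may be combined whenever they do not reuse residues, regardless of their order along either chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentPair:
    """Hypothetical local alignment between protein A and protein B."""
    start_a: int   # first aligned residue index in protein A
    start_b: int   # first aligned residue index in protein B
    length: int    # number of aligned residues (ungapped segment)
    score: float   # assumed combined sequence-structure score


def residues(start, length):
    """Set of residue indices covered by a segment."""
    return set(range(start, start + length))


def assemble(pairs):
    """Greedily pick high-scoring segment pairs that share no residues.

    No constraint is placed on the relative order of segments,
    so the result may be a non-sequential alignment.
    """
    used_a, used_b = set(), set()
    chosen = []
    for p in sorted(pairs, key=lambda p: p.score, reverse=True):
        ra = residues(p.start_a, p.length)
        rb = residues(p.start_b, p.length)
        if ra & used_a or rb & used_b:
            continue  # a residue would be aligned twice; skip this pair
        chosen.append(p)
        used_a |= ra
        used_b |= rb
    return chosen


def is_sequential(chosen):
    """True if the segments appear in the same order in both chains."""
    ordered = sorted(chosen, key=lambda p: p.start_a)
    return all(x.start_b < y.start_b for x, y in zip(ordered, ordered[1:]))


pairs = [
    SegmentPair(start_a=0, start_b=20, length=5, score=3.0),
    SegmentPair(start_a=10, start_b=0, length=5, score=2.5),
    SegmentPair(start_a=2, start_b=22, length=5, score=1.0),  # overlaps the first
]
chosen = assemble(pairs)
```

In this toy input the first two segment pairs are kept even though they occur in opposite order in the two chains, which a conventional sequential alignment would not allow; the third is rejected because it reuses residues already aligned.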