Elena Marchiori*, Walter Pirovano, Jaap Heringa and Anton Feenstra*
* equally contributed

A Feature Selection Algorithm for Detecting Subtype Specific Functional Sites from Protein Sequences for SMAD Receptor Binding

Multiple sequence alignments are often used to reveal functionally important residues within a protein family. In particular, they can help identify the key residues that determine functional differences between protein subclasses (subtype specific sites). This paper proposes a new algorithm for selecting subtype specific sites from a set of aligned protein sequences. The algorithm combines a feature selection technique with information about neighbouring positions to select and rank a set of putatively relevant sites. The algorithm is applied to a dataset of protein sequences from the MH2 domain of the SMAD family of transcription factors. Validation against the known interactions and functions of the sites shows that the algorithm successfully identifies the subtype specific sites known from the literature, as well as new putative ones.
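
To make the general idea concrete, the following is a minimal sketch of ranking alignment columns by combining a per-column score with neighbour position information. It is not the algorithm proposed in this paper: the abstract does not specify the feature selection technique, so a simple stand-in is used here (the total variation distance between the residue distributions of two subtypes at each column). The function names, the `window` and `weight` parameters, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def column_scores(seqs, labels):
    """Score each alignment column by how differently two subtypes use residues.

    Stand-in score (an assumption, not the paper's method): total variation
    distance between the per-subtype residue frequency distributions.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch assumes exactly two subtypes")
    groups = {c: [s for s, lab in zip(seqs, labels) if lab == c] for c in classes}
    scores = []
    for pos in range(len(seqs[0])):
        freqs = []
        for c in classes:
            counts = Counter(s[pos] for s in groups[c])
            total = sum(counts.values())
            freqs.append({aa: n / total for aa, n in counts.items()})
        residues = set(freqs[0]) | set(freqs[1])
        tv = 0.5 * sum(
            abs(freqs[0].get(aa, 0.0) - freqs[1].get(aa, 0.0)) for aa in residues
        )
        scores.append(tv)
    return scores


def neighbour_weighted(scores, window=1, weight=0.5):
    """Blend each column's score with the mean score of its sequence neighbours."""
    blended = []
    for i, score in enumerate(scores):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        neighbours = [scores[j] for j in range(lo, hi) if j != i]
        mean = sum(neighbours) / len(neighbours) if neighbours else 0.0
        blended.append((1 - weight) * score + weight * mean)
    return blended


def rank_sites(seqs, labels, window=1, weight=0.5):
    """Return column indices (0-based), most subtype specific first."""
    combined = neighbour_weighted(column_scores(seqs, labels), window, weight)
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


# Toy aligned sequences (hypothetical, not SMAD data) with two subtypes.
toy_seqs = ["ACDK", "ACDK", "ACER", "ACER"]
toy_labels = ["X", "X", "Y", "Y"]
ranking = rank_sites(toy_seqs, toy_labels)
```

In this toy alignment only the last two columns differ between the subtypes, so they rank first; the neighbour term additionally lifts the second column above the first because it sits next to a discriminating position.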