ABSTRACT
A number of medically important disease-causing bacteria (collectively called Gram-negative bacteria) are noted for the extra "outer" membrane that surrounds their cells. Proteins resident in this membrane (outer membrane proteins, or OMPs) are of primary research interest for antibiotic and vaccine design because they lie on the bacterial surface and are therefore the most accessible targets for new drugs. With the development of genome sequencing technology and bioinformatics, biologists can now deduce all the proteins likely to be produced by a given bacterium and have attempted to classify where those proteins are located in the bacterial cell. However, such protein localization programs are currently least accurate when predicting OMPs, so there is a need for a better OMP classifier. Data mining research suggests that frequent patterns aid the development of accurate and efficient classification algorithms. In this paper, we present two methods that identify OMPs based on frequent subsequences and test them on all Gram-negative bacterial proteins whose localizations have been determined by biological experiments. One classifier follows an association rule approach, while the other is based on support vector machines (SVMs). We compare the proposed methods with the state-of-the-art methods in the biological domain. The results demonstrate that our methods are better both at accurately identifying OMPs and at providing biological insights that increase our understanding of the structures and functions of these important proteins.
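To make the idea of frequent-subsequence features concrete, the following is a minimal sketch and not the method of this paper. It assumes that a "subsequence" is a contiguous substring and that support is the fraction of sequences containing it. The function names, parameters, and toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequent_substrings(sequences, min_len, max_len, min_support):
    """Return {substring: support} for contiguous substrings whose support
    (fraction of sequences containing them) is at least min_support.

    Assumption: contiguous substrings stand in for 'frequent subsequences';
    the paper's own pattern definition may differ.
    """
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each substring at most once per sequence
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)
    n = len(sequences)
    return {s: c / n for s, c in counts.items() if c >= min_support * n}


def to_features(seq, patterns):
    """Binary feature vector: 1 if the pattern occurs in seq, else 0."""
    return [1 if p in seq else 0 for p in patterns]


# Toy, invented sequences (not real OMP data).
toy_omps = ["AYGFKW", "QYGFTA", "LYGFNN"]
patterns = sorted(frequent_substrings(toy_omps, 2, 3, 1.0))
print(patterns)
print(to_features("MMYGFQ", patterns))
```

Binary vectors of this kind could serve as input to an SVM, while the patterns themselves could act as the antecedents of association rules; both uses are assumptions about how such features are typically consumed, not details taken from the abstract.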