DOI: 10.1145/956750.956800
Article

Frequent-subsequence-based prediction of outer membrane proteins

Published: 24 August 2003

ABSTRACT

A number of medically important disease-causing bacteria (collectively called Gram-negative bacteria) are noted for the extra "outer" membrane that surrounds their cell. Proteins resident in this membrane (outer membrane proteins, or OMPs) are of primary research interest for antibiotic and vaccine design: because they sit on the surface of the bacteria, they are the most accessible targets for new drugs. With the development of genome sequencing technology and bioinformatics, biologists can now deduce all the proteins likely to be produced by a given bacterium and have attempted to classify where those proteins are located in a bacterial cell. However, such protein localization programs are currently least accurate when predicting OMPs, so there is a need for a better OMP classifier. Data mining research suggests that frequent patterns aid the development of accurate and efficient classification algorithms. In this paper, we present two methods to identify OMPs based on frequent subsequences and test them on all Gram-negative bacterial proteins whose localizations have been determined by biological experiments. One classifier follows an association rule approach, while the other is based on support vector machines (SVMs). We compare the proposed methods with the state-of-the-art methods in the biological domain. The results demonstrate that our methods are better both at accurately identifying OMPs and at providing biological insights that increase our understanding of the structures and functions of these important proteins.
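
The abstract does not say how frequent subsequences are mined or turned into classifier inputs. As a rough illustration only, the Python sketch below assumes that a "subsequence" is a contiguous substring of bounded length and that each frequent substring becomes one binary feature. The function names, the `min_support` and `max_len` parameters, and the toy sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def frequent_substrings(sequences, min_support, max_len=3):
    """Return contiguous substrings (length 1..max_len) that occur in at
    least a `min_support` fraction of `sequences`.

    Assumption: "subsequence" is read as "contiguous substring"; the
    paper's actual pattern definition may differ.
    """
    counts = Counter()
    for seq in sequences:
        # Collect each distinct substring once per sequence, so the
        # count is the number of sequences that contain it.
        seen = set()
        for k in range(1, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)
    threshold = min_support * len(sequences)
    return {s for s, c in counts.items() if c >= threshold}


def to_features(seq, patterns):
    """Binary feature vector: 1 if the pattern occurs in `seq`, else 0."""
    return [1 if p in seq else 0 for p in patterns]


# Toy, made-up sequences standing in for known OMPs (not real data).
omp_like = ["AYTFAY", "GAYTF", "YTFAK"]
patterns = sorted(frequent_substrings(omp_like, min_support=1.0))
features = to_features("MKYTFG", patterns)
```

Vectors of this kind could be passed to an SVM, or the patterns themselves could serve as rule antecedents of the form "pattern present implies OMP" in an association-rule classifier. Either use is a plausible reading of the abstract rather than a description of the authors' implementation.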

    • Published in

      KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2003
      736 pages
      ISBN: 1581137370
      DOI: 10.1145/956750

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Acceptance Rates

      KDD '03 Paper Acceptance Rate: 46 of 298 submissions, 15%
      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
