Introduction to Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1107)

Abstract

Bioinformatics is an interdisciplinary field that draws mainly on molecular biology and genetics, computer science, mathematics, and statistics. It addresses data-intensive, large-scale biological problems from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps:

  • Collect statistics from biological data.

  • Build a computational model.

  • Solve a computational modeling problem.

  • Test and evaluate a computational algorithm.
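The four steps above can be illustrated with a deliberately small sketch. It is not taken from the chapter: the task (telling GC-rich from AT-rich sequences), the log-odds model, the function names, and the toy sequences are all assumptions chosen only to make each step concrete.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def base_frequencies(seqs, pseudocount=1.0):
    """Step 1: collect statistics (smoothed nucleotide frequencies)."""
    counts = Counter()
    for s in seqs:
        counts.update(ch for ch in s if ch in ALPHABET)
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}

def build_model(pos_seqs, neg_seqs):
    """Step 2: build a model (per-base log-odds, positive vs. negative class)."""
    p = base_frequencies(pos_seqs)
    q = base_frequencies(neg_seqs)
    return {b: math.log(p[b] / q[b]) for b in ALPHABET}

def score(model, seq):
    """Step 3: solve the modeling problem for one sequence (sum of log-odds)."""
    return sum(model.get(ch, 0.0) for ch in seq)

def accuracy(model, labeled):
    """Step 4: test and evaluate on labeled held-out sequences."""
    correct = sum((score(model, s) > 0) == label for s, label in labeled)
    return correct / len(labeled)

# Toy, made-up data (hypothetical, for illustration only).
gc_rich = ["GCGCGGCC", "CCGGGCGC", "GGCCGCGA"]
at_rich = ["ATATTAAT", "TTAATATA", "AATTATAC"]
model = build_model(gc_rich, at_rich)

held_out = [("GCGGCCGC", True), ("ATTATAAT", False),
            ("GGGCCCAT", True), ("TATAAGCA", False)]
print(accuracy(model, held_out))
```

A real analysis would use far more data and a richer model, but the division of labor between the four functions mirrors the listed steps.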

This chapter gives a brief introduction to bioinformatics, first introducing biological terminology and then discussing some classical bioinformatics problems organized by type of data source. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function; it includes subproblems such as identification of homologs, multiple sequence alignment, searching for sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data, and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data are usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein–protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
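As one concrete instance of the sequence analysis problems mentioned above, pairwise global alignment can be scored with dynamic programming in the style of Needleman and Wunsch. The sketch below is an illustration rather than the chapter's own code; the function name and the scoring values (match = 1, mismatch = -1, linear gap = -2) are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with b[:j]; the first row aligns the empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7: seven matches
print(global_alignment_score("GATTACA", "GATACA"))   # 4: six matches, one gap
```

Only two rows of the table are kept, so memory grows with the shorter dimension chosen for `b`; recovering the alignment itself, not just its score, would require storing the full table or traceback pointers.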

Copyright information

© 2014 Springer Science+Business Media New York

About this protocol

Cite this protocol

Can, T. (2014). Introduction to Bioinformatics. In: Yousef, M., Allmer, J. (eds) miRNomics: MicroRNA Biology and Computational Analysis. Methods in Molecular Biology, vol 1107. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-748-8_4

  • DOI: https://doi.org/10.1007/978-1-62703-748-8_4

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-747-1

  • Online ISBN: 978-1-62703-748-8

  • eBook Packages: Springer Protocols
