Introduction to Bioinformatics

Can, Tolga

doi:10.1007/978-1-62703-748-8_4

Protocol
First Online: 11 November 2013

Part of the book series: Methods in Molecular Biology (MIMB, volume 1107)

Abstract

Bioinformatics is an interdisciplinary field that mainly involves molecular biology and genetics, computer science, mathematics, and statistics. Data-intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps:

Collect statistics from biological data.
Build a computational model.
Solve a computational modeling problem.
Test and evaluate a computational algorithm.

This chapter gives a brief introduction to bioinformatics by first introducing biological terminology and then discussing some classical bioinformatics problems, organized by type of data source. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching for sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data, and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data are usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein–protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
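
As a rough illustration of the sequence-analysis problems mentioned above, the following sketch scores a global pairwise alignment of two sequences with a dynamic-programming table, in the style of the classical Needleman-Wunsch approach. It is not taken from the chapter: the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen only for illustration, and real tools typically use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not taken from the chapter.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            # Align a[i-1] with a gap in b.
            up = score[i - 1][j] + gap
            # Align b[j-1] with a gap in a.
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Under these assumed scores, a higher value suggests greater similarity between the two sequences. The table uses time and memory proportional to the product of the sequence lengths, which is one reason faster heuristic search tools are used for database-scale comparisons.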

Author information

Authors and Affiliations

Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
Tolga Can

Editor information

Editors and Affiliations

Dabburiya Village, Israel
Malik Yousef
Izmir Institute of Technology, Izmir, Turkey
Jens Allmer

About this protocol

Cite this protocol

Can, T. (2014). Introduction to Bioinformatics. In: Yousef, M., Allmer, J. (eds) miRNomics: MicroRNA Biology and Computational Analysis. Methods in Molecular Biology, vol 1107. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-748-8_4

DOI: https://doi.org/10.1007/978-1-62703-748-8_4
Published: 11 November 2013
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-747-1
Online ISBN: 978-1-62703-748-8
eBook Packages: Springer Protocols
