Abstract
Bioinformatics is an interdisciplinary field that draws mainly on molecular biology and genetics, computer science, mathematics, and statistics. It addresses data-intensive, large-scale biological problems from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps:
- Collect statistics from biological data.
- Build a computational model.
- Solve a computational modeling problem.
- Test and evaluate a computational algorithm.
This chapter gives a brief introduction to bioinformatics. It first introduces basic biological terminology and then discusses some classical bioinformatics problems, organized by the type of data source. Sequence analysis examines DNA and protein sequences for clues to their function and includes subproblems such as homolog identification, multiple sequence alignment, sequence pattern search, and evolutionary analysis. Protein structures are three-dimensional data, and the associated problems include structure prediction (secondary and tertiary), analysis of protein structures for clues to function, and structural alignment. Gene expression data are usually represented as matrices, and the analysis of microarray data mostly involves statistical analysis, classification, and clustering. Biological networks such as gene regulatory networks, metabolic pathways, and protein–protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve the associated problems, such as the construction and analysis of large-scale networks.
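Homolog identification typically starts from pairwise sequence comparison. As a minimal sketch, assuming a simple scoring scheme chosen only for illustration (match = 1, mismatch = -1, linear gap penalty = -2), the Python function below computes a global alignment with the Needleman–Wunsch dynamic-programming recurrence and traces back one optimal alignment. The function name `needleman_wunsch` and its parameters are hypothetical; practical tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning prefix a[:i] with prefix b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def sub(x: str, y: str) -> int:
        # Hypothetical substitution score: identical residues match, others mismatch.
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Filling the whole table costs O(nm) time and memory, which is why database-scale homology searches rely on faster heuristics.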
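Clustering a gene expression matrix can be sketched with Lloyd's k-means algorithm, one of several clustering approaches applied to microarray data. The layout is an assumption made for illustration: each row is a gene and each column is an experimental condition. The toy values, the function name `kmeans`, and the fixed random seed are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def squared_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(rows: List[List[float]], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[Optional[List[int]], List[List[float]]]:
    """Lloyd's k-means: cluster rows (genes) by their profiles across columns (conditions)."""
    rng = random.Random(seed)
    # Initialise centroids with k rows chosen at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each gene joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(row, centroids[c]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy matrix (hypothetical values): 4 genes x 3 conditions.
    expression = [[1.0, 1.1, 0.9], [0.9, 1.0, 1.2], [5.0, 5.2, 4.8], [5.1, 4.9, 5.0]]
    labels, _ = kmeans(expression, k=2)
    print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])  # True True True
```

The number of clusters k must be chosen in advance, and the result can depend on the random initialisation, so in practice the algorithm is often rerun with several seeds.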
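Because interaction networks are modeled as graphs, a basic analysis can be sketched with an adjacency-list representation. The Python example below is a minimal sketch that uses hypothetical protein identifiers (`P1` to `P5`) and made-up interactions. It builds an undirected graph, reports each node's degree (highly connected proteins have large degree), and finds connected components with breadth-first search. The helper names `build_graph`, `degrees`, and `connected_components` are illustrative only.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency list: each interaction adds an edge in both directions."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def degrees(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def connected_components(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Breadth-first search from every unvisited node; each search yields one component."""
    seen: Set[str] = set()
    components: List[List[str]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component: List[str] = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    # Hypothetical interactions between five proteins.
    ppi = build_graph([("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P4", "P5")])
    print(degrees(ppi))               # {'P1': 2, 'P2': 2, 'P3': 2, 'P4': 1, 'P5': 1}
    print(connected_components(ppi))  # [['P1', 'P2', 'P3'], ['P4', 'P5']]
```

An adjacency list stores only the observed interactions, which suits sparse, large-scale networks better than a dense adjacency matrix.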
Copyright information
© 2014 Springer Science+Business Media New York
About this protocol
Cite this protocol
Can, T. (2014). Introduction to Bioinformatics. In: Yousef, M., Allmer, J. (eds) miRNomics: MicroRNA Biology and Computational Analysis. Methods in Molecular Biology, vol 1107. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-748-8_4
DOI: https://doi.org/10.1007/978-1-62703-748-8_4
Published:
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-747-1
Online ISBN: 978-1-62703-748-8
eBook Packages: Springer Protocols