BIOINFORMATICS

HMM Informatics describes the study and practice of creating, storing, finding, manipulating and sharing information. There are many other neologisms and phrases derived from this: bioinformatics, chemoinformatics, health informatics, nursing informatics, poli-informatics, for example.
Bioinformatics is a largely but not exclusively computational subject. In the biological sciences, bioinformatics is so central to understanding the enormous data sets that characterise modern biology that it is arguably not a separate discipline. Nevertheless, the theory of information is extremely important and a comprehensive book can be read on the Web thanks to David J C MacKay at Cambridge.
The following is the only survivor from the original Web site (this revision August 2015). These notes accompanied a set of lectures on the chemical and biological background to bioinformatics for students who had studied neither chemistry nor biology.

1	Biological chemistry in time and space
2	Molecular structures and representations
3	Equilibrium and reactions
4	The biological and chemical literature
5	Macromolecules: primary structures and conformations
6	Genomes, gene regulation and protein biosynthesis
7	Metabolic, regulatory and neural networks
8	Classification and ontologies in biological sciences
9	Recombination, repair, rearrangement and evolution
10	Exam questions (!)

1: Biological chemistry in time and space

Contents	Time scales; Size scales; Complexity
Abstract	The earth has existed for a substantial proportion of the history of the known universe and has been inhabited for the greater part of its history. A useful classification is that of Chandler for the semiotics of complex systems.
Objectives	At the end of this topic, you should understand the way in which complexity arose from complex molecules and the interactions of these.
Why is this topic important?	Because chemistry is important.
Why is it interesting?	This is a good natural example of emerging properties as a system becomes more complex.

Time scales

Here is a summary of history from the big bang to today.
[Figure: cartoon history from the big bang to today]

The units on the abscissa are 10^-12 years. The scale is approximate and, arguably, contentious. Also the cartoons might be misleading: the old world continents were formed quite recently and we do not know what the first form of life might have looked like. This form of life was probably some kind of "bacterium" but the cartoon is realistic in the sense that the organism was certainly not green. The point we make is that life on earth occupies a significant proportion of the "history of everything" on a linear scale on the abscissa. Although we look at this in more detail in section 8 we can point out here that a logarithmic scale would be needed to indicate a separation of "closely related" pairs of organisms such as codfish & gorillas, carrots & buttercups, yeast & mushrooms....

Symbol	Class of object	Notes
O^o1	subatomic particles	electrons, protons, neutrons....
O^o2	Atoms
O^o3	Molecules	This class includes ions.
O^o4	Biomacromolecules	DNA, RNA, proteins, polysaccharides
O^o5	Cells	Living objects having a boundary and sustained by a genetic system. A multicellular organism is 'a cell' in this context.
O^o6	Ecoment	The surrounds of a cell in the above sense. Nutrients and external signals (such as stimuli) are parts of the ecoment.
O^o7	Environment

Contents	Molecular shapes; Chirality; Conventions; Properties
Abstract	3D chemical structures of molecules are drawn or described in one or two dimensions. Important properties of biological molecules derive from a type of asymmetry referred to as "chirality"; of particular importance is the presence of two or more chiral centres in a molecule or in an interacting molecular system.
Objectives	In this topic you will learn how to recognise and describe organic molecules and will appreciate the concepts of conformation, isomerism and functional group.
Why is this topic important?	It is the essential foundation for understanding biological chemistry.
Why is it interesting?	There are several points of interest (I hope) but the ideas behind representation and interpretation have generic aspects.

Element	Valency	Notes
C	4	The chemistry of carbon is called "organic chemistry" but not all organic compounds are relevant to biological chemistry.
H	1
O	2
N	3	N can have a valency of 5
P	5	P can have a valency of 3
S	2, 4 or 6	2 in proteins etc.
Fe	2 or 3	usually as +ve ions in biological chemistry
Na	1	always as +ve ions in biological chemistry
K	1	always as +ve ions in biological chemistry
Mg	2	always as +ve ions in biological chemistry
Ca	2	always as +ve ions in biological chemistry
Cl	1	nearly always as -ve ions in biological chemistry

Objects	Viewpoint/conformation	Conventions
hands	from the back of the hand	draw arrow from wrist to fingers and draw line for the thumb
Certain molecules	Look at each chiral C atom so that the attached C atoms point away from the viewpoint and the other two substituents point towards the viewpoint; use a conformation in which this applies to all chiral C atoms.	Represent C-C bonds just as a line; do not bother to draw C-H bonds.

Contents	Units; Equilibrium; Thermodynamic considerations; Reaction rates
Abstract	Equilibrium positions are related to classical chemical thermodynamics. Chemical reactions need to be thermodynamically feasible and there needs to be an available kinetic mechanism for them to occur.
Objectives	At the end of this topic you should understand equilibrium constants, pH, pK, enthalpy, entropy and free energy changes, and rates of reaction.
Why is this topic important?	It is central to knowing whether reactions can take place.
Why is it interesting?	Because it is not too far away from physics.

Symbol	Meaning	Equation/explanation
T	Absolute temperature
R	Universal Gas Constant
H	Enthalpy or "Heat Content"	H = U + pV where U is internal energy; p, V are pressure and volume.
S	Entropy	A measure of disorder in a system
F (or A)	Helmholtz Free Energy	F = U - TS
G	Gibbs Free Energy	G = H - TS
a	Activity	a thermodynamic function that describes the effective concentration of a substance; for many cases in solution, for substance X, a_X is approximately [X].
G^o	Standard free energy	G at unit activity; it appears as a constant in equations such as the following, which describes G for the ith component of a system: G_i = G_i^o + RT ln a_i
ΔG^o	Standard free energy change	Take the reaction W + X ↔ Y + Z. At equilibrium, ΔG^o = -RT ln[(a_Y a_Z)/(a_W a_X)], i.e. ΔG^o = -RT ln K with K = [Y][Z]/([W][X]), IFF we accept the "a is [ ]" approximation.
ΔG	Free energy change	For the reaction W + X → Y + Z, ΔG = ΔG^o + RT ln[(a_Y a_Z)/(a_W a_X)]
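
As a rough numerical illustration of the table above, the following sketch converts between an equilibrium constant K and a standard free energy change using ΔG^o = -RT ln K, and evaluates ΔG away from equilibrium. It relies on the "a is [ ]" approximation throughout. The function names, the default temperature of 298.15 K and the example values are assumptions made for illustration; they do not come from the notes.

```python
import math

R = 8.314  # universal gas constant in J K^-1 mol^-1

def standard_free_energy_change(K, T=298.15):
    """Return the standard free energy change (J/mol) for equilibrium constant K."""
    return -R * T * math.log(K)

def equilibrium_constant(delta_g0, T=298.15):
    """Inverse relation: K from the standard free energy change (J/mol)."""
    return math.exp(-delta_g0 / (R * T))

def free_energy_change(delta_g0, w, x, y, z, T=298.15):
    """Free energy change for W + X -> Y + Z at concentrations w, x, y, z.

    Activities are approximated by concentrations.
    """
    quotient = (y * z) / (w * x)
    return delta_g0 + R * T * math.log(quotient)

if __name__ == "__main__":
    dg0 = standard_free_energy_change(10.0)
    print(f"K = 10 gives a standard free energy change of {dg0 / 1000:.1f} kJ/mol")
    # With only a little product present the reaction is even more favourable.
    print(f"{free_energy_change(dg0, 1.0, 1.0, 0.1, 0.1) / 1000:.1f} kJ/mol")
```

A negative value of ΔG means the reaction is thermodynamically feasible in the forward direction; it says nothing about whether a kinetic mechanism is available.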

Abstract	Use the Web
Objectives	In this topic, you will be introduced to important W3 sites and search engines.
Why is this topic important?	We need to get some more data and ideas.
Why is it interesting?	Because some of the Web sites are interesting.
Contents	There are no subsections in this topic.

URL	comment	more comments
www.google.co.uk	Useful search engine	String some words together: it is useful to include "tutorial" and/or "intro" in the search field.
www.chemfinder.com	Fairly friendly way of asking "what is that compound?"	with non-standard names it is better with medical/pharmaceuticals
www.ncbi.nlm.nih.gov	The most important site for molecular biology etc.: you must get used to it.	To get started choose their link to "seven modules"
www.cas.org	"cas" is Chemical Abstracts	There is a different approach from www.ncbi.nlm.nih.gov but there is considerable overlap in coverage. Try the "SciFinder" and the life sciences links.
www.expasy.ch	Major Swiss molecular biology server	largely a research facility but with tutorial material
www.expasy.ch	Introduction to Expasy	see above

Contents	Amino acids and proteins; Carbohydrates, RNA, DNA; Protein structure; DNA structure
Abstract	Proteins are polypeptides. Polysaccharides contain sugar residues. RNA and DNA are polynucleotides.
Objectives	At the end of this topic you should have a basic knowledge of the ways in which proteins and nucleic acids are built up from amino acid and nucleotide residues and ways in which protein and DNA structures can be represented.
Why is this topic important?	It underlies most of the applications of bioinformatics.
Why is it interesting?	It is a classic example of emerging properties.

Code	Abbreviation	Name	Notes
A	Ala	L-alanine	small, unreactive
C	Cys	L-cysteine	contains SH
D	Asp	L-aspartic acid	carboxylic acid
E	Glu	L-glutamic acid	carboxylic acid
F	Phe	L-phenylalanine	bulky, aromatic, hydrophobic
G	Gly	glycine	very small
H	His	L-histidine	reactive, can be acid or base
I	Ile	L-isoleucine	bulky, hydrophobic
K	Lys	L-lysine	basic
L	Leu	L-leucine	similar to I (Ile)
M	Met	L-methionine	contains S, hydrophobic
N	Asn	L-asparagine	an amide
P	Pro	L-proline	an imino acid; breaks an H-element (except in certain integral membrane proteins)
Q	Gln	L-glutamine	an amide
R	Arg	L-arginine	strongly basic
S	Ser	L-serine	contains OH
T	Thr	L-threonine	contains OH
V	Val	L-valine	very similar to I (Ile)
W	Trp	L-tryptophan	bulky, aromatic, hydrophobic
Y	Tyr	L-tyrosine	a phenol
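
The one-letter and three-letter codes in the table above are what most sequence software works with. The sketch below stores them as a lookup and converts a one-letter protein sequence to the three-letter form. The names `THREE_LETTER` and `to_three_letter` and the example peptide are hypothetical, chosen only for illustration.

```python
# One-letter to three-letter amino acid codes, copied from the table above.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

def to_three_letter(sequence):
    """Convert a one-letter protein sequence to hyphen-separated three-letter codes.

    Raises KeyError if the sequence contains a letter that is not in the table.
    """
    return "-".join(THREE_LETTER[residue] for residue in sequence.upper())

if __name__ == "__main__":
    print(to_three_letter("MKV"))  # Met-Lys-Val
```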

Contents	Metabolism: some principles (i) Enzymes; Metabolism: some principles (ii) triphosphates and REDOX; Metabolic networks; Regulatory networks; Neural networks
Abstract	Enzyme-catalysed reactions show characteristic saturation kinetics. Metabolism is constrained by the requirement for overall favourable (negative free energy change) processes. ATP can be regarded as an "energy currency": the synthesis of ATP involves REDOX reactions and, in the case of photosynthetic organisms, the splitting of H₂O. The biosynthesis of DNA, RNA and protein are examples of expensive processes that have substantial demand on triphosphate utilisation. "Metabolic pathways" (correctly metabolic nets) describe the interconversions of metabolites. Regulatory networks can involve many processes including the regulation of transcription and translation. Although not a formal part of the module, some references are given to the comparison between artificial neural nets and the activities of neurons.
Objectives	At the end of this topic, you should understand the concept of metabolism and the importance of free energy and metabolic and regulatory networks.
Why is this topic important?	The subject is central to biological chemistry. More specifically the accurate modelling of such networks is a major area of research in contemporary bioinformatics.
Why is it interesting?	Biological networks resemble other types of nets in several respects but they are characteristically robust and redundant.
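
The saturation kinetics mentioned in the abstract can be sketched numerically. The notes do not name a rate law here, so the Michaelis-Menten equation, v = Vmax[S]/(Km + [S]), is assumed as the usual textbook model. The function name and the parameter values are illustrative assumptions only.

```python
def michaelis_menten_rate(s, vmax, km):
    """Reaction rate at substrate concentration s (same units as km).

    Assumes simple Michaelis-Menten kinetics: v = vmax * s / (km + s).
    """
    return vmax * s / (km + s)

if __name__ == "__main__":
    vmax, km = 100.0, 2.0  # hypothetical values in arbitrary units
    for s in (0.2, 2.0, 20.0, 200.0):
        rate = michaelis_menten_rate(s, vmax, km)
        print(f"[S] = {s:6.1f}  v = {rate:6.1f}")
```

The rate rises almost linearly while [S] is well below Km, equals half of Vmax when [S] equals Km, and approaches Vmax as the enzyme becomes saturated.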

Contents	Speciation; Enzymes and Molecules
Abstract	Biological classification or taxonomy was originally based on the scoring of characters in specimens. The systematic method of doing this was an early example of computer aided linkage analysis. Modern taxonomies rely on macromolecular sequences. Enzymes are classified in a system of "EC numbers". The hierarchy in this case is less easy to parse.
Objectives	At the end of this topic, the student should have a working knowledge of the principles and practice of biological taxonomy and be familiar with resources to relate EC numbers and enzyme activities.
Why is this topic important?	Classification is important in relating current research studies to older literature descriptions.
Why is it interesting?	Biological taxonomy is an example of an ontology.

↓ Character | Specimen →	1	2	3	4	6	7	8	9
has 4 legs, 1 at each corner	1	1	1	1	1	1	1	1
has a tail	1	1	1	1	1	1	0	1
has coarse hair	0	0	1	0	1	0	0	1
has soft fur	1	1	0	0	0	1	1	0
makes miaowing sound when stroked	1	0	0	0	0	1	1	0
is coloured black	0	1	0	0	0	0	1	1
makes barking noise when needs feeding	0	0	1	0	1	0	0	0
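
One way to see how scored characters lead to groupings is to compute a similarity between every pair of specimens. The sketch below uses the simple matching coefficient (the fraction of characters on which two specimens agree); this choice of measure is an assumption, since the notes do not say which one was used. The scores are copied from the table above and the specimen labels are kept as given there.

```python
SPECIMENS = [1, 2, 3, 4, 6, 7, 8, 9]

# Character scores in the same column order as SPECIMENS.
CHARACTERS = {
    "4 legs":         [1, 1, 1, 1, 1, 1, 1, 1],
    "tail":           [1, 1, 1, 1, 1, 1, 0, 1],
    "coarse hair":    [0, 0, 1, 0, 1, 0, 0, 1],
    "soft fur":       [1, 1, 0, 0, 0, 1, 1, 0],
    "miaows":         [1, 0, 0, 0, 0, 1, 1, 0],
    "coloured black": [0, 1, 0, 0, 0, 0, 1, 1],
    "barks":          [0, 0, 1, 0, 1, 0, 0, 0],
}

def profile(specimen):
    """Return the list of character scores for one specimen."""
    column = SPECIMENS.index(specimen)
    return [scores[column] for scores in CHARACTERS.values()]

def simple_matching(a, b):
    """Fraction of characters on which specimens a and b have the same score."""
    pa, pb = profile(a), profile(b)
    matches = sum(x == y for x, y in zip(pa, pb))
    return matches / len(pa)

if __name__ == "__main__":
    for a in SPECIMENS:
        print(a, " ".join(f"{simple_matching(a, b):.2f}" for b in SPECIMENS))
```

Pairs that score identically on every character, such as specimens 3 and 6, have a coefficient of 1.0; a clustering method would join such pairs first.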

	Question	Possible answer	Problem
1	What is a species?	defined by saying (i) 2 members of a species can interbreed but (ii) members of different species cannot	Any gardener or keeper of pet fish can tell you (ii) is not true and (i) fails for organisms that do not reproduce sexually.
2	How are members of a taxon related?	They had a common ancestor	Among the bacteria (possibly higher organisms) genes or clusters of genes can be transferred between very unrelated organisms.
3	How do we decide how the boundaries of a taxon might be defined?	let's say that a certain degree of similarity, sequence homology etc. defines a species etc.: S% for a species, G% for a genus, F% for a family... where S<G<F ...	Well that's not going to work too well! Using the criteria of bacterial taxonomy, humans, chimpanzees and gorillas would all be minor variations within the same species and yet they are not even in the same genus.

Symbol	Occurs in	Type	Complementary residue
U	RNA	pyrimidine	A
T	DNA	pyrimidine	A
C	DNA and RNA	pyrimidine	G
A	DNA and RNA	purine	T (U in RNA)
G	DNA and RNA	purine	C
Y	DNA and RNA	any pYrimidine	R
R	DNA and RNA	any puRine	Y
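
The complementarity rules in the table above translate directly into a lookup. The sketch below gives the complement of a DNA or RNA sequence, including the ambiguity symbols Y and R, and the reverse complement, which is the form usually wanted because the two strands of a duplex run in opposite directions. The function names are hypothetical.

```python
# Complementary residues, copied from the table above.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "Y": "R", "R": "Y"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C", "Y": "R", "R": "Y"}

def complement(sequence, rna=False):
    """Return the complementary sequence, residue by residue."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return "".join(table[base] for base in sequence.upper())

def reverse_complement(sequence, rna=False):
    """Return the complementary strand read in its own 5' to 3' direction."""
    return complement(sequence, rna=rna)[::-1]

if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```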

Contents	Genes and gene expression; Sequence relationships
Abstract	We explain the terms gene, chromosome, genome, genetic system, haploid, diploid and allele. The expression of many genes involves transcription (formation of RNA) and translation (formation of protein) and there are significant differences in these processes in bacteria and eukaryotes. The relationship between DNA/RNA sequences is that of complementarity. The relationship between an RNA sequence and a protein sequence is the Genetic Code. In order to use this Code, it is important to know which reading frame is in use.
Objectives	At the end of this topic you should have an understanding of genes, genomes and gene expression including the relationships between DNA, RNA and protein sequences.
Why is this topic important?	Much molecular biological data is derived from a knowledge of DNA sequences.
Why is it interesting?	This is a biochemical equivalent of language processing. A gene sequence in DNA, in some sense, "means" the product(s) of gene expression.
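
The importance of the reading frame can be shown with a short translation routine. The sketch below builds the standard genetic code from the conventional UCAG ordering of codons and translates an RNA sequence in a chosen frame. The names `GENETIC_CODE` and `translate` and the example sequence are illustrative assumptions, and stop codons are simply written as `*` rather than ending translation.

```python
from itertools import product

BASES = "UCAG"
# Amino acids for the 64 codons in UCAG order (UUU, UUC, UUA, UUG, UCU, ...).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(rna, frame=0):
    """Translate an RNA sequence starting at offset frame (0, 1 or 2).

    DNA input is accepted by reading T as U. Trailing bases that do not
    fill a complete codon are ignored.
    """
    rna = rna.upper().replace("T", "U")
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(GENETIC_CODE[codon] for codon in codons)

if __name__ == "__main__":
    message = "AUGGCCUAA"
    for frame in range(3):
        print(frame, translate(message, frame))
```

The same nine bases read as `MA*`, `WP` or `GL` depending on the frame, which is why the frame has to be known before a sequence can be interpreted.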

	Process names	Enzymes	Other components
1	DNA replication	protein	protein, RNA
1	Regulation	protein	DNA, RNA
1	Error correction	protein	DNA, protein
2	Transcription	protein	protein
2	Regulation	protein	DNA, RNA
3	RNA modification and editing	protein, RNA	protein, RNA
4	Translation	protein, RNA	RNA
4	Regulation	protein, RNA	protein, RNA
4	Error Correction	protein	protein, RNA
5	Protein folding and modification	protein	protein
6	DNA repair	protein	protein, DNA
7	Genetic recombination	protein	protein, DNA
8	RNA turnover	protein	protein, RNA
9	Protein turnover	protein	protein

N1	Description	number of N2's	examples
1	Oxidoreductases	19+"97"
2	Transferases	9
3	Hydrolases	13
4	Lyases	6+"99"	4.1 carbon-carbon lyases 4.2 carbon-oxygen lyases
5	Isomerases	5+"99"
6	Ligases	5	6.1 forming carbon-oxygen bonds 6.4 forming carbon-carbon bonds
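
An EC number is a dotted hierarchy whose first field (N1) gives the class in the table above and whose second field (N2) gives a subclass. The sketch below splits an EC number and looks up those two levels. Only the subclasses actually listed in the table are included, so other subclasses return `None`; the function name and the example EC numbers are chosen for illustration.

```python
# Classes (N1) and the few subclasses (N1.N2) listed in the table above.
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
    4: "Lyases", 5: "Isomerases", 6: "Ligases",
}
EC_SUBCLASSES = {
    (4, 1): "carbon-carbon lyases",
    (4, 2): "carbon-oxygen lyases",
    (6, 1): "forming carbon-oxygen bonds",
    (6, 4): "forming carbon-carbon bonds",
}

def describe_ec(ec_number):
    """Return (class name, subclass name or None) for a string such as 'EC 4.1.1.1'."""
    fields = ec_number.upper().replace("EC", "").strip().split(".")
    n1, n2 = int(fields[0]), int(fields[1])
    return EC_CLASSES[n1], EC_SUBCLASSES.get((n1, n2))

if __name__ == "__main__":
    print(describe_ec("EC 4.1.1.1"))
    print(describe_ec("3.4.21.4"))
```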

Contents	Recombination (1); Recombination (2); DNA repair; Evolution
Abstract	Reciprocal recombination, not to be confused with chromosome inheritance, occurs during meiosis. Site-specific recombination is important in several events, notably transposition of genes. Enzymes monitor DNA for evidence of damage or erroneous replication and effect repair, frequently by re-synthesis. DNA biochemistry provides a mechanism for biological evolution, and site-specific recombination and allied processes provide a mechanism for lateral gene transfer between bacteria.
Objectives	At the end of this topic, the student should be familiar with DNA rearrangements in genetic recombination and the response of cells to damaged DNA, and should have an understanding of the role of molecular biology in providing plausible mechanisms for biological evolution.
Why is this topic important?	Bioinformatics relies extensively on principles of change and evolution.
Why is it interesting?	This topic has generated one of the most powerful heuristics in computer science: the genetic algorithm.
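
To make the link to the genetic algorithm concrete, here is a minimal sketch in which selection, single-point crossover (a stand-in for recombination) and mutation act on bit strings. The toy fitness function (count the 1 bits), the population settings and all names are assumptions chosen for illustration; this is not a model of any biological process described in these notes.

```python
import random

def fitness(bits):
    """Toy fitness: the number of 1s in the bit string."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Run a small genetic algorithm and return (best individual, fitness history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(fitness(ind) for ind in population)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [parents[0][:]]              # elitism: best survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)      # single-point crossover
            child = mother[:cut] + father[cut:]
            # mutation: flip each bit with a small probability
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
        history.append(max(fitness(ind) for ind in population))
    return max(population, key=fitness), history

if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", history)
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next.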

1: Biological chemistry in time and space

Time scales

Size scales

Complexity

2: Molecular structures and representations

Molecular shapes

Chirality

Conventions

Properties

References

3: Equilibrium and reactions

Units

Equilibrium

Thermodynamic considerations

Reaction rates

4: The biological and chemical literature

5: Macromolecules: primary structures and conformations

Amino acids, peptides and proteins

Carbohydrates, nucleotides, polynucleotides

Protein structure

DNA structure

6: Genomes, gene regulation and protein biosynthesis

Genes and gene expression

Sequence relationships; the Genetic Code

7: Metabolic, regulatory and neural networks

Metabolism: some principles (i) Enzymes

Metabolism: some principles (ii) triphosphates and REDOX

Metabolic networks

Regulatory networks

Neural networks

8: Classification and ontologies in biological sciences

Speciation

Enzymes and molecules

9: Recombination, repair, rearrangement and evolution

Recombination: part 1

Recombination: part 2

DNA repair

Evolution