Life Ascending – Chapter 2: DNA

There’s a blue plaque on the Eagle pub in Cambridge, mounted in 2003 to commemorate the fiftieth anniversary of an unusual turn in pub conversation. At lunchtime on 28 February 1953, a couple of regulars, James Watson and Francis Crick, burst into the pub and announced that they had discovered the secret of life. Although the intense American and voluble Brit with an irritating laugh must at times have seemed to verge on a comic double-act, this time they were serious – and half right. If life can be said to have a secret, it is certainly DNA. But Crick and Watson, for all their cleverness, then only knew half its secret.

That morning Crick and Watson had figured out that DNA is a double helix. An inspired leap of mind based on a mixture of genius, model-building, chemical reasoning, and a few pilfered X-ray diffraction photos, their conception was, in Watson’s words, just ‘so pretty it had to be right’. And the more they talked that lunchtime the more they knew it was. Their solution was published in Nature on 25 April as a one-page letter, a kind of announcement not unlike the birth notices in a local newspaper. Unusually modest in tone (Watson famously wrote that he had never met Crick in a modest mood, and he wasn’t much better himself), the paper closed with the coy understatement: ‘It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.’

DNA, of course, is the stuff of genes, the hereditary material. It codes for human being and amoeba, mushroom and bacterium, everything on this earth bar a few viruses. Its double helix is a scientific icon, the two helices pursuing each other round and round in an endless chase. Watson and Crick showed how each strand complements the other at a molecular level. Prise the strands apart and each acts as a template to reform the other, forging two identical double helices where once there was one. Every time an organism reproduces, it passes a copy of its DNA to its offspring. All it needs to do is pull the two strands apart to produce two identical copies of the original.

While the detailed molecular mechanics could give anybody a headache, the principle itself is beautifully, breathtakingly, simple. The genetic code is a succession of letters (more technically ‘bases’). There are only four letters in the DNA alphabet – A, T, G and C. These stand for adenine, thymine, guanine and cytosine, but the chemical names needn’t worry us. The point is that, constrained by their shape and bond structure, A can only ever pair with T, and C with G (see Fig. 2.1). Prise the double helix apart, and each strand bristles with unpaired letters. For every exposed A, only a T can bind; for every C, a G; and so on. These base pairs don’t just complement each other, they really want to bind to each other. There’s only one thing to brighten up the dull chemical life of a T and that’s close proximity to an A. Put them together and their bonds sing in lovely harmony. This is true chemistry: an authentic ‘basic attraction’. So DNA is not merely a passive template; each strand exerts a sort of magnetism for its alter ego. Pull the strands apart and they will spontaneously coalesce together again, or, if they’re kept apart, each strand is a template with an urgent tug for its perfect partner.
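For readers who like to see a rule written out, the pairing principle can be sketched in a few lines of Python. This is only an illustration: the names `PAIRS`, `complement` and `replicate` are invented here, the example sequence is arbitrary, and the sketch treats a strand as a plain string of letters, ignoring the fact that the two strands of a real double helix run in opposite directions.

```python
# Pairing rules as stated in the text: A pairs with T, and C with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner letter for each base, position by position."""
    return "".join(PAIRS[base] for base in strand)


def replicate(strand_a: str, strand_b: str):
    """Prise the two strands apart and rebuild a partner on each one.

    Each old strand acts as a template, so one double helix becomes two.
    """
    helix_1 = (strand_a, complement(strand_a))
    helix_2 = (complement(strand_b), strand_b)
    return helix_1, helix_2


# An arbitrary eight-letter example, not a real gene.
original = ("ATGCCGTA", complement("ATGCCGTA"))
copy_1, copy_2 = replicate(*original)
print(original, copy_1, copy_2)
```

Because each letter has exactly one partner, both rebuilt helices come out identical to the one we started with, which is the whole point of the template.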

The succession of letters in DNA seems endlessly long. There are nearly 3 billion letters (base pairs) in the human genome, for example – 3 gigabases, in the lingo. That’s to say, a single set of chromosomes in the nucleus of a cell contains a list of 3,000,000,000 individual letters. If typed out, the human genome would fill about 200 volumes, each of them the size of a telephone directory. And the human genome is by no means the largest. Rather surprisingly, that record falls to a measly amoeba, Amoeba dubia, with a gargantuan genome of 670 gigabases, some 220 times the size of our own. Most of it seems to be ‘junk’, coding for nothing at all.

Every time a cell divides, it replicates all of its DNA, a process that takes place in a matter of hours. The human body is a monster of 15 million million cells, each one harbouring its own faithful copy of the same DNA (two copies in fact). To form your body from a single egg cell, your DNA helices were prised apart to act as a template 15 million million times (and indeed many more, for cells die and are replaced all the time). Each letter is copied with a precision bordering on the miraculous, recreating the order of the original with an error rate of about one letter in 1,000 million. In comparison, for a scribe to work with a similar precision, he would need to copy out the entire bible 280 times before making a single error. In fact, the scribes’ success was a lot lower. There are said to be 24,000 surviving manuscript copies of the New Testament, and no two copies are identical.

Even in DNA, though, errors build up, if only because the genome is so very big. Such errors are called point mutations, in which one letter is substituted for another by mistake. Each time a human cell divides, you’d expect to see about three mutations per set of chromosomes. And the more times that a cell divides, the more such mutations accumulate, ultimately contributing to diseases like cancer. Mutations also cross generations. If a fertilised egg develops as a female embryo, it takes about thirty rounds of cell division to form a new egg cell; and each round adds a few more mutations. Men are even worse: a hundred rounds of cell division are needed to make sperm, with each round linked inexorably to more mutations. Because sperm production goes on throughout life, round after round of cell division, the older the man, the worse it gets. As the geneticist James Crow put it, the greatest mutational health hazard in the population is fertile old men. But even an average child, of youthful parents, has around 200 new mutations compared with their parents (although only a handful of these may be directly harmful).
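The figure of three mutations per division follows directly from the two numbers already given: 3 billion letters per set of chromosomes, copied with about one error per 1,000 million letters. A rough sketch of that arithmetic is below. The function name and the assumption that errors simply add up, round after round, are simplifications made here for illustration, so the totals are an order-of-magnitude guide rather than a prediction.

```python
GENOME_LETTERS = 3_000_000_000   # letters per set of chromosomes (from the text)
LETTERS_PER_ERROR = 1_000_000_000  # about one error per 1,000 million letters


def expected_mutations(divisions: int) -> float:
    """Naive estimate: errors per copy multiplied by the number of copies.

    Assumes (for illustration only) that every round of division
    contributes the same number of new mutations.
    """
    return GENOME_LETTERS * divisions / LETTERS_PER_ERROR


per_division = expected_mutations(1)   # one cell division
egg_line = expected_mutations(30)      # about thirty rounds to make an egg
sperm_line = expected_mutations(100)   # a hundred rounds to make sperm

print(per_division, egg_line, sperm_line)
```

On these crude assumptions a child inherits new mutations numbering in the hundreds, the same order of magnitude as the figure of around 200 quoted above.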

And so despite the remarkable fidelity with which DNA is copied, change happens. Every generation is different from the last, not only because our genes are stirred by sex, but also because we all carry new mutations. Many of these mutations are the ‘point’ mutations that we’ve talked about, a change in a single DNA letter, but some are altogether more drastic. Whole chromosomes are replicated or fail to separate; vast tracts of DNA are deleted; viruses insert new chunks; bits of chromosomes invert themselves, reversing the sequence of letters. The possibilities are endless, though the grossest changes are rarely compatible with survival. When seen at this level, the genome seethes like a snakepit, with its serpentine chromosomes fusing and dividing, eternally restless. Natural selection, by casting away all but the least of these monsters, is actually a force for stability. DNA morphs and twists, selection straightens. Any positive changes are retained, while more serious errors or alterations miscarry, literally. Other mutations, less serious, may be associated with disease later in life.

The shifting sequence of letters in DNA is behind almost everything that we read about in the papers concerning our genes. DNA fingerprinting, for example – used to establish paternity, impeach presidents, or incriminate suspects decades after the event – is based on differences in the sequence of letters between individuals. Because there are so many differences in DNA, we each have our own unique DNA ‘fingerprint’. Likewise, our susceptibility to many diseases depends on tiny differences in DNA sequence. On average, humans differ by around one letter every 1,000 or so, giving 6 to 10 million single-letter differences in the human genome, known as ‘snips’ (for ‘single nucleotide polymorphisms’). The existence of snips means that we all harbour slightly different versions of most genes. While many snips are almost certainly inconsequential, others are statistically associated with conditions like diabetes or Alzheimer’s disease, although exactly how they exert their effects is all too often uncertain.

Despite these differences, it’s still possible to talk about a ‘human genome’; for all the snips, 999 letters out of every 1,000 are still identical in all of us. There are two reasons for this: time and selection. In the evolutionary scheme of things, not a lot of time has passed since we were all apes; indeed a zoologist would assure me we still are. Assuming that humans split off from our common ancestor with chimps around 6 million years ago, and accumulated mutations at the rate of 200 per generation ever since, we’ve still only had time to modify about 1 per cent of our genome in the time available. As chimps have been evolving at a similar rate, theoretically we should expect to see a 2 per cent difference. In fact the difference is a little less than that; in terms of DNA sequence, chimps and humans are around 98.6 per cent identical. The reason is that selection applies the brakes, by eliminating most of the detrimental changes. If changes are eliminated by selection, the sequences that do persist are obviously more similar to each other than they would be if change were unconstrained; again, selection is straightening.
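The 1 per cent estimate can be reproduced with a back-of-envelope calculation, though only by supplying two numbers that the text does not state. The sketch below assumes a generation time of 20 years and counts both sets of chromosomes (6 billion letters), on the grounds that a child’s 200 new mutations are spread across both. Both values are assumptions made here for illustration, and different choices would shift the answer.

```python
YEARS_SINCE_SPLIT = 6_000_000    # from the text
MUTATIONS_PER_GENERATION = 200   # from the text

# Assumptions for illustration, not figures given in the text:
YEARS_PER_GENERATION = 20        # assumed average generation time
GENOME_LETTERS = 6_000_000_000   # assumed: both sets of chromosomes

generations = YEARS_SINCE_SPLIT // YEARS_PER_GENERATION
human_fraction = generations * MUTATIONS_PER_GENERATION / GENOME_LETTERS

# Chimps have been changing at a similar rate along their own lineage,
# so the expected difference between the two species is doubled.
expected_difference = 2 * human_fraction
observed_difference = 1 - 0.986  # 98.6 per cent identical, from the text

print(generations, human_fraction, expected_difference, observed_difference)
```

With these inputs the human lineage changes about 1 per cent of its letters and the expected gap between the species is about 2 per cent, against an observed gap of roughly 1.4 per cent. The shortfall is the part attributed to selection.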

Going further back into deep time, these two traits, time and selection, conspire to produce the most marvellous and intricate of tapestries. All life on our planet is related, and the readout of letters in DNA shows exactly how. By comparing DNA sequences, we can compute statistically how closely related we are to anything, from monkeys to marsupials, to reptiles, amphibians, fish, insects, crustaceans, worms, plants, protozoa, bacteria – you name it. All of us are specified by exactly comparable sequences of letters. We even share tracts of sequence in common, the bits constrained by common selection, while other parts have altered beyond recognition. Read out the DNA sequence of a rabbit and you will find the same interminable succession of bases, with some sequences identical to ours, others different, intermingling in and out like a kaleidoscope. The same is true of a thistle: the sequence is identical, or similar, in places, but now larger tracts are different, echoing the vast tracts of time since we shared a common ancestor, and the utterly different ways of life we lead. But our deep biochemistry is still the same. We are all built from cells that work in much the same way, and these are still specified by similar sequences of DNA.

Given these deep biochemical commonalities, we would expect to find sequences in common with even the most remote forms of life, like bacteria, and so we do. But in fact there is scope for confusion here, as sequence similarity is not plotted on a scale of 0 to 100 per cent, as one might expect, but from 25 to 100 per cent. This reflects the four letters of DNA. If one letter is substituted for another at random, there is a 25 per cent chance that the same letter will be substituted. Likewise if a random stretch of DNA is synthesised from scratch in the lab, there must necessarily be a 25 per cent sequence similarity to any one of our genes chosen at random – the probability that each letter will match a letter in human DNA is a quarter. As a result, the idea that we are ‘half banana’ because we share 50 per cent of our genome sequence with a banana is misleading, to put it mildly. By the same reasoning, any randomly generated stretch of DNA would be a quarter human. Unless we know what the letters actually mean, we’re completely in the dark.
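The 25 per cent floor is easy to check by simulation. The sketch below is a toy: the ‘gene’ is itself just a random string standing in for real DNA, the four letters are assumed to be equally common, and the two sequences are compared letter by letter at the same positions, with none of the alignment that real sequence comparisons involve. The function names are invented for this example.

```python
import random


def random_sequence(length: int, rng: random.Random) -> str:
    """A random stretch of 'DNA', each letter equally likely (an assumption)."""
    return "".join(rng.choice("ATGC") for _ in range(length))


def similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


rng = random.Random(0)  # fixed seed so the run is repeatable
gene = random_sequence(100_000, rng)       # stand-in for a real gene
unrelated = random_sequence(100_000, rng)  # 'synthesised from scratch'

baseline = similarity(gene, unrelated)
print(f"similarity between unrelated sequences: {baseline:.1%}")
```

Two sequences with nothing whatever in common still come out close to a quarter identical, which is why a raw percentage says little until we know what the letters mean.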

And that is why Watson and Crick only grasped half of the secret of life that morning in 1953. They knew the structure of DNA and understood how each strand of the double helix could serve as a template for another, so forming the hereditary code for each organism. What they didn’t mention in their famous paper, because it took a further ten years of ingenious research to find out, was what the sequence of letters actually codes for. Although lacking the majestic symbolism of the double helix, which cares not a hoot for the letters entrained in its spiral, deciphering the code of life was perhaps the larger achievement, one in which Crick himself figured prominently. Most importantly, from our point of view in this chapter, the deciphered code, initially the most puzzling disappointment in modern biology, gives intriguing insights into how DNA evolved in the first place, nearly 4 billion years ago….