Clegg: An Introduction to Evolution for Computer Scientists
Metadata
Title: An Introduction to Evolution for Computer Scientists
Authors: Kester Clegg
Publication Year: 2008
Journal: Unknown
Assigned in: cm3020 Topic 01: Genetic Algorithms
Published Abstract
This report is intended as an introduction to evolution and evolutionary computation. Taking its lead from contemporary thinking in biology, it advocates that we have more to learn from the role that developmental processes play in natural evolutionary search than has previously been admitted by practitioners in evolutionary computation. A brief summary of evolutionary biology is given, from Darwin to Dawkins and beyond, alongside theoretical insights from authors such as Kauffman and Soĺe. The report then sketches the development of evolutionary computation as a field of computer science. It gives some background to how the “central dogma of biology” gained a canonical form as an algorithm and why decomposing tasks for evolutionary computation remains a difficult problem. It ends by indicating how gene re-use and context-dependent gene expression may provide one key to how evolution creates scalable solutions of such complexity.
Argument
Evolutionary computation suffers from adopting biological models that were standard in the 1960s when it emerged, which gave too much weight to the genome (‘genetic reductionism’) and weren’t sensitive enough to the developmental processes that constrain mutation and the combinatorial scale of variation. The field has largely ignored developments in evolutionary theory, including major developments in the 1990s on competitive ecologies.
By paying more attention to developmental evolutionary theory we can have a richer set of tools to work with in evolutionary computation. These developments are still nascent in the field.
Key Points
The main challenge for evolutionary computation is scale. They have been used successfully for search based optimization, where there are a handful of factors in the fitness assessment, but as the number of elements sought increases, the time complexity increases dramatically, exponentially if the factors affect each other.
As a result it is generally used in bootstrapping, to optimize a few variables in an existing solution.
GAs haven’t evolved much since they were originally developed.
Key Quotes
Evolutionary computation uses the process of natural selection as a search algorith. Like evolution, the algorithm works over successive generations, gradually moving the search closer to the objectives until no more improvement in the population is possible, or the objectives are satisfied.
The achievements of evolutionary computation remain relatively modest. No one has evolved the design for a car, a house, or anything where the number of parts exposed to random mutation significantly adds to the complexity of the objective.
If there were a single criticism of the field of evolutionary computing, it would be that too little evidence from biology has been used during adoption of the evolutionary paradigm. While no one would wish to replicate the intricacy of biology, evolutionary computation has historically taken a very narrow interpretation of evolution, one that is predominantly based on models of fitness landscapes. These models are highly abstract and perhaps rather sterile as a consequence. There is little opportuninty for the rich, complex interactions we see in real biological processes to take place. For example, hardly any work has included the role developmental processes play in evolutionary search, despite evidence from biology suggesting its fundamental importance. (p. 33)
a tiny fraction of what could be expressed is ever realised in a phenotype. A genome contains solutions for countless sets of contexts. Change the contexts and the genome still has room for developmental exploration. This flexibility and redundancy of solutions has an impact on the re-use and conservation of genes during developmental processes. Without similar mechanisms of interaction and feedback, digital genomes cannot guide themselves across functional search spaces in a way that fully exploits a domain’s resources, and this is particularly true where that domain includes the complexity provided by real-world physics. (p. 52)
While the wet manufacture of life is hardly a practical ambition for computer scientists, it contains clues, patterns if you like, of how subtle, scalable structures can be built that allow evolution to explore and interact with the world around it. DNA alone can’t do that: it’s a passive instruction set. To paraphrase Lewis Wolpert, it’s proteins that do all the work. (p. 52)
Article flow
History of evolutionary theory
The article starts by summarizing the state of the field of evolutionary computation, focusing on its limitations. It then gives a history of our understanding of natural evolution, starting with an overview of Darwin’s life and work against a backdrop of early Victorial ‘natural theology’ (Malthus et al). Darwin drew up three independent generalizations that allowed him to argue for natural selection:
Variation: No two individuals are identical within a population.
Hereditary characteristics: The variation expressed as individual characteristics is inheritable from parents to offspring.
Multiplication and competition: Malthus’s observation - about population increase where no competition for resources exist - means that as resources are always finite, competition must act as a brake on population growth.
If these processes were acting, then hereditary variations within a population that allowed an organism to survive would stand a greater chance of being passed down to their offspring. This explained why species diverged and how characteristic adaptations to the environment evolved within species.
The main unresolved challenge for Darwin was the assumption of blending heredity, which would halve the available variation in each generation. This was solved by Mendel theoretically in 1865, arguing heredity was particulate. This would give rise to later Neodarwinism, summarized by Dawkins (adapted by Clegg):
The genes of interbreeding animals constitute a gene pool. The genes compete, but in practice spend their time either sitting in individual bodies which they helped to build, or travelling from body to body via sperm or egg in the process of sexual reproduction. Sexual reproduction shuffles the genes, and it is in this sense that the long-term habitat of a gene is the gene pool. Any given gene originates in the gene pool as a result of a mutation, a random error in the gene-copying process. Once a mutation is formed, it can spread by means of sexual mixing provided that its carrier is able to sexually reproduce. Good carriers will contribute more to the ene pool than reproductively less successful ones. Any given gene in a gene pool is said to have a frequency, as it is likely to exist in the form of several copies, all descended from the original mutant. Some genes such as teh albino gene are rare in the gene pool, others are common. At a genetic level, evolution may be defined as teh process by which gene frequencies change in gene pools.
This account suffers from “genetic reductionism”, and challenges emerged that claimed it fails to explain large scale aspects of evolution, including the origin of species.
Genetic mutation painted a convincing picture of small scale changes - the “fine tuning” of varieties - but not differences of type: fish from amphibian, worms from insects, or horsetails from grasses.
The paper turns to the debate between Goodwin and Dawkins over the evolution of complex organs (like eyes), and Goodwin’s theory of developmental processes and morphological constraints. The debate is ongoing, but developmental processes that shape the phenotype and environmental constraints play a much larger role in it than in the 70s and 80s when genetic reductionism had its greatest power.
Development and Developmental Biology
The paper turns to understanding the role of development, ie “the emergence of organised structures from an initially very simple group of cells” in constraining designs. The process of moving from cells to complex structures is governed by proteins. It presents a taxonomy of cells and cell behaviour. It then discusses proteins, their chemical structure, and relation to the genome. Finally, it turns to DNA, which specifies the proteins for a cell, and RNA, which controls the protein synthesis based on the DNA template.
With those building blocks established the paper looks at how they have influenced developmental biology and evolutionary theory. It looks at how changes resulting from changes in the DNA are limited in where they can occur, suggesting that early embryonic stages of developent may be robustly protected against change, perhaps because they are risky or that changes are easier once critical development stages have been reached.
The paper talks about Hox gene clusters that play a crucial role in several species in a range of developmental processes, like specifying positional identity in an embryo, which allows the same genes expressed in a different location to produce a different morphological form. A few, very useful genes are re-used widely across species. “If we want evolutionary computation to mimic this trick, we need to discover what allows context-specific expression of a gene, and to allow that context to in-part define the gene’s functional role”.
The paper then goes into detail about how gene regulatory networks determine the contexts in which a particular gene is expressed or inhibited from transcribing proteins in the nucleus of a cell.
Modelling Evolution
The paper reviews abstract models of evolution.
It starts with Fitness landscapes that represent peaks and troughs of evolutionary success in a graph. An individual within a species is represented as a string of genes that defines its genotype. The string has a real number associated with it, defining the fitness of the string in terms of the phenotype it produces. The distribution of fitness values over the space of all genotypes gives the fitness landscape, and all members of the population map onto that landscape according to their fitness value.
Note that in reality:
there is no evidence that evolution works on single traits.
Fitness landscape models would have to amalgamate traits into an aggregated, holistic functional behaviour, but this would be difficult to visualize, so they tend to just plot against two dimensions representing two traits in the phenotype. The article reviews criticisms of these models as reflecting anything meaningful in the natural environment.
The paper then presents the NK Model which attempts to model the interdependency between genes. N is the number of genes, K how many other genes influence any given gene. As K increases, the number of peaks in the fitness landscape increases. Genes (or traits) are represented as binary alleles, so this starts to look like something CS people would recognize:
The \(2^N\) combinations of alleles of the N genes are therefore located on the vertices of the N-dimensional Boolean hypercube. The fitness of each type of oganism, or vertex, is written on that vertex and can be thought of as a height. Hence the NK model creates a fitness landscape over the N-dimensional Boolean hypercube. (Kauffman, 2000).
Because the landscape is big and interconnected, finding a global optimum becomes not just improbable, but of dubious value even as a strategy… the fact that the rate of improving fitness slows exponentially with each uphill step, and that the system gets stuck on local optima, correlates to the earlier suggestion by Wolpert that evolution merely “tinkers” with existing structures. Big jumps are not possible.
The paper then turns to coevolution models, pointing out that evolution works over species in competition, and in a situation where the landscape is dynamic. Species might get trapped on a local peak due to the environment changing faster than you do. It presents the NKC model which tries to model adaptive landscapes.
History of Evolutionary Computation
The paper then presents a history of evolutionary computation, from early failed attempts to evolve computer programs, to parameter optimization problems in the 1960s and genetic programming.
The section on Genetic Algorithms themselves starts on p. 36. It has emerged as the main paradigm of evolutionary computation.
The Canonical Genetic Algorithm
The paper sets out the canonical GA as follows, where \(t\) is the time step:
Set \(t=0\)
Initialize chromosome population \(P(t)\)
Evaluate \(P(t)\) using fitness criteria.
while termination condition not satisfied do
begin
(a) Select best individuals from \(P(t)\). Let \(P_1(t)\) be the set of selected chromosomes. Choose individuals from \(P_1(t)\) to enter mating pool (\(MP\)).
(b) Recombine chromosomes in \(MP\) forming populations \(P_2\). Mutate chromosomes in \(P_2\) forming \(P_3\).
(c) Select replacements from \(P_3\) and \(P(t)\) forming \(P(t+1)\)
(d) Set \(t=t+1\)
end
It is common to refer to the popluation as containing candidate solutions.
The paper reviews roulette wheel selection and implicit parallelism, covered in the module lectures. It also discusses the schema theorem, and the building block hypothesis, that GAs work by locating and maintaining “good” building blocks, though claims that this lacks theoretical grounding.
It discusses criticisms of binary representations as inappropriate for real-valued numerical optimization, better to use floating points.
It looks at limitations that have emerged from use, such as premature convergence. However, GAs remain popular:
They are now part of the standard toolbox of search algorithms where the search space is large and unpredictable, and have become the default method for tackling traditional, NP-hard problems, such as the travelling salesman.
But GAS have evolved little beyond the framework described in the 1960s, and haven’t found much application outside optimisation problems.
Weaknesses
The paper concludes by digging into problems with GAs, they are obsessed by optimisation (p. 41), the development of the phenotype from the genotype is largely absent (p. 45), lacking any models of development (p. 47), or the ability to have a single gene used in different roles elsewhere.