James Arnold Nogra

Senior Software Engineer

Posts

Introduction to Genetic Algorithm

21 May 2021

Introduction to Genetic Algorithm

Genetic algorithm (GA) is a search heuristic inspired by the processes of Charles Darwin’s Theory of Natural Selection. The algorithm reflects on how the fittest individuals have higher chances of mating thus passing their genes to their offspring. Because only the best individuals produce offspring, after a couple of generations, the population would be best suited for the environment.

In computer science, Genetic Algorithm has five phases namely initialization of population, fitness function, selection, crossover, and mutation. In the following example, we would create a hypothetical person with only three genes. The first gene is for the height, the 2nd gene is for the skin color, and the third gene is for the body type. Combining these three genes will become a chromosome.

Height Skin Color Body Type
Tall (T) White (W) Big (B)
Medium (M) Yellow (Y) Average (A)
Short (S) Black (B) Lanky (L)

Possible genes of the population.

During the population initialization phase of Genetic Algorithm, chromosomes are created by combining the defined gene for each trait. In this example, we can initialize people such as Tall-White-Big (TWB), Short-Yellow-Big (SYB), Medium-White-Lanky (MWL), and so on. Let’s just say we need 100 people for every generation based on the hypothetical genetic trait defined above.

Sample 100
[1] MBB
[2] TWB
[100] MYA

One hundred people generated from randomly selecting genes for the initial population.

Next phase is the fitness function. In order to define this, a goal must be set. For this example, a person with Tall-White-Average (TWA) must be set as the goal person. To calculate the fitness score for this particular example, the location of a gene of a person generated must match the gene of the goal person, the TWA. For the initial population, [2] TWB person has a fitness score of 2 because it matches the gene of the goal in terms Tall (T) and White (W). Person [100] MYA has a fitness score of 1 because it matches the Average (A) gene of the goal. Person [1] MBB has a 0 fitness score because none of its genes match with the TWA goal person.

Chromosome Fitness Score
[1] MBB 0
[2] TWB 2
[100] MYA 1

Sample Initial 100 Population with Fitness Score

Selection is the process of selecting which two people have to be mated to create an offspring. In Genetic Algorithm, the best offspring (high fitness score) will be allowed to reproduce and pass their genes to the next generation.

After selecting the best set of parents, new people (offsprings) will be generated based on the genes of the selected parents. This process in Genetic Algorithm is called Crossover. For the two parents, a location of their genes will be selected at random in which to pass to their offspring. For example, we generate a new person based on the genes of [2] and [100]. We could have parent [2] height Tall (T), parent [100] skin color Yellow (Y), and parent [2] body type Average (A). Based on the parent’s genes, the new person created (offspring) has a chromosome of Tall-Yellow-Average (TYA) which has a fitness score of 2. We mate two high fitness score parents that are randomly selected 100 times (number of population per generation) until we have new 100 offsprings.

Crossover 1

For the new offspring generated, a Mutation with low probability can be applied. This implies that one or more of the genes can be changed. In the example of new offspring Tall-Yellow-Average (TYA), we randomly select a gene location, in this example, the 3rd gene, and change it to a different gene, let’s just say from Average (A) to Lanky (L). This offspring has now a chromosome of Tall-Yellow-Lanky (TYL) which has a fitness score of 1. In Genetic Algorithm, it’s not rare that the new offspring has a lower fitness score than its parents.

In the last phase of Genetic Algorithm, a termination flag is set. In this example, if one of the new offspring generated has a chromosome of Tall-White-Average (TWA), then the whole process is stopped because a goal (solution) has been found.

Here is another example SimpleGeneticAlgorithm where the goal string is 38 characters long (THISISAVERYLONGSTRINGIHOPEGACANSOLVEIT) and each gene can be one of the 26 uppercase letters of the English alphabet. In this video, GA found the goal string (solution) after 1363 generations. If this was solved using brute force, it would take approximately 2638 (number with 54 digits) tries before it can guess the goal string. Assuming each generation of GA is equivalent to 1 billion tries of the brute force operation which we all know it’s not, GA would still be significantly faster.

Genetic Algorithm Video

Another example GeneticAlgorithmHTMLGame, where the goal is to find the shortest path from the green tile to the blue tile. The chromosome length is set to 25 and each gene can be Left (L), Right (R), Up (U), and Down (D). If the path generated crosses a red tile, 5 points is added to the fitness score which is determined by the number of moves it takes to reach the blue tile.

thumbnail
Best of Generation 1

thumbnail
Best of Generation 100

thumbnail
Best of Generation 150

Genetic Algorithm has a lot of real-world applications such as designing new medicines, generating a cipher for cryptographic purposes, and test cases generation. Another reason to learn and understand GA is because it’s simply fun!

Read More