Optimization using Genetic Algorithm/Evolutionary Algorithm in Python using DEAP framework

with Bounds on each variable and a Constraint

Aviral Agarwal
10 min read · Apr 22, 2020

What’s optimized is good, right?

To set the record straight, when I say ‘optimization’, I mean finding optimum solutions to a given mathematical function. Ideally, at these solutions/points, the value of the function should be the maximum or minimum possible in the given search space (the possible values of the variables/solutions), i.e. a global optimum.

Optimization, an age-old concept, continues to provide tremendous value and has only grown in importance as we collect more data. Most Machine Learning algorithms use these techniques to minimize their error/cost function and find the optimum solution. Optimization also finds huge application in Operations Research, where it is used to bring down costs and use resources optimally, from minimizing delivery times to efficiently utilizing an allocated budget to maximize results.

Here, we are going to implement a variation of Evolutionary Algorithms called the Genetic Algorithm (GA), step by step and with explanation, using the DEAP framework in Python. We will also apply constraints on our variables using the concept of a penalty, again using DEAP.

Evolutionary Algorithms (EAs)


EAs are global optimization algorithms inspired by natural/biological evolution through natural selection, with mechanisms such as reproduction/recombination/crossover, mutation and selection[4], much like survival of the fittest in real life. This makes them very intuitive to understand. They come under the larger umbrella term of Metaheuristic Algorithms, or simply Metaheuristics.

We are going to implement Genetic Algorithm and the following basic steps should hopefully provide enough clarity to move forward:

  1. GA initially starts with randomly selected solutions (or individuals; together they make up the population) which are spread across the function's search space, instead of just one random starting point as in neighborhood-based (e.g. Hill Climbing) or gradient-based (e.g. Gradient Descent) search algorithms.
  2. The fitness values of the individuals (solutions) are then calculated.
  3. The best or fittest individuals are then selected using a selection strategy based on fitness value (read “Survival of the Fittest”).
  4. The selected individuals from the population then produce offspring (or children) at the end of each generation (or iteration), thus passing their fit genes on to their children. This process is referred to as Reproduction, Crossover or Recombination.
  5. Offspring are similar to their parents but undergo random mutations (based on a probability), much like what we see in real life between parents and their children.
  6. Offspring from the previous generation become the population (or parents) for the next generation.
  7. The process continues for the given number of generations or iterations.
  8. The fittest individual is recorded at the end of each generation, and the fittest of them all is the optimum solution.

In contrast to neighborhood-based optimization algorithms such as Hill Climbing, EAs are population based and rely on stochastic processes.

EAs work well on real-life problems, which are, most of the time, complex NP-hard problems. Being population based also lets EAs avoid the drawback of getting stuck in local optima, so they are more likely to converge to a globally optimal or near-globally optimal solution.

DEAP: Distributed Evolutionary Algorithms in Python

DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas[1].

Although DEAP has some algorithms implemented as ready-made functions, its main power lies in its rich built-in functions for the basic processes in EAs (such as crossover and mutation). Coupled with Python, it gives us the flexibility to implement many variations of EAs to suit the problem at hand, quickly and efficiently.

DEAP seeks to make algorithms explicit and data structures transparent. It works in perfect harmony with parallelization mechanisms such as multiprocessing and SCOOP[1]. This becomes important when compute resources and run time are at a premium.

A quick brush-up on OOP and class-inheritance concepts will enable a deeper understanding of the code. We will not be using any built-in algorithm to solve our problem, but will use the framework to implement GA ourselves.

Last but not least, DEAP is available in the PyPI repository and can be easily installed using pip.

pip install deap

It is worth adding that DEAP has very extensive documentation, which I found very helpful[1].

The Problem: Himmelblau’s Function

There are many functions which can be considered standard optimization problems, like the Traveling Salesman Problem, a combinatorial problem where we need to find the best route for a salesman to cover all destinations while traveling the minimum distance. We can model any problem that we face in our business or real life and try to optimize it. The results might be interesting!

Here, we will minimize the following continuous function called Himmelblau’s Function.

f(x, y) = (x² + y − 11)² + (x + y² − 7)²

We are going to bound or limit the values of both variables x and y to [-6, 6]. The bounds can be different for each variable.

Within this range of x and y, the function has a local maximum at

f(-0.270845, -0.923039) = 181.617

Himmelblau’s Function in 3D, courtesy Wikipedia[4]

and the following four identical minima,

f(3.0, 2.0) = 0.0
f(-2.805118, 3.131312) = 0.0
f(-3.779310, -3.283186) = 0.0
f(3.584428, -1.848126) = 0.0

Evolutionary algorithms are usually unconstrained optimization procedures[2].

Since constrained optimization is a more realistic scenario, we are also going to put a constraint on the variables such that their sum should be less than zero:

x + y < 0

This leaves us with the expected optimal values for x and y at

x = -3.779310 and y = -3.283186 and f(x, y) = 0.0

in the given feasible solution space (i.e. the values of variables x and y which do not violate any bounds or constraints).

Python Implementation using DEAP

Hyperparameters

Parameters which we may need to vary and decide as per our problem are all initialized at the very beginning. It's okay if not all of them make sense right now; it will become clearer when we see them in action. The comments in the code should help in that respect too.
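For concreteness, here is a sketch of what such a parameter block might look like. The bounds and the 1000 generations come from this article; the remaining names and values are assumptions reused in the sketches below.

BOUNDS = [(-6, 6), (-6, 6)]   # bounds on x and y
NUM_VARIABLES = 2             # x and y
CHROMOSOME_LENGTH = 100       # total bits per individual (50 bits per variable)
POPULATION_SIZE = 200         # individuals per generation
P_CROSSOVER = 0.7             # probability of mating two selected individuals
P_MUTATION = 0.1              # probability of mutating an offspring
MAX_GENERATIONS = 1000        # number of generations to run
TOURNAMENT_SIZE = 3           # tournsize for tournament selection
BIT_FLIP_PROB = 0.01          # indpb for bit-flip mutation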

Imports

We will use matplotlib for plotting the convergence of the GA and numpy to compute basic statistics for each generation, for tracking purposes. base, creator and tools are three modules from DEAP that contain the functions and classes we are going to use to implement GA.
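A sketch of the imports described above; random is also included here, since we will need it later to generate random bits and crossover/mutation probabilities.

import random                          # random bits and probabilities

import numpy as np                     # per-generation statistics
import matplotlib.pyplot as plt        # plotting the convergence of the GA

from deap import base, creator, tools  # DEAP modules used to build the GA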

Creating an Individual Class

Here is a convenient way of creating an Individual class using DEAP.

Before we create individuals, each individual needs to have a fitness value, for which we will define the class FitnessMin. It will inherit the Fitness class of the deap.base module and contain an attribute called weights.

Please note that weights must be a tuple, as this is where we define whether we are going to maximize or minimize:

(1.0,) +ve weight for maximization or (-1.0,) -ve weight for minimization

and whether we have multi-objective optimization. Since we have only one objective function for which we want to find optimal solutions, we have only one element in our weights tuple.

Next, we will create the class Individual, which will inherit the class list and contain our previously defined FitnessMin class in its fitness attribute.

Note that upon creation, our newly defined Individual class will be part of the creator container and can be called directly. The arguments are: first, the desired name of the new class (an object of which will represent an individual); second, the base class it will inherit (here list); and then any subsequent keyword arguments you want to become attributes of the class (here fitness). This is a general approach that we will see a lot in DEAP.
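Putting the two creator.create() calls together, a minimal sketch looks like this:

# FitnessMin: a Fitness subclass with a single negative weight => minimization
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))

# Individual: a list subclass whose 'fitness' attribute is a FitnessMin instance
creator.create("Individual", list, fitness=creator.FitnessMin)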

base.Toolbox

Now, we will use our custom classes to create types representing our individuals as well as our whole population[5].

All the objects we will use along the way (an individual, the population, as well as all functions, operators and arguments) will be stored in a DEAP container called Toolbox. It contains two methods for adding and removing content: register() and unregister()[5].

The registration of tools in the toolbox only associates aliases to already existing functions and freezes part of their arguments. This allows us to fix an arbitrary number of arguments at certain values, so we only have to specify the remaining ones when calling the method.

When toolbox.individual() is called, tools.initRepeat(creator.Individual, toolbox.attr_bool, 100) will be executed.

When called, the individual() method will thus return an individual initialized with what would be returned by calling the attr_bool() method 100 times: an instance of the creator.Individual container will be filled using the attr_bool() method, provided as the second argument, and will contain 100 integers (bits), as specified by the third argument.

An Individual or a solution, with genotype to phenotype conversion (implemented in the decode_all_x function)[0]

The population() method uses the same paradigm, i.e. it will return a list filled with individuals from toolbox.individual, but we don't fix the number of individuals it should contain; we will fix that later when we call it.
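A sketch of these registrations, using the 100-bit chromosome length assumed earlier; attr_bool simply draws a random 0/1 bit.

toolbox = base.Toolbox()

# attr_bool() returns a random bit (0 or 1)
toolbox.register("attr_bool", random.randint, 0, 1)

# individual() returns a creator.Individual filled with 100 random bits
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_bool, CHROMOSOME_LENGTH)

# population(n=...) returns a list of n individuals; n is fixed later
toolbox.register("population", tools.initRepeat, list, toolbox.individual)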

Objective/Evaluation Function

We will now define our objective function. We will also see how to decode an individual from its genotype (bit form) to the actual solution in the decimal number system (phenotype) using a precision formula.

Please see decode_all_x for the exact logic of decoding or converting an individual (or a solution) from genotype to phenotype. There are multiple ways this can be implemented, depending upon the requirement.

Note that the objective function takes in only an individual and must return an iterable of length equal to the number of objectives (the same as the number of elements in weights).
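Below is a sketch of decode_all_x and the objective function, assuming a 100-bit chromosome split evenly between x and y; the exact decoding in the original gist may differ.

def decode_all_x(individual, no_of_variables=2, bounds=[(-6, 6), (-6, 6)]):
    """Convert a bit-string individual (genotype) into real-valued variables (phenotype)."""
    len_chromosome = len(individual)
    bits_per_var = len_chromosome // no_of_variables
    decoded = []
    for i in range(no_of_variables):
        # slice out this variable's bits and interpret them as a binary integer
        chromosome_string = "".join(str(bit) for bit in
                                    individual[i * bits_per_var:(i + 1) * bits_per_var])
        as_decimal = int(chromosome_string, 2)
        # precision formula: map [0, 2^bits - 1] linearly onto [lb, ub]
        lb, ub = bounds[i]
        precision = (ub - lb) / (2 ** bits_per_var - 1)
        decoded.append(as_decimal * precision + lb)
    return decoded

def objective_fxn(individual):
    # Himmelblau's function evaluated on the decoded (x, y)
    x, y = decode_all_x(individual)
    obj = (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2
    return (obj,)   # iterable with one value per objective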

Functions to impose penalty in case constraints are violated

In DEAP, we require two functions to impose a penalty on the fitness of individuals which violate the constraint. Both take only an individual as input.

check_feasiblity returns the boolean False if the constraint is violated, and True otherwise. The penalty is imposed on an individual only if False is returned.

penalty_fxn is the penalty function, which returns a penalty value that is used to worsen the fitness (for minimization), making the individual less fit (a high fitness value means less fit for minimization problems, and the opposite for maximization). Generally, the penalty is proportional to the extent by which the individual violates the constraint, which is what we have done in the code too.
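A sketch of the two functions for the constraint x + y < 0; the quadratic form of the penalty is an assumption, and any value that grows with the amount of violation would work.

def check_feasiblity(individual):
    """Return True if the constraint x + y < 0 is satisfied, False otherwise."""
    x, y = decode_all_x(individual)
    return (x + y) < 0

def penalty_fxn(individual):
    """Penalty that grows with the amount by which x + y exceeds 0."""
    x, y = decode_all_x(individual)
    return (x + y) ** 2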

Registering everything in Toolbox container, the DEAP way

tools.DeltaPenalty will penalize the fitness of an individual if check_feasiblity returns False. To penalize, it returns (for minimization) the constant 1000 plus whatever value penalty_fxn returns as the fitness of the individual.
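In code, this is done by registering the evaluation function and decorating it (a sketch):

toolbox.register("evaluate", objective_fxn)

# infeasible individuals get a penalized fitness of 1000 + penalty_fxn(individual)
toolbox.decorate("evaluate", tools.DeltaPenalty(check_feasiblity, 1000, penalty_fxn))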

Mating/Reproduction/Crossover/Recombination: tools.cxTwoPoint takes two individuals and produces two children. It breaks both individuals, in genotype form, at two random points to form three segments each; both individuals are broken at the same two points. Alternate segments are taken from each parent and joined together to produce the two children. Below is a small illustration of the process for individuals with a genotype size of 7.

Mating of two Parents to produce two children

Tournament Selection is the selection strategy we will use to select fit individuals to be parents that mate to produce children. It is implemented in tools.selTournament. We randomly pick tournsize (k) individuals from our population and select the best individual among those k. We repeat this enough times to select as many parents as there are individuals in the population.

Mutation: Each gene/allele/bit of a child is susceptible to mutation. For each bit we pick a random number between 0 and 1, and if it is less than the mutation probability indpb, the bit is flipped. This probability is generally kept low. We will use tools.mutFlipBit to implement this.
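Registering these three operators to the toolbox might look like this (a sketch; the probability and tournament-size values are the assumed hyperparameters from earlier):

# two-point crossover between pairs of selected parents
toolbox.register("mate", tools.cxTwoPoint)

# flip each bit of a mutating child with (low) probability indpb
toolbox.register("mutate", tools.mutFlipBit, indpb=BIT_FLIP_PROB)

# tournament selection: pick the best of tournsize randomly chosen individuals
toolbox.register("select", tools.selTournament, tournsize=TOURNAMENT_SIZE)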

Please note that how the above processes are executed can be changed to suit the problem at hand. We can use many built-in functions from DEAP or create our own, hence the flexibility.

Hall of Fame: tools.HallOfFame()

hall_of_fame = tools.HallOfFame(1)

This is a very convenient way to find the best individual out of all the populations and offspring encountered, which is nothing but our optimal solution.

The argument ‘1’ signifies that we want to track only one best individual out of the population.

If we use the update method of the HallOfFame object with a population, it will select and store the best individual from it. If we pass another population to the update method, it will again choose the best individual, compare it with the earlier stored one, and keep the better of the two. In the end we are left with the best individual, which goes into the ‘Hall of Fame’.
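For illustration, usage inside the generation loop looks roughly like this ('pop' is an assumed name for the current population):

hall_of_fame.update(pop)            # keeps the fitter of the stored best and the best in pop
best_individual = hall_of_fame[0]   # the best individual seen so far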

Logbook and Statistics

We will use tools.Statistics() to compute some basic statistics about individuals or the solutions in each generation.

We will also use tools.Logbook() to store the generated statistics and use them for plotting, to see how the algorithm converges.
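A sketch of how these two objects are set up and used together:

# statistics are computed over each individual's fitness values
stats = tools.Statistics(key=lambda ind: ind.fitness.values)
stats.register("min", np.min)
stats.register("mean", np.mean)
stats.register("max", np.max)

logbook = tools.Logbook()

# inside the generation loop (see the full loop below):
#   record = stats.compile(pop)
#   logbook.record(gen=gen_number, **record)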

Finally Implementing Genetic Algorithm

This is how we use all of the above and execute the GA described earlier to solve the problem. The comments should provide the required clarity about usage of the code.

Please observe how DEAP is providing us ways to manipulate or make changes in the algorithm as per our choices or problem.
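Below is a sketch of the full loop, tying together everything registered above. It is not the exact original gist; the variable names and the simple generational replacement scheme are assumptions, but the steps mirror the algorithm described at the start.

# step 1: initial population, spread randomly over the search space
pop = toolbox.population(n=POPULATION_SIZE)

# step 2: evaluate the fitness of the initial population
for ind, fit in zip(pop, map(toolbox.evaluate, pop)):
    ind.fitness.values = fit
hall_of_fame.update(pop)

# step 7: repeat for the given number of generations
for gen in range(MAX_GENERATIONS):
    # step 3: tournament selection of parents, then clone them
    offspring = list(map(toolbox.clone, toolbox.select(pop, len(pop))))

    # step 4: crossover, mating consecutive pairs with probability P_CROSSOVER
    for child1, child2 in zip(offspring[::2], offspring[1::2]):
        if random.random() < P_CROSSOVER:
            toolbox.mate(child1, child2)
            del child1.fitness.values
            del child2.fitness.values

    # step 5: mutation, applied to each offspring with probability P_MUTATION
    for mutant in offspring:
        if random.random() < P_MUTATION:
            toolbox.mutate(mutant)
            del mutant.fitness.values

    # re-evaluate only the individuals whose fitness was invalidated
    invalid = [ind for ind in offspring if not ind.fitness.valid]
    for ind, fit in zip(invalid, map(toolbox.evaluate, invalid)):
        ind.fitness.values = fit

    # step 6: offspring become the population for the next generation
    pop[:] = offspring

    # track the best individual and per-generation statistics
    hall_of_fame.update(pop)
    logbook.record(gen=gen, **stats.compile(pop))

# step 8: the fittest individual of them all is our optimum solution
best = hall_of_fame[0]
print("Minimum value for function: ", best.fitness.values[0])
print("Optimum Solution:", decode_all_x(best))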

Results and Plotting

We will use matplotlib to plot the minimum value reached in each generation to see how the algorithm converges.
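A sketch of the plotting code, reading the per-generation minima back out of the logbook filled in the loop above:

# extract the recorded minimum fitness of each generation from the logbook
generations = logbook.select("gen")
min_fitness_per_gen = logbook.select("min")

plt.plot(generations, min_fitness_per_gen)
plt.xlabel("Generation")
plt.ylabel("Minimum fitness in generation")
plt.title("Convergence of GA")
plt.show()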

Final Solution and Convergence of GA

As you can see, the solution is near the expected optimum and has not violated the constraint either.

Minimum value for function:  0.01101131654241808
Optimum Solution: [-3.7879641021042314, -3.298504163793028]

The graph shows that although the GA ran for 1000 generations, the optimal result was found within about 200 generations; this can be tuned. Also note that the best solution of the last generation (at convergence) need not be the best solution found overall, which is why we track the Hall of Fame. The jagged shape of the curve also reflects the random, stochastic processes the GA relies on to solve the problem.

That’s it, thanks for reading and I hope you found this useful. Do check out the references for more details. I found them very helpful.
