PROBLEMS AND OPTIMIZATION
In this section we discuss problems and their types, and then the different types of algorithms that can be used to produce optimal or near-optimal solutions to them.
Generally, problems are divided into five groups:
1. NP-Complete
2. NP-Hard
3. NP-Easy
4. NP-Equivalent
5. NP-Intermediate
Our main concern in this project is optimizing NP-hard and NP-complete problems, so these two types are discussed below.
NP-HARD AND NP-COMPLETE PROBLEMS:
NP-hard (non-deterministic polynomial-time hard) problems are those that are at least as difficult to solve as the hardest problems in NP. More precisely, a problem H is NP-hard when every problem L in NP can be reduced to H in polynomial time. Consequently, a polynomial-time algorithm for any single NP-hard problem would yield polynomial-time algorithms for every problem in NP. This is widely believed to be impossible, since no polynomial-time algorithm is known for the hardest problems in NP.
An example of an NP-hard problem is the decision version of the subset sum problem: given a set of integers, does any non-empty subset of them add up to zero? This is a decision problem, and it also happens to be NP-complete. Another example is the optimization problem of finding the least-cost cyclic route through all nodes of a weighted graph, commonly known as the traveling salesman problem. NP-hard problems often arise in areas such as:
• Data mining
• Rosters or schedules
• Decision support
• Routing/vehicle routing
A decision problem that is both in NP and NP-hard is said to be NP-complete. The set of NP-complete problems is often denoted by NP-C or NPC. Although any given solution to an NP-complete problem can be verified quickly (in polynomial time), no efficient way of finding a solution in the first place is known. As a result, the time required to solve the problem using any currently known algorithm increases very quickly as the size of the problem grows.
So basically a decision problem C is NP-complete if:
1. “C is in NP”, and
2. “Every problem in NP is reducible to C in polynomial time”.
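The gap between verifying and finding a solution can be illustrated with the subset sum problem mentioned earlier. The following is only a minimal Python sketch: the function names and the sample numbers are illustrative assumptions, not part of the project. Checking a proposed subset takes polynomial time, while the brute-force search may have to examine up to 2^n − 1 subsets.

```python
from itertools import combinations

def verify_subset(numbers, subset):
    """Check a proposed certificate in polynomial time."""
    remaining = list(numbers)
    for value in subset:
        if value not in remaining:
            return False
        remaining.remove(value)  # each element may be used only once
    return len(subset) > 0 and sum(subset) == 0

def find_zero_subset(numbers):
    """Brute force: try every non-empty subset (up to 2**n - 1 of them)."""
    for size in range(1, len(numbers) + 1):
        for candidate in combinations(numbers, size):
            if sum(candidate) == 0:
                return list(candidate)
    return None

numbers = [-7, -3, -2, 5, 8]           # illustrative input
solution = find_zero_subset(numbers)   # first zero-sum subset found
print(solution, verify_subset(numbers, solution))
```

The verifier only walks through the proposed subset once, whereas the search loop grows exponentially with the number of integers.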
Some examples are the decision versions of the Knapsack problem and the Travelling Salesman problem. The Travelling Salesman Problem is described below.
Travelling Salesman Problem
In the Travelling Salesman Problem, a salesman has to travel between N cities. The order in which he visits the cities does not matter to him, as long as he visits each city during his trip and finishes where he started. Each city is connected to other nearby cities (nodes) by links, each of which has one or more weights (costs) attached. The cost describes how “hard” it is to traverse that edge of the graph. The pragmatic salesman wants to keep both the travel cost and the distance travelled to a minimum.
The Traveling Salesman Problem is typical of a large class of “hard” optimization problems that have interested mathematicians and computer scientists for decades. It has many applications in science and engineering. This is the problem that I have considered for optimization, and the results obtained are shown in the ‘conclusions’ section.
TECHNIQUES FOR OPTIMIZATION:
“The following techniques can be applied to solve computational problems in general, and they often give rise to substantially faster algorithms:
• Approximation: Instead of searching for the most optimal solution, we search for the closest to optimal or “almost” optimal one.
• Randomization: Use randomness to achieve a quicker average running time, and allow the algorithm to be unsuccessful with some small probability.
• Restriction: By limiting the structure of the input to be given, faster algorithms are generally possible.
• Parameterization: There are often faster algorithms if certain parameters of the input are constant or fixed.
• Heuristic: These are algorithms which work reasonably well in certain cases, but there’s no guarantee that the algorithm would produce optimal results or consume less time for other problems”.
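As an illustration of the heuristic entry in the list above, the following Python sketch applies the nearest-neighbour rule to a small travelling salesman instance. Both the rule and the four-city distance matrix are assumptions chosen for illustration; they are not taken from the project. The matrix is deliberately built so that the heuristic is fast but misses the best tour.

```python
def tour_cost(dist, tour):
    """Cost of the closed tour, including the return to the start city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def nearest_neighbour_tour(dist, start=0):
    """Greedy heuristic: always move to the closest unvisited city."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        current = tour[-1]
        # ties are broken by the lower city index
        nxt = min(unvisited, key=lambda city: (dist[current][city], city))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Hypothetical symmetric distance matrix for four cities.
dist = [
    [0, 1, 3, 5],
    [1, 0, 2, 4],
    [3, 2, 0, 9],
    [5, 4, 9, 0],
]

greedy = nearest_neighbour_tour(dist)
print(greedy, tour_cost(dist, greedy))          # heuristic tour and its cost
print([0, 2, 1, 3], tour_cost(dist, [0, 2, 1, 3]))  # a cheaper tour exists
```

On this instance the greedy tour costs 17 while the tour 0, 2, 1, 3 costs 14, which shows why a heuristic carries no optimality guarantee.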
Metaheuristics is a term often used to describe a major subfield of stochastic optimization. Stochastic optimization is a general class of algorithms and techniques which use some degree of randomness to find optimal (or near-optimal) solutions to hard problems (NP-hard and NP-complete problems were discussed in the previous section). Metaheuristics are the most general of these kinds of algorithms and are applied to a very wide range of problems. In particular, they are applied to “I know it when I see it” problems: problems for which brute force is out of the question and about which we have very little information to go on, but for which a candidate solution, once provided, can at least be tested to see how good it is.
For this project, we went through a number of metaheuristic algorithms and the pros and cons associated with each of them. Most of the algorithms that we studied are described in the LITERATURE SURVEY section. What we learnt from this study is that optimization is ubiquitous: it forms a part of fields such as engineering design and economics, and is useful in everything from small matters such as holiday planning to hefty tasks such as Internet routing. Since money, resources and time are always limited, optimizing resource utilization and time consumption is important.
Metaheuristic optimization deals with optimization problems using metaheuristic algorithms. “In the simplest sense, an optimization can be considered as a minimization or maximization problem. For example, the function f(x)=x^2 has a minimum fmin=0 at x=0 in the whole domain −∞<x<∞. In general, if a function is simple enough, we can use the first derivative f′(x)=0 to determine the potential locations, and use the second derivative f′′(x) to verify if the solution is a maximum or minimum. However, for nonlinear, multimodal, multivariate functions, this is not an easy task. In addition, some functions may have discontinuities, and thus derivative information is not easy to obtain. This may pose various challenges to many traditional methods such as hill-climbing”. In general, an optimization problem can be written as:
$$ \min_{x \in \mathbb{R}^d} \; f_1(x), \ldots, f_I(x), \qquad \text{subject to} \quad h_j(x) = 0 \;\; (j = 1, \ldots, J), \quad g_k(x) \le 0 \;\; (k = 1, \ldots, K), $$
where f1,…,fI are the objectives, while hj and gk are the equality and inequality constraints, respectively. “In the case when I=1, it is termed as single-objective optimization. But if I is equal to or more than 2, it turns into a multi-objective problem whose solution strategy is different from those for a single objective. Metaheuristic optimization is more about generalized, nonlinear optimization problems rather than linear problems. Obviously, the simplest case of optimization is unconstrained function optimization”. For example, multi-modal test functions are used for validating new optimization techniques or algorithms, and a good example is Ackley’s function, which has a global minimum fmin=0 at (0,0).
Figure 9: Ackley’s multimodal function
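For readers who want to experiment with this test function, here is a minimal Python sketch of the two-dimensional Ackley function. The text does not give the formula, so the commonly used constants (20, 0.2 and 2π) are assumed here.

```python
import math

def ackley(x, y):
    """Two-dimensional Ackley function with the commonly used constants (assumed)."""
    term1 = -20.0 * math.exp(-0.2 * math.sqrt(0.5 * (x * x + y * y)))
    term2 = -math.exp(0.5 * (math.cos(2 * math.pi * x) + math.cos(2 * math.pi * y)))
    return term1 + term2 + math.e + 20.0

print(ackley(0.0, 0.0))   # value at the global minimum, zero up to rounding
print(ackley(1.0, 1.0))   # a nearby point with a clearly larger value
```

The many cosine ripples create a large number of local minima around the single global minimum at the origin, which is what makes the function a useful benchmark for global search methods.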
CLASSIFICATION OF METAHEURISTIC ALGORITHMS
To solve the above mentioned optimization problem, good optimization algorithms are required. These algorithms can be further classified on the basis of their characteristics and focus.
If the focus is on the derivative or gradient of the objective function, optimization algorithms can be divided into gradient-based algorithms and derivative-free (gradient-free) algorithms. Gradient-based algorithms such as hill-climbing make use of derivative information and are often very efficient. Derivative-free algorithms do not use any derivative information, only the values of the objective function itself. When the function has discontinuities, or when derivatives are expensive to calculate accurately, derivative-free algorithms such as the Nelder-Mead downhill simplex method are very useful.
From a different perspective, optimization algorithms can be classified as trajectory-based or population-based. A trajectory-based algorithm typically uses a single agent or solution at a time, which traces out a path as the iterations continue. Hill-climbing is a trajectory-based algorithm that links the starting point with the final point via a zig-zag path. Another important example is simulated annealing, a popular metaheuristic algorithm. Population-based algorithms such as particle swarm optimization (PSO), on the other hand, use multiple agents or solutions which interact and trace out multiple paths.
Optimization algorithms can also be classified as deterministic or stochastic. An algorithm that works in a mechanical manner without any randomness is said to be deterministic; such an algorithm reaches the same final solution whenever it starts from the same initial point. Hill-climbing and downhill simplex are examples of deterministic algorithms. If there is some randomness in the algorithm, it will generally reach a different point every time it is run, even from the same initial point. Genetic algorithms and particle swarm optimization are examples of stochastic algorithms.
Search capability can also be used to classify these algorithms, dividing them into local and global search algorithms. Local search algorithms typically converge towards a local optimum, which is often not the global optimum; such algorithms are mostly deterministic and have no ability to escape from local optima. Simple hill-climbing is an example. For global optimization, local search algorithms are not suitable, and global search algorithms, which can escape local optima, should be used instead. Most modern metaheuristic algorithms are intended for global optimization, although they are not always successful or efficient.
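The way a deterministic local search gets trapped can be seen in a small Python sketch. It uses a discrete analogue of hill-climbing on a made-up list of objective values (an assumption for illustration), where the neighbours of a position are simply the adjacent indices and the goal is minimization.

```python
def hill_climb(values, start):
    """Move to the better adjacent index until no neighbour improves (minimisation)."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(values)]
        best = min(neighbours, key=lambda i: values[i])
        if values[best] >= values[current]:
            return current          # no improving neighbour: a local optimum
        current = best

# Hypothetical landscape: a local minimum at index 1, the global minimum at index 6.
landscape = [5, 3, 4, 6, 2, 1, 0, 3]

stuck = hill_climb(landscape, 0)    # same start always gives the same end point
lucky = hill_climb(landscape, 3)
print(stuck, landscape[stuck])
print(lucky, landscape[lucky])
```

Starting from index 0 the search stops at the local minimum, while starting from index 3 it reaches the global one, so the result depends entirely on the starting point. Restarting from several points is one simple way to add a global element to such a search.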
Some of the most popular metaheuristic algorithms are as follows:
1. Simulated Annealing
Simulated annealing is based on the phenomenon known as metal annealing. The main advantage of simulated annealing is its capability of avoiding being trapped in the local optima unlike the gradient based algorithms.
In this process, the search moves trace a piecewise path. At each move an acceptance probability is evaluated, which allows the search to accept not only changes that improve the objective function but also some changes that do not. The acceptance probability p is given by the formula:
$$ p = \exp\left[-\frac{\Delta E}{k_B T}\right], $$
where kB is Boltzmann’s constant, T is the temperature controlling the annealing process and ΔE is the change in energy. The change in the objective function, Δf, can be related to ΔE in the following way:
$$ \Delta E = \gamma \, \Delta f, $$
where γ is a real constant (typically, γ=1 for simplicity).
Simulated annealing injects just enough randomness to escape local optima early in the process without straying off course late in the process, when a solution is nearby. This makes it good at tracking down a decent solution irrespective of its starting point.
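A minimal Python sketch of this idea follows. It assumes the usual Boltzmann-type acceptance rule p = exp(−γΔf / (kB T)) for moves that worsen the objective, with kB set to 1 for convenience. The geometric cooling schedule, the step size and the test function are further assumptions made purely for illustration.

```python
import math
import random

def acceptance_probability(delta_f, temperature, gamma=1.0, k_b=1.0):
    """p = exp(-gamma * delta_f / (k_b * T)); improving moves are always accepted."""
    if delta_f <= 0:
        return 1.0
    return math.exp(-gamma * delta_f / (k_b * temperature))

def simulated_annealing(f, x0, t0=1.0, cooling=0.95, steps=500, step_size=0.5, seed=0):
    """Minimise a one-dimensional function f starting from x0."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    temperature = t0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)   # random local move
        f_candidate = f(candidate)
        if rng.random() < acceptance_probability(f_candidate - fx, temperature):
            x, fx = candidate, f_candidate
            if fx < best_f:
                best_x, best_f = x, fx
        temperature *= cooling   # geometric cooling schedule (an assumption)
    return best_x, best_f

best_x, best_f = simulated_annealing(lambda x: x * x, 3.0)
print(best_x, best_f)
```

At high temperature almost every move is accepted, which lets the search leave local optima. As the temperature falls, worsening moves are accepted less and less often and the search settles down.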
2. Genetic Algorithms
Genetic algorithms (GAs) are possibly the most popular evolutionary algorithms, with a diverse range of applications. They are heuristic search algorithms based on the evolutionary ideas of natural selection and genetics. As such they represent an intelligent exploitation of random search for solving optimization problems. Although randomized, GAs are by no means purely random; instead they exploit historical information to direct the search into regions of better results within the search space. The basic techniques of GAs are designed to simulate the processes in natural systems that are essential for evolution.
A large number of well-known optimization problems have been solved by GAs. In addition, GAs are population-based and many modern evolutionary algorithms are either based on or have large similarities to GAs.
GAs, developed by John Holland in the latter half of the 20th century, are models or abstractions of biological evolution based on “Charles Darwin’s theory of natural selection”. Holland was the first to use operators such as crossover, mutation, recombination, and selection in the study of artificial and adaptive systems. These genetic operators are the most important components of GAs as a problem-solving strategy. The process used in GAs involves encoding solutions as arrays of bits or character strings (chromosomes), manipulating these strings with genetic operators, and selecting among them based on their fitness values in order to find a solution to a particular problem. This is often done through the following procedure:
1) Define an encoding scheme;
2) Define a fitness function or selection criterion (objective function);
3) Create a population of chromosomes;
4) Evaluate the fitness or find out the fitness value of every chromosome in the population;
5) Create a new population by performing FPS (fitness-proportionate selection), and using operators such as crossover and mutation;
6) Replace the old population by the new one.
Steps 4), 5) and 6) are then repeated for a number of generations.
At the end, the best chromosome is decoded to obtain a solution to the problem.
Each iteration, which leads to a new population, is called a generation.
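The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the bit-string encoding, the toy "OneMax" fitness (the number of 1-bits), single-point crossover, and all parameter values are choices made here and are not taken from the project.

```python
import random

def fitness(chromosome):
    """OneMax: number of 1-bits (a toy objective chosen for illustration)."""
    return sum(chromosome)

def select(population, rng):
    """Fitness-proportionate (roulette-wheel) selection of one parent."""
    weights = [fitness(c) + 1e-9 for c in population]  # offset avoids all-zero weights
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(a, b, rng):
    """Single-point crossover: head of a joined to tail of b."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability rate."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

def genetic_algorithm(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=1):
    rng = random.Random(seed)
    # Steps 1-3: bit-string encoding, fitness function, initial population.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Steps 4-6: evaluate, breed a new population, replace the old one.
        population = [
            mutate(crossover(select(population, rng), select(population, rng), rng), rate, rng)
            for _ in range(pop_size)
        ]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

best = genetic_algorithm()
print(best, fitness(best))
```

The sketch keeps track of the best chromosome seen over all generations, since plain generational replacement does not guarantee that the best individual survives.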
3. Ant Colony Optimization
Ant colony optimization was proposed in 1992 by Marco Dorigo and is based on the foraging behaviour of social ants. Many insects such as ants use “pheromones as a chemical messenger”. Ants are social insects, and millions of them live together in organized colonies. When foraging, a swarm of ants interacts with its local environment. Each ant lays pheromones (scent chemicals) to communicate with the others, and each ant is also capable of following a route marked with pheromones laid by other ants. When an ant finds a food source (a solution), it marks the source with pheromone and also marks the trail to and from it.
However, the pheromone concentration ϕ decays or evaporates at a constant rate γ:
$$ \phi(t) = \phi_0 \exp(-\gamma t), $$
where ϕ0 is the initial concentration at t=0. Here the evaporation is important since it ensures the possibility of convergence and self-organization.
Starting from the initial foraging routes, which are chosen randomly, the pheromone concentration varies from route to route. The ants follow the routes with higher pheromone concentrations, and the pheromone on a route is in turn reinforced by the increasing number of ants using it. As more and more ants pass along a path, it becomes the favoured path. Thus some favourite routes emerge, often the shortest or most efficient ones. This is a positive feedback mechanism. As the system evolves, it converges to a self-organized state.
Since an ant colony is a very dynamic system, the ant colony algorithm can be of immense use on graphs with changing topologies. “Examples of such systems are computer networks and artificial intelligence simulations of workers”.
4. Bee Algorithms
Bee algorithms are a collection of metaheuristic algorithms based on the foraging behaviour of bees. Several variants exist, such as the honeybee algorithm, the artificial bee colony (ABC), the virtual bee algorithm and honeybee mating algorithms.
Honey bees stay in a colony and they forage and store honey in the colony they had constructed. Honey bees can “interact using chemicals known as pheromones and `waggle dance’. For example, an alarming bee may release a chemical message a.k.a. pheromone to stimulate attack response in other bees. When bees find a good food source (optimal solution) and bring some nectar or honey back to the hive, they will send the location of the source by using the waggle dance for signaling reasons”.
Bee algorithms have been put to use in a large number of fields such as:
• Optimisation of clustering systems or classifiers
• Manufacturing process
• Multi-objective optimisation
The pseudo code for the same is as follows:
Fig 10: Bees algorithm pseudo-code
• ns – Number of scout bees
• ne – Number of elite sites
• nre – Number of foragers recruited for each elite site
• nrb – Number of foragers recruited for each of the remaining best sites
• nb – Number of best sites
The total artificial bee colony size is “n=(ne•nre)+((nb-ne)•nrb)+ns” bees (elite-site foragers + remaining best-site foragers + scouts).
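As a quick worked example of the colony-size formula, the short Python sketch below evaluates it for one set of parameter values. The numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def colony_size(ns, ne, nre, nb, nrb):
    """n = ne*nre + (nb - ne)*nrb + ns, as in the formula above."""
    elite_foragers = ne * nre
    remaining_best_foragers = (nb - ne) * nrb
    return elite_foragers + remaining_best_foragers + ns

# Hypothetical settings: 10 scouts, 2 elite sites with 4 foragers each,
# 5 best sites in total, 2 foragers for each of the 3 non-elite best sites.
n = colony_size(ns=10, ne=2, nre=4, nb=5, nrb=2)
print(n)  # 2*4 + (5-2)*2 + 10 = 24
```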
5. Particle Swarm Optimization
Particle swarm optimization (PSO) is “a population based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, which is inspired by social behavior of bird flocking or fish schooling. Since then, PSO has produced a lot of attention and now forms an exciting research subject in the domain of swarm intelligence. PSO has been applied to almost every field in optimization, design or scheduling applications and computational intelligence”.
PSO searches “the solution space of an objective function by adjusting the trajectories of individual agents, known as particles which trace a piece-wise path that can be shown as a time-dependent positional vector”. The movement of a swarming particle consists of two components: a stochastic component and a deterministic component. Each particle is attracted towards the position of the current global best g* and its own best known location x_i^*.
When a particle finds a location that is better than any location it has found previously, it updates that location as the new current best for particle i. There is a current best for every particle at any time t during the iterations. The aim is to find the global best among all the current best solutions, continuing until the objective no longer improves or a certain number of iterations has been reached.
Let xi and vi be the position vector and velocity, respectively, of particle i. The new velocity vector is determined by the following formula:
$$ v_i^{t+1} = v_i^t + \alpha \, \epsilon_1 \odot [g^* - x_i^t] + \beta \, \epsilon_2 \odot [x_i^* - x_i^t], $$
where ϵ1 and ϵ2 are two random vectors whose entries each take a value between 0 and 1, and ⊙ denotes entry-wise multiplication. The parameters α and β are the acceleration constants, which typically take values α≈β≈2.
The initial locations of all particles should be distributed relatively uniformly so that they can sample most regions of the search space, which is especially important for multimodal problems. The initial velocity of a particle is usually set to zero. The new position can then be updated by the formula:
$$ x_i^{t+1} = x_i^t + v_i^{t+1}. $$
As the iterations move on, the particle system swarms and may finally reach a global optimum.
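The following Python sketch shows one way these updates might be put together. It assumes the commonly used form of the velocity update, in which the old velocity is adjusted by random pulls towards the global best and the particle's own best, with α = β = 2. The sphere test function, the search bounds and the velocity clamp v_max are additional assumptions made here to keep the sketch small and numerically stable.

```python
import random

def sphere(x):
    """Simple test objective with its minimum of 0 at the origin (an assumption)."""
    return sum(v * v for v in x)

def pso(f, dim=2, n_particles=15, iterations=60, alpha=2.0, beta=2.0,
        bounds=(-5.0, 5.0), v_max=1.0, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Initial positions sampled uniformly at random, initial velocities zero.
    positions = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    global_best = min(personal_best, key=f)[:]
    initial_best_value = f(global_best)
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                e1, e2 = rng.random(), rng.random()   # entries of the random vectors
                v = (velocities[i][d]
                     + alpha * e1 * (global_best[d] - positions[i][d])
                     + beta * e2 * (personal_best[i][d] - positions[i][d]))
                velocities[i][d] = max(-v_max, min(v_max, v))  # clamp: an assumption
                positions[i][d] += velocities[i][d]
            if f(positions[i]) < f(personal_best[i]):
                personal_best[i] = positions[i][:]
                if f(personal_best[i]) < f(global_best):
                    global_best = personal_best[i][:]
    return global_best, f(global_best), initial_best_value

best, value, initial_value = pso(sphere)
print(best, value, initial_value)
```

Because the global best is only replaced when a particle finds something strictly better, the returned value can never be worse than the best of the initial positions.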
6. Tabu Search
Tabu search was developed by Fred Glover in the 1970s, although his seminal book on the subject was published much later, in 1997. Tabu search makes explicit use of memory, and the search history is an important component of the process. As most algorithms are memoryless or only use the results of the last few steps, it is initially difficult to see the advantage of using the search history. The subtleties of memory and history could introduce too many degrees of freedom, so that a mathematical analysis of the algorithm’s behaviour becomes intractable.
Tabu search remains one of the most popular and successful metaheuristics in optimization. Tabu search can be considered to be an intensive local search, and the appropriate use of search history avoids revisiting local solutions by recording the already-tried solutions in TLs (Tabu Lists). The TLs could save a lot of computing time, leading to significant improvements in search efficiency over the iterations.
The pseudocode for the Tabu Search algorithm is as follows:
Fig 11: Tabu Search algorithm pseudocode
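To make the role of the tabu list concrete, here is a minimal Python sketch of tabu search applied to a small travelling salesman instance. It is not the implementation used in the project: the swap neighbourhood, the tabu tenure, the aspiration rule and the iteration count are all assumptions. The distance matrix is the first five-city matrix given later in this document.

```python
from itertools import combinations

def tour_cost(dist, tour):
    """Cost of the closed tour, including the return to the start city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def tabu_search(dist, iterations=20, tenure=3):
    n = len(dist)
    current = list(range(n))                     # initial tour 0, 1, ..., n-1
    best, best_cost = current[:], tour_cost(dist, current)
    tabu = []                                    # recently used swaps (the tabu list)
    for _ in range(iterations):
        candidates = []
        for i, j in combinations(range(1, n), 2):    # city 0 stays the fixed start
            neighbour = current[:]
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
            cost = tour_cost(dist, neighbour)
            # Aspiration: a tabu move is allowed if it beats the best tour so far.
            if (i, j) not in tabu or cost < best_cost:
                candidates.append((cost, (i, j), neighbour))
        if not candidates:
            break
        cost, move, current = min(candidates)    # best admissible neighbour, even if worse
        tabu.append(move)
        if len(tabu) > tenure:
            tabu.pop(0)                          # oldest move leaves the tabu list
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost

# Cities A..E are indexed 0..4.
dist = [
    [0, 1, 2, 3, 4],
    [1, 0, 6, 7, 8],
    [2, 6, 0, 3, 1],
    [3, 7, 3, 0, 4],
    [4, 8, 1, 4, 0],
]
best, best_cost = tabu_search(dist)
print(best, best_cost)
```

On this matrix the sketch returns a tour of cost 15. The search always moves to the best admissible neighbour, even when it is worse than the current tour, and the tabu list stops it from immediately undoing recent swaps.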
7. Harmony Search
Harmony Search (HS) is a comparatively new heuristic optimization algorithm, developed in 2001. It is inspired by the improvisation process of a musician. When improvising, a musician has three possible choices:
(1) Play a famous piece of music (a series of pitches in harmony) exactly from memory;
(2) Play something similar to a known piece, adjusting the pitch slightly; or
(3) Compose new or random notes (pitches that are random and not initially in harmony).
Formalizing these three options gives three components: usage of harmony memory, pitch adjustment, and randomization.
From a Markov chain point of view, pitch adjustment is a random walk which produces a new solution x_new from the current solution x_old by
$$ x_{\text{new}} = x_{\text{old}} + b_p \, e_i^t, $$
where e_i^t is a random value drawn from a uniform distribution on [−1,1] and bp is the bandwidth, which controls the local range of pitch adjustments.
8. Firefly Algorithm
The Firefly Algorithm (FA) was developed in 2008 by Yang and is inspired by the flashing patterns and behaviour of fireflies. FA uses three basic rules:
• Fireflies are unisex, which means that a firefly will be attracted to other fireflies irrespective of their sex.
• The attractiveness is directly proportional to the brightness as a result of which both decrease as the distance between two interacting fireflies increases. Hence for any 2 flashing fireflies, the brighter firefly holds the ability to attract the other one. But in the absence of a brighter one, a random move is performed.
• The brightness of a firefly is obtained by using the landscape of the objective function.
As the attractiveness of a firefly is directly proportional to the light intensity seen by nearby fireflies, we can define the variation in attractiveness β with respect to the distance r by
$$ \beta = \beta_0 e^{-\gamma r^2}, $$
where β0 is the attractiveness at r=0 and γ is the light absorption coefficient. The movement of a firefly i which is attracted to a brighter firefly j in its vicinity is determined by the formula
$$ x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2} (x_j^t - x_i^t) + \alpha \, e_i^t, $$
where the second term is due to the attraction. The third term is a randomization, with α being the randomization parameter and e_i^t a vector of random numbers drawn from a Gaussian or uniform distribution at time t. If β0=0, it becomes a simple random walk. The randomization e_i^t can also easily be extended to other distributions such as Lévy flights.
9. Cuckoo Search
Cuckoo search (CS), developed in 2009 by Xin-She Yang and Suash Deb, is one of the more recent nature-inspired metaheuristic algorithms. It is inspired by the brood parasitism of some species of cuckoo, and it is further enhanced by Lévy flights rather than simple isotropic random walks. Some studies suggest that CS can be more efficient than PSO and genetic algorithms.
Cuckoos are known not only for their beautiful sounds but also for the aggressive reproduction strategy used by certain species. Some species, such as the ani and Guira cuckoos, lay their eggs in communal nests and may remove others’ eggs to increase the hatching probability of their own. Cuckoo search idealizes this behaviour using the following rules:
• Each cuckoo lays one egg at a time in a randomly selected nest;
• The best nests, i.e. those with high-quality eggs (good solutions), are carried over to the next generations;
• The number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability pa∈[0,1], depending on how similar the cuckoo egg looks to the host’s own eggs. In that case, the host bird can either discard the egg or abandon the nest and build a completely new one.
When generating a new solution x(t+1) for, say, cuckoo i, a Lévy flight is performed:
$$ x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \text{Lévy}(\lambda), $$
where α>0 is the step size, which should be scaled to the problem of interest, and ⊕ denotes entry-wise multiplication. The above equation is essentially the stochastic equation for a random walk. In general, a random walk is a Markov chain whose next location depends only on the current location (the first term of the equation) and the transition probability (the second term). In this case, the random walk via Lévy flight is more efficient in exploring the search space, as its step length is much longer in the long run.
The Lévy flight provides a random walk whose random step length is drawn from a Lévy distribution
$$ \text{Lévy} \sim u = t^{-\lambda}, \qquad (1 < \lambda \le 3), $$
which has an infinite variance and an infinite mean.
Fig 12: Cuckoo search algorithm pseudo-code
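The heavy-tailed step lengths behind a Lévy flight can be imitated with a short Python sketch. As a simplifying assumption, step lengths are drawn from a pure power law with density proportional to t^(−λ) for t ≥ 1, using inverse-transform sampling; practical cuckoo search implementations often use other sampling schemes. The function names, the one-dimensional walk and the parameter values are all illustrative.

```python
import random

def power_law_step(lam, rng, t_min=1.0):
    """Step length with density proportional to t**(-lam) for t >= t_min (lam > 1)."""
    u = rng.random()                                   # uniform in [0, 1)
    return t_min * (1.0 - u) ** (-1.0 / (lam - 1.0))   # inverse-transform sampling

def levy_walk(x0, alpha, lam, n_steps, seed=0):
    """One-dimensional random walk: x <- x + alpha * (random sign) * (heavy-tailed step)."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        direction = rng.choice((-1.0, 1.0))
        x = x + alpha * direction * power_law_step(lam, rng)
        path.append(x)
    return path

path = levy_walk(x0=0.0, alpha=0.1, lam=1.5, n_steps=50)
print(path[-1])
```

Most steps are short, but every so often a very long jump occurs, which is what lets such a walk explore a large search space more efficiently than a walk with fixed-size steps.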
For my minor project I considered the NP-hard Traveling Salesman Problem (TSP), and the metaheuristic algorithm selected to optimize it was Tabu Search. The outputs obtained for two sample inputs are shown below.
Consider 5 cities A,B,C,D and E that are to be visited by the salesman and the cost matrix is:
Cities A B C D E
A 0 1 2 3 4
B 1 0 6 7 8
C 2 6 0 3 1
D 3 7 3 0 4
E 4 8 1 4 0
Fig 13: Distance matrix for first 5 cities
The optimal solution for visiting these cities at the least cost is:
0 1 3 4 2 0 (Cost: 15)
Where 0 – A, 1 – B, 2 – C, 3 – D, 4 – E.
The output of the above input comes out to be:
Fig 14: Output obtained after taking the first distance matrix as input
Similarly if the input was,
Cities A B C D E
A 0 2 4 6 8
B 2 0 6 9 5
C 4 6 0 4 7
D 6 9 4 0 2
E 8 5 7 2 0
Fig 15: Distance matrix for next 5 cities
The optimal solution for visiting these cities at the least cost is:
0 1 4 3 2 0 (Cost: 17)
Where 0 – A, 1 – B, 2 – C, 3 – D, 4 – E.
The output of the above input comes out to be:
Fig 16: Output obtained after taking the next (2nd) distance matrix as input
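Because both instances have only five cities, the reported tours can be checked by exhaustive enumeration. The Python sketch below is such an independent check and is not part of the project code; the function names are illustrative. It recomputes the cost of the two reported tours and compares each with the minimum over all tours that start and end at city A.

```python
from itertools import permutations

def tour_cost(dist, tour):
    """Cost of the closed tour, including the return to the start city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def brute_force_tsp(dist):
    """Try every ordering of cities 1..n-1 with city 0 fixed as the start."""
    best_tour, best_cost = None, float("inf")
    for rest in permutations(range(1, len(dist))):
        tour = [0] + list(rest)
        cost = tour_cost(dist, tour)
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost

first = [
    [0, 1, 2, 3, 4],
    [1, 0, 6, 7, 8],
    [2, 6, 0, 3, 1],
    [3, 7, 3, 0, 4],
    [4, 8, 1, 4, 0],
]
second = [
    [0, 2, 4, 6, 8],
    [2, 0, 6, 9, 5],
    [4, 6, 0, 4, 7],
    [6, 9, 4, 0, 2],
    [8, 5, 7, 2, 0],
]

print(tour_cost(first, [0, 1, 3, 4, 2]), brute_force_tsp(first)[1])    # reported vs optimal
print(tour_cost(second, [0, 1, 4, 3, 2]), brute_force_tsp(second)[1])  # reported vs optimal
```

For both matrices the reported cost equals the exhaustive minimum (15 and 17 respectively). In the first matrix more than one tour attains the cost of 15, so the enumeration may return a different tour with the same cost.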
S. Arora, “Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems”, Journal of the ACM (JACM) 45 (1998) 753–782.
J. K. Lenstra, A. H. G. Rinnooy Kan, “Some simple applications of the travelling salesman problem”, Operational Research Quarterly (1975) 717–733.
G. Reinelt, “The traveling salesman: computational solutions for TSP applications”, Springer-Verlag, 1994.
C. Blum, A. Roli, “Metaheuristics in combinatorial optimization: Overview and conceptual comparison”, ACM Computing Surveys (CSUR) 35 (2003) 268–308.
F. Glover, G. A. Kochenberger, “Handbook of metaheuristics”, Springer, 2003.
J. Kennedy, R. Eberhart, “Particle swarm optimization”, in: Proceedings of the IEEE International Conference on Neural Networks, 1995, volume 4, IEEE, pp. 1942–1948.
A. Mucherino, O. Seref, “Monkey search: a novel metaheuristic search for global optimization”, in: Data Mining, Systems Analysis, and Optimization in Biomedicine (AIP Conference Proceedings, volume 953), American Institute of Physics, pp. 162–173.
D. Teodorovic, P. Lucic, G. Markovic, M. D. Orco, “Bee colony optimization: principles and applications”, in: 8th Seminar on Neural Network Applications in Electrical Engineering (NEUREL 2006), IEEE, pp. 151–156.
H. Shah-Hosseini, “The intelligent water drops algorithm: a nature-inspired swarm-based optimization algorithm”, International Journal of Bio-Inspired Computation 1 (2009) 71–79.
X. S. Yang, “Firefly algorithms for multimodal optimization”, Stochastic Algorithms: Foundations and Applications (2009) 169–178.
Z. W. Geem, J. H. Kim, et al., “A new heuristic optimization algorithm: harmony search”, Simulation 76 (2001) 60–68.
X. S. Yang, S. Deb, “Cuckoo search via Lévy flights”, in: World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), IEEE, pp. 210–214.
E. D. Taillard, L. M. Gambardella, M. Gendreau, J. Y. Potvin, “Adaptive memory programming: A unified view of metaheuristics”, European Journal of Operational Research 135 (2001) 1–16.
D. H. Wolpert, W. G. Macready, “No free lunch theorems for optimization”, IEEE Transactions on Evolutionary Computation 1 (1997) 67–82.
E. Bonomi, J. L. Lutton, “The n-city travelling salesman problem: Statistical mechanics and the Metropolis algorithm”, SIAM Review (1984) 551–568.
M. Malek, M. Guruswamy, M. Pandya, H. Owens, “Serial and parallel simulated annealing and tabu search algorithms for the traveling salesman problem”, Annals of Operations Research 21 (1989) 59–84.
P. Larranaga, C. M. H. Kuijpers, R. H. Murga, I. Inza, S. Dizdarevic, “Genetic algorithms for the travelling salesman problem: A review of representations and operators”, Artificial Intelligence Review 13 (1999) 129–170.
M. Dorigo, L. M. Gambardella, “Ant colonies for the travelling salesman problem”, BioSystems 43 (1997) 73–82.
S. M. Chen, C. Y. Chien, “Solving the traveling salesman problem based on the genetic simulated annealing ant colony system with particle swarm optimization techniques”, Expert Systems with Applications (2011).
X. H. Shi, Y. C. Liang, H. P. Lee, C. Lu, Q. X. Wang, “Particle swarm optimization-based algorithms for TSP and generalized TSP”, Information Processing Letters 103 (2007) 169–176.
R. F. Abdel-Kader, “Fuzzy Particle Swarm Optimization with Simulated Annealing and Neighborhood Information Communication for Solving TSP”, International Journal of Advanced Computer Science and Applications 2 (2011).
G. A. Croes, “A method for solving traveling-salesman problems”, Operations Research (1958) 791–812.