I love functional languages. They are wonderfully powerful, and can easily abstract away a basic algorithm that one might apply to many kinds of problems. With this in mind, I decided to build a framework for constructing genetic algorithms in Clojure. While the definition of framework is surely debatable, what I mean by this (mostly) is that I built something I can reuse. For instance, the first thing I did with with my framework was optimize test suite subsets as part of a program repair technique — but as you will see, this is far from the only thing it can do! First, I planned out in pseudocode how I wanted my framework to run. For my readers unfamiliar with them, genetic algorithms apply an evolutionary model to complex problems, attempting to evolve solutions from what is initially a population of random states. But really, a genetic algorithm is just a process, and (at least for my purposes) it boils down to this:
(defn genetic-algorithm [some-params]
(initialize-population)
(loop (generations-that-remain)
(if (no-generations-left)
(print-stuff-and-exit)
(pass-again-to-loop
(mutate-population
(crossover-population
(select-population
(population))))))
Since Clojure is a functional language, we can pass problem-specific functions to this process, and it will give us a result. For instance, I might want to define my genetic algorithm such that it accepts functions for initializing population members, sorting them (fitness function), crossing them over, and finally, mutating them. Of course, it should also take parameters that convey setting information, like the number of generations to run, or the mutation rate. Here is what I came up with:
defn run-ga
[func-map setting-map]
(let [ipop ((:init-fn func-map) (:pop-sz setting-map))]
(loop [pop ipop
num (:gen setting-map)]
(if (zero? num)
(do
(println (first (sort-by (:fit-fn func-map) > pop ))))
(let [total-left (- (:pop-sz setting-map)
(:children setting-map))]
(do
(println (first (sort-by (:fit-fn func-map) > pop ))))
(recur
(concat
((:mut-fn func-map)
(do-crossover
((:sel-fn func-map)
pop
(:fit-fn func-map)
(* (:children setting-map) 2))
(:cross-fn func-map)
2)
(:mut-r setting-map))
((:init-fn func-map) total-left))
(dec num)))))))
It takes two maps as its parameters, func-map and setting-map. These serve as easy ways to pass in sets of functions and settings for a given problem. The algorithm follows the basic pseudocode layout I showed earlier, and if you want a more detailed description of its parameters, check it out at github. There is one function in run-ga — do-crossover — that is not passed as a parameter, but is rather defined elsewhere. What it does is take a list of parents, and run the problem-specific crossover function provided as a parameter to the algorithm. Its code is also on github, but I will display it here as well.
(defn do-crossover [p-list cross-fn num-parents] (map cross-fn (partition num-parents p-list)))
But now, lets move on to an example problem, and put the framework through its paces! I will pick a very easy problem — lets evolve “helloworld”. To begin, I will define some aspects of the problem space. The supposed DNA of “helloworld” is nothing but a list of characters:
(def dna (map str
['q 'w 'e 'r 't 'y 'u 'i 'o 'p 'a 's 'd 'f 'g 'h 'j 'k 'l
'z 'x 'c 'v 'b 'n 'm]))
I also wrote a few helper functions for the genetic algorithm framework, rand-from-list and rand-pop. These are generally useful for creating random populations. The first will return a “random” list from a list of elements, made up of the elements in that list. The second will return a list of these random lists. Here they are:
(defn rand-from-list
[lst num]
(let [total-el (count lst)]
(map (fn [x] (nth lst (rand-int total-el))) (range 0 num))))
(defn rand-pop
[lst num num-pop]
(map (fn [x] (rand-from-list lst num)) (range 0 num-pop)))
Now, lets apply them to dna. A function that initializes a random population of strings might look like this (we are limiting our population to strings of 10 characters):
(defn some-strings [num] (rand-pop dna 10 num))
This is actually not what I end up doing (in the func-map I end up using a partial) but it is effectively the same thing. On to crossover and mutation:
(defn list-crossover
[[s1 s2]]
(let [point (rand-int
(min (count s1)
(count s2)))]
(concat (take point s1)
(drop point s2))))
(defn generic-mutation
[list prob]
(map
(fn [s-list]
(map
(fn [test]
(if (> prob (rand-int 100))
(let [r-s (rand-int (count list))
r-t (rand-int (count (nth list r-s)))]
(nth (nth list r-s)
r-t))
test))
s-list))
list))
These are defined for you by the framework (optionally, if you want to use them). The function list-crossover chooses a cross-point, x, on two lists, and creates a new list made up of the first x elements of one list, and the last (length – x) elements of the other. Mutation was hard to make generic (and so it is very much imperfect), but it uses the base elements of other population members as an approximation for all the “genetic material” that can be swapped in or out of a mutated member. These functions work on most kinds of list populations. In many cases, however, you may need more complex behavior, so they are not hard-coded into the algorithm itself. Although nearly done, we still lack what is perhaps the most important part, the fitness function. This can be defined in many ways, but I use a system of one point for every directly matching character in the string. Like follows:
(defn hello-fitness
[lst]
(reduce
+
(map #(if (= %1 %2) 1 0)
lst
'("h" "e" "l" "l" "o" "w" "o" "r" "l" "d"))))
So, now we can define the maps:
(def func-map {:fit-fn hello-fitness :mut-fn generic-mutation
:sel-fn roulette-select :init-fn (partial rand-pop dna 10)
:cross-fn list-crossover})
(def set-map {:pop-sz 100 :children 50 :mut-r 1 :gen 100})
Note that roulette-select is another helper created for the framework. Its a simple function, and selects population members with a likelihood proportional to the fitness of each. For the curious, here it is:
(defn roulette-select
[pop fit-fn num]
(let [pop-fits (map fit-fn pop)
inc-fits (iterate (fn [[pfit idx]]
[(+ (nth pop-fits (+ idx 1)) pfit) (+ idx 1)])
[(first pop-fits) 0])
max-fitness (apply + pop-fits)
pick-one (fn [num] (second (first (drop-while #(< (first %) num)
inc-fits))))]
(map (fn [x] (nth pop (pick-one (rand-int max-fitness)))) (range num))))
Now, we are finished, so run the algorithm!
(run-ga func-map set-map)
Again, all this code (and some missing detail) is available at github. Hopefully you found this useful!
2 Trackbacks
[...] in parallel genetic programming, and so I decided to run my own. As has been mentioned, I am quite fond of functional languages, so for me, easy parallelism is simply a pleasant bonus. I do think its [...]
[...] This post was mentioned on Twitter by Ehud Lamm and Bio Computing, C. Pedro Gonçalves. C. Pedro Gonçalves said: RT @BioComputing: A Genetic Algorithm Framework in Clojure http://blog.ethanjfast.com/2009/09/ga-framework/ [...]