On Parallelism

Recently, I read an article describing someone’s experiments in parallel genetic programming, and so I decided to run my own. As has been mentioned, I am quite fond of functional languages; for me, easy parallelism is simply a pleasant bonus. I do think its worth noting, however, that while functional programming may be generally awesome, throwing in massive amounts of parallelism is often overkill (certainly when one runs his programs on a lowly Macbook).

Take Gajure, my unfeature-full framework for creating genetic algorithms. On a whim, I changed out most map functions with pmap, its parallel counterpart, and ran the example code included on github. This took literally two seconds — for instance, something like roulette-select was changed by two characters:

(defn roulette-select
  "Select num individuals from pop, with an individual's
   fitness porportional to selection likelihood."
  [pop fit-fn num]
  (let [pop-fits (pmap fit-fn pop)
        inc-fits (iterate (fn [[pfit idx]]
                          [(+ (nth pop-fits
                           (+ idx 1)) pfit) (+ idx 1)])
                          [(first pop-fits) 0])
        max-fitness (apply + pop-fits)
        pick-one (fn [num] (second (first (drop-while
                                         #(< (first %) num)
                                         inc-fits))))]
    (pmap (fn [x] (nth pop (pick-one (rand-int max-fitness))))
               (range num))))

On the face of things, a genetic algorithm would seem to be a perfect target for parallelism. Within a single generation, fitness computations and mutation operations can be run for each population member completely independent of its compatriots. Alas, such performance gains were not realized — not, at least, on my humble dual-core machine.

The bottom line: running time more than doubled for evolving “hello world.” What previously took 1450 ms takes 3162 ms with the new implementation. I will be gathering more representative statistics later on, but probably, the overhead involved in creating parallel mappings outweighs any benefits, given that I only have two processors (on that note, I’d be curious to know how much nicer things would run in Erlang, where the threads are more lightweight). Another contributing factor is simply the inane simplicity of the problem space; in an ideal case, each fitness evaluation would be doing a bit more work.

So in some sense, my dimwitted “hello world” example represents a worse case for the parallel implementation, particularly when run on my relatively underpowered machine. With that, I’m committing the pmap changes to github under a separate branch, here.

This entry was posted in Clojure and tagged , , , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

6 Comments

  1. Posted October 7, 2009 at 2:37 pm | Permalink

    Isn’t this because you’re using pmap wrongly? From the documentation:

    Like map, except f is applied in parallel. Semi-lazy in that the parallel computation stays ahead of the consumption, but doesn’t realize the entire result unless required. Only useful for computationally intensive functions where the time of f dominates the coordination overhead.

    I used pmap for raytracing and used partition to group the map into larger work units that are sufficiently computationally intensive to make the effort of using a thread worthwhile. Is is possible to try that for your example?

  2. Posted October 7, 2009 at 2:59 pm | Permalink

    I don’t know how much work the pick-one function really does — from my experience, it needs to be significant before pmap makes sense. Even then, your gains on two cores might be small: by default java will run your GC in a separate thread, so your Clojure program will get >100% CPU utilization (I’ve seen numbers like 137%). So there isn’t that much to be gained. That said, I have never seen a *loss* in performance yet.

    I’m glad you are measuring wall time. Too many people forget that this is the real metric.

  3. Posted October 7, 2009 at 3:43 pm | Permalink

    I’d say you might try lowering the number of threads you are starting so that it is more similar to the number of cores you are running them on. Using to few or too many threads can both easily result in worse performance than a single thread. Your experiment shows that you can’t blindly use parallelism for gain, not that you can’t use parallelism for gain.

    If you search “clojure pmap” you’ll find people in similar situations as yourself.

  4. Ethan
    Posted October 7, 2009 at 4:57 pm | Permalink

    @Everyone,

    Thanks for the comments!

    @Jeff specifically,

    Yes, I am using it “wrongly” — although I would prefer to look at it as inefficiently. I’m fairly certain that with some recoding, a parallel version of the framework could do much better. Naturally, that’s a bit harder than the naive “switch out map with pmap” approach… I simply wanted to see how well it would handle concurrency given minimal changes to the existing code structure.

  5. Posted October 7, 2009 at 5:43 pm | Permalink

    I don’t think threads are the big source of your overhead here. Using pmap sets up a sequence of futures and deferences them n at a time, where n is the number of processors plus two. I’m pretty sure creating the futures is what slows you down here, not spawning three threads.

    According to its docstring, pmap is for a compute-intensive function and a relatively small sequence. I’ve used it on a dual-core machine for running filters on a folder full of images and got about a 90% speedup.

  6. Posted October 8, 2009 at 12:38 am | Permalink

    I’ve been playing around with some parallel list comprehension strategies for a bit, and there are a bunch of reasons why it can be a difficult problem. The atomicity chosen for parallel execution is a fairly critical issue – too small and the overhead of the decomposition and scheduling heavily outweighs the benefits, too big and you don’t get much benefit from the parallelism.

    Additionally, choosing data-structures and algorithms that are parallel friendly is important, Guy Steele’s excellent talk on this gives some perspective on how far we need to go.

One Trackback

  1. [...] This post was mentioned on Twitter by :=a name. :=a name said: Clojure, Parallelism, and Genetic Algorithms http://bit.ly/4Gs4RV [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>