How I develop on OSX

It recently occurred to me that I’m dumb. I certainly don’t mean this in any pejorative sense (after all, that would be abrasive to the ego), but rather I would suggest it as regards a behavioral pattern that I tend to follow. Roughly, said pattern goes like this:

  1. Identify problem
  2. On the grounds of theoretical purity, construct needlessly complicated solution.
  3. Implement solution.

For instance, a few days ago I wanted to put together a small rails application. There was nothing earth shattering in this; I simply wanted to get a quick prototype up and running. So I update my gems, do a quick `rails prototype/` command, and boot up the empty app. It fails.

Ok, so I do the natural thing; I google the error message or whatnot; I search for a few minutes. Well, it has something to do with Snow Leopard, that much becomes clear. This begs the question: does any bug ever not have something to do with Snow Leopard? But I won’t adress that here. Anyhow, there is an obscure solution, under which I’d have to edit manually my rails installation — something like that, anyway.  But I’m not going to do that. I try some earlier versions of rails and it’s dependancies — they fail for some other reasons.

I’m sure that I could have figured it out, had I dug around for a while. I might have, with work, discovered the vagaries behind why the same programs, with the same (nominally) installed dependancies exhibited opposite behaviors on different machines. But no, by next month I’d probably just have a new and similar problem. So I do just what any irrational fellow ought to; I decide to install a virtual Arch distro (my favorite) with VMware fusion, mount the machine as a disk via ssh, and develop from there. At least in Arch, I will know exactly what is installed, where it is, and how it ought to be working. Basically, I eliminate the free variable of imprecise ignorance regarding everything that’s on the machine, and whether I’ve hacked it up manually. Of course, god forbid that I give up TextMate and actually code directly on a linux machine. Emacs is awesome for my more lispy projects, but I like TM for rails.

To be clear, the blame here is almost entirely on me. However, given the recent announcement of  the iPad, I feel that it’s become the custom to lay down some hate. So here goes: Apple, you ought to implement a decent package management scheme. Some inconsistency with something I had installed (or worse, you put there) was messing things up. Sure, one might say that I use pacman (or yum, or insert-reader-favorite) as a crutch, and I’m fine with that. In particular, said management helps prevent people like me — overly exuberant installers and occasional hackers of who-knows-what – from messing up systems.

So on the one hand, I get what I want. Things tend to work on Arch, and updating is only a `pacman -Syu` away. On the other hand — yeah, I’m dumb.

Posted in Apple, Rails, Ruby | Tagged , , , | 3 Comments

The Tweeting Narcissist

I’ve been playing a bit with the Sinatra web framework, and after some intermittent coding, I ended up with a toy project I’m calling the Narcissist Quotient. It may seem that I’m poking fun at of Twitter’s ego-centric bent, and perhaps this is true. It is equally possible, however, that my design is to satirize those who boisterously lament “the rampant self-absorption” instilled through today’s technologies. I’ll leave the verdict up to the reader.

As always, the code open source and available at github.

Posted in Computer Science, Ruby | Tagged , , , | 1 Comment

Clojure :pre and :post

Courtesy of Hacker News, this morning I stumbled upon a blog post mentioning :pre and :post assertions, a new feature in version 1.1 of Clojure. Given the rather messy nature of several functions in Gajure (my toy genetic algorithm framework), it seemed to me that I had an ideal opportunity to make use of this new functionality.

For instance, although the function run-ga only takes two parameters, both of them are hashes. Naturally, each must contain a few keys for the genetic algorithm to run properly. Using preconditions, it’s easy to ensure that the function’s parameters contain these required keys. For instance:

(defn keys-not-nil [lst hash]
  (reduce #(and %1 %2) (map #(not (nil? (hash %))) lst)))

This checks each key in a list of keys, and returns false if one or more of these keys maps to a value of nil. Obviously, I could go further here, but checking for non-nil values seemed a reasonable (and easy) generalization (e.g. if :init-fn or :mut-r map to nil, then the ga-run function will have a problem). Now, making use of the :pre tag.

(defn run-ga
[func-map setting-map]
{:pre [(and
  (keys-not-nil
    (list :init-fn :fit-fn :mut-fn :sel-fn :cross-fn)
    func-map)
  (keys-not-nil
    (list :pop-sz :gen :children :mut-r)
    setting-map))]}
... actual algorithm ...)

So I can now enforce the (arbitrary) required keys, and return assertion errors on some improper uses of ga-run. For a more specific example, look to the algorithm’s mutation function.

(defn generic-mutation
  "Randomly mutates lists with elements from other lists in the population."
  [list prob]
  {:pre [(and (>= prob 1) (<= prob 100))]
   :post [(list? %)]}
  (map
   (fn [s-list]
     (map
      (fn [test]
        (if (> prob (rand-int 100))
          (let [r-s (rand-int (count list))
                r-t (rand-int (count (nth list r-s)))]
            (nth (nth list r-s)
                 r-t))
          test))
      s-list))
   list))

Here, the :pre tag ensures that the value of prob is between one and one hundred. Similarly, :post asserts that the function returns a list (although I haven’t mentioned it, the :post tag operates much like :pre, using a function’s return value within its assertion). So in spite of my trivial examples, perhaps you can begin to appreciate how these tags can be used to create safer and more predictable code.

Posted in Clojure | Tagged , , , | 1 Comment

Slowly Programming in R

Recently, I coded up a cross validation function in R, and things were moving rather less quickly than I would have liked. (The purpose of c.v. is to assess how well one’s statistical analysis will generalize to an independent data set.)  Anyhow, I was implementing 10-fold cross validation, and with a dataset containing around 100,000 observations, my code was taking hours to run. This was, of course, ridiculous.

Now, I doubt that it will come as a surprise, but I am rather a newbie at this whole R thing, and as I later found out, loops in R should be avoided at all costs. After hacking around with my code, I found that its critical path looked something like this:

total <- 0
for(i in 1:nrow(dataset)){
total <- total + sum( dataset[i,1:25]*coef )
}

Now this is very simple loop, and it seemed to me somewhat less than obvious that it would beget a significant performance bottleneck. Ever so naturally, then, it did.

Ironically, the solution here is to use code more along the lines of the map-reduce paradigm, something I would have loved to do in the first place, were not I overcome by the cryptic nature of R’s documentation. After all, my favorite languages are all variants of lisp, and I am no stranger to functional programming. After some digging, I stumbled across apply, which more-or-less functions along the lines of map in scheme or clojure. So I tried:

my_sum <- function(x){ sum( x[1:25]*coef ) }
sum( apply( dataset, my_sum ) )

In addition to being more elegant, this is much, much faster. What was taking hours, now takes tens of seconds. Apparently, R has a fast backend implementation for this sort of thing.  So, this post is dedicated to as a warning to my fellow inexperienced users: avoid iterative loops in R!

Posted in Computer Science, R Programming Language | Leave a comment

National Novel Writing Month

In circumstances rather less than coincidental, a conspicuous absence of November posts coincided with my participation in NaNoWriMo (national novel writing month). Thus, throughout last month, I had a great — if difficult — time writing what might be described as a shallow antithesis to the next Great American Novel. Ever so sadly, fifty thousand words was not enough to contain the deep complexities of my story, but as such trivialities have no bearing on whether one actually wins the challenge, all ended reasonably well.

As this blog serves a largely technical purpose, I ought to mention my use of latex (specifically, LaTeXIT) for all of November’s writings. This decision was mostly to my benefit — the pdf files produced look wonderful, and the memoir package made chapter organization easy — but it also beget a few annoyances. For instance, getting decent word count updates was rather non-trivial, and the NaNoWriMo site does not accept pdf or tex files for verification. Ultimately, however, my experience was a positive one, and I would recommend latex to future participants.

In any case, I can assure the nonexistent reader that this blog should become more active in future days and weeks.

Posted in Blogging, Writing | Leave a comment

OSX Package Management

A few days ago, I was discussing Snow Leopard with a recent OSX convert. For the record, this person remains primarily a linux user — a stalwart patron of Fedora — but he had very recently acquired a unibody Macbook. As I am similarly a user of both linux and OSX, this came as music to my ears. Curious, I asked him to name the biggest flaw in his shiny new laptop.

It did not take long to get an answer, and the answer I got left me decidedly unsurprised. Having come from the linux world, he was disappointed with the state of package management on OSX. My thoughts exactly.

Now, before every random reader writes me off as an idiot and troll, let me qualify this opinion. Yes, I have heard of Macports (so had the person with whom I was conversing). Macports, as awesome as it is, has always seemed to me a bit too much of a hack, and at its best, a less than ideal solution. I miss the feeling of complete integration I get with something like pacman and Arch. These thoughts have only gained strength since the release of Snow Leopard, where half the programs I need will not install — not simply, anyway — under the macports system. Why, when I first open the beautiful packaging of an Apple computer, can I not begin to explore all manner of open source goodness? Why is there no native management system, along the lines of apt-get, yum, or pacman?

I will readily admit that these complaints fall well within the I want a pony mentality. But that said, what difficulties (or possibly, incentives) prevent Apple from establishing an official package management system? Is it simply not worth the time required to do so? Legal issues? I am young and ignorant, so quite honestly, I’d like to hear more experienced thoughts on the matter.

Posted in Apple | Tagged , , , , | 10 Comments

For the Autodidact

I recently stumbled upon several good (and free) books. All in pdf format:

Posted in Computer Science | Tagged , , | Leave a comment

On Parallelism

Recently, I read an article describing someone’s experiments in parallel genetic programming, and so I decided to run my own. As has been mentioned, I am quite fond of functional languages; for me, easy parallelism is simply a pleasant bonus. I do think its worth noting, however, that while functional programming may be generally awesome, throwing in massive amounts of parallelism is often overkill (certainly when one runs his programs on a lowly Macbook).

Take Gajure, my unfeature-full framework for creating genetic algorithms. On a whim, I changed out most map functions with pmap, its parallel counterpart, and ran the example code included on github. This took literally two seconds — for instance, something like roulette-select was changed by two characters:

(defn roulette-select
  "Select num individuals from pop, with an individual's
   fitness porportional to selection likelihood."
  [pop fit-fn num]
  (let [pop-fits (pmap fit-fn pop)
        inc-fits (iterate (fn [[pfit idx]]
                          [(+ (nth pop-fits
                           (+ idx 1)) pfit) (+ idx 1)])
                          [(first pop-fits) 0])
        max-fitness (apply + pop-fits)
        pick-one (fn [num] (second (first (drop-while
                                         #(< (first %) num)
                                         inc-fits))))]
    (pmap (fn [x] (nth pop (pick-one (rand-int max-fitness))))
               (range num))))

On the face of things, a genetic algorithm would seem to be a perfect target for parallelism. Within a single generation, fitness computations and mutation operations can be run for each population member completely independent of its compatriots. Alas, such performance gains were not realized — not, at least, on my humble dual-core machine.

The bottom line: running time more than doubled for evolving “hello world.” What previously took 1450 ms takes 3162 ms with the new implementation. I will be gathering more representative statistics later on, but probably, the overhead involved in creating parallel mappings outweighs any benefits, given that I only have two processors (on that note, I’d be curious to know how much nicer things would run in Erlang, where the threads are more lightweight). Another contributing factor is simply the inane simplicity of the problem space; in an ideal case, each fitness evaluation would be doing a bit more work.

So in some sense, my dimwitted “hello world” example represents a worse case for the parallel implementation, particularly when run on my relatively underpowered machine. With that, I’m committing the pmap changes to github under a separate branch, here.

Posted in Clojure | Tagged , , , | 7 Comments

Switching to Wordpress

After the comment system on my former blog was decimated last week, I decided to switch over to Wordpress. Despite recent security issues, this platform is far more reliable than a blogging framework that I wrote over a few weekends in Clojure. That’s not to say I’m abandoning the project, but its commenting system needs to be improved before I put it to real world use.

Strangely, after being overwhelmed with spam (and possibly some XSS) my former blog skyrocketed in google’s search results. There’s something wrong with that…

Posted in Blogging | Tagged , , | Leave a comment

A Genetic Algorithm Framework in Clojure

I love functional languages. They are wonderfully powerful, and can easily abstract away a basic algorithm that one might apply to many kinds of problems. With this in mind, I decided to build a framework for constructing genetic algorithms in Clojure. While the definition of framework is surely debatable, what I mean by this (mostly) is that I built something I can reuse. For instance, the first thing I did with with my framework was optimize test suite subsets as part of a program repair technique — but as you will see, this is far from the only thing it can do! First, I planned out in pseudocode how I wanted my framework to run. For my readers unfamiliar with them, genetic algorithms apply an evolutionary model to complex problems, attempting to evolve solutions from what is initially a population of random states. But really, a genetic algorithm is just a process, and (at least for my purposes) it boils down to this:

(defn genetic-algorithm [some-params]
  (initialize-population)
    (loop (generations-that-remain)
      (if (no-generations-left)
         (print-stuff-and-exit)
         (pass-again-to-loop
          (mutate-population
            (crossover-population
              (select-population
               (population))))))

Since Clojure is a functional language, we can pass problem-specific functions to this process, and it will give us a result. For instance, I might want to define my genetic algorithm such that it accepts functions for initializing population members, sorting them (fitness function), crossing them over, and finally, mutating them. Of course, it should also take parameters that convey setting information, like the number of generations to run, or the mutation rate. Here is what I came up with:

defn run-ga
  [func-map setting-map]
  (let [ipop ((:init-fn func-map) (:pop-sz setting-map))]
    (loop [pop ipop
           num (:gen setting-map)]
      (if (zero? num)
        (do
          (println (first (sort-by (:fit-fn func-map) > pop ))))
            (let [total-left (- (:pop-sz setting-map)
                                      (:children setting-map))]
              (do
                (println (first (sort-by (:fit-fn func-map) > pop ))))
              (recur
               (concat
                ((:mut-fn func-map)
                 (do-crossover
                  ((:sel-fn func-map)
                   pop
                   (:fit-fn func-map)
                   (* (:children setting-map) 2))
                  (:cross-fn func-map)
                  2)
                 (:mut-r setting-map))
                ((:init-fn func-map) total-left))
               (dec num)))))))

It takes two maps as its parameters, func-map and setting-map. These serve as easy ways to pass in sets of functions and settings for a given problem. The algorithm follows the basic pseudocode layout I showed earlier, and if you want a more detailed description of its parameters, check it out at github. There is one function in run-ga — do-crossover — that is not passed as a parameter, but is rather defined elsewhere. What it does is take a list of parents, and run the problem-specific crossover function provided as a parameter to the algorithm. Its code is also on github, but I will display it here as well.

(defn do-crossover
  [p-list cross-fn num-parents]
  (map cross-fn (partition num-parents p-list)))

But now, lets move on to an example problem, and put the framework through its paces! I will pick a very easy problem — lets evolve “helloworld”. To begin, I will define some aspects of the problem space. The supposed DNA of “helloworld” is nothing but a list of characters:

(def dna (map str
              ['q 'w 'e 'r 't 'y 'u 'i 'o 'p 'a 's 'd 'f 'g 'h 'j 'k 'l
               'z 'x 'c 'v 'b 'n 'm]))

I also wrote a few helper functions for the genetic algorithm framework, rand-from-list and rand-pop. These are generally useful for creating random populations. The first will return a “random” list from a list of elements, made up of the elements in that list. The second will return a list of these random lists. Here they are:

(defn rand-from-list
  [lst num]
  (let [total-el (count lst)]
    (map (fn [x] (nth lst (rand-int total-el))) (range 0 num))))

(defn rand-pop
  [lst num num-pop]
  (map (fn [x] (rand-from-list lst num)) (range 0 num-pop)))

Now, lets apply them to dna. A function that initializes a random population of strings might look like this (we are limiting our population to strings of 10 characters):

(defn some-strings [num]
 (rand-pop dna 10 num))

This is actually not what I end up doing (in the func-map I end up using a partial) but it is effectively the same thing. On to crossover and mutation:

(defn list-crossover
  [[s1 s2]]
  (let [point (rand-int
               (min (count s1)
                    (count s2)))]
    (concat (take point s1)
            (drop point s2))))

(defn generic-mutation
  [list prob]
  (map
   (fn [s-list]
     (map
      (fn [test]
        (if (> prob (rand-int 100))
          (let [r-s (rand-int (count list))
                r-t (rand-int (count (nth list r-s)))]
            (nth (nth list r-s)
                 r-t))
          test))
      s-list))
   list))

These are defined for you by the framework (optionally, if you want to use them). The function list-crossover chooses a cross-point, x, on two lists, and creates a new list made up of the first x elements of one list, and the last (length – x) elements of the other. Mutation was hard to make generic (and so it is very much imperfect), but it uses the base elements of other population members as an approximation for all the “genetic material” that can be swapped in or out of a mutated member. These functions work on most kinds of list populations. In many cases, however, you may need more complex behavior, so they are not hard-coded into the algorithm itself. Although nearly done, we still lack what is perhaps the most important part, the fitness function. This can be defined in many ways, but I use a system of one point for every directly matching character in the string. Like follows:

(defn hello-fitness
     [lst]
     (reduce
      +
      (map #(if (= %1 %2) 1 0)
           lst
           '("h" "e" "l" "l" "o" "w" "o" "r" "l" "d"))))

So, now we can define the maps:

(def func-map {:fit-fn hello-fitness :mut-fn generic-mutation
               :sel-fn roulette-select :init-fn (partial rand-pop dna 10)
               :cross-fn list-crossover})

(def set-map {:pop-sz 100 :children 50 :mut-r 1 :gen 100})

Note that roulette-select is another helper created for the framework. Its a simple function, and selects population members with a likelihood proportional to the fitness of each. For the curious, here it is:

(defn roulette-select
  [pop fit-fn num]
  (let [pop-fits (map fit-fn pop)
        inc-fits (iterate (fn [[pfit idx]]
                             [(+ (nth pop-fits (+ idx 1)) pfit) (+ idx 1)])
                          [(first pop-fits) 0])
        max-fitness (apply + pop-fits)
        pick-one (fn [num] (second (first (drop-while #(< (first %) num)
                        inc-fits))))]
    (map (fn [x] (nth pop (pick-one (rand-int max-fitness)))) (range num))))

Now, we are finished, so run the algorithm!

(run-ga func-map set-map)

Again, all this code (and some missing detail) is available at github. Hopefully you found this useful!

Posted in Clojure | Tagged , , , , | 2 Comments