Slowly Programming in R
Recently, I coded up a cross validation function in R, and things were moving rather less quickly than I would have liked. (The purpose of c.v. is to assess how well one’s statistical analysis will generalize to an independent data set.) Anyhow, I was implementing 10-fold cross validation, and with a dataset containing around 100,000 observations, my code was taking hours to run. This was, of course, ridiculous.
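For readers who have not met it before, 10-fold cross validation can be sketched roughly as follows. This is not the code from this post: the toy data frame `toy`, the linear model, and the names `fold`, `fold_mse` and `cv_mse` are all made up for illustration.

```r
# Hypothetical sketch of k-fold cross validation; not the code from this post.
set.seed(1)                                  # reproducible fold assignment
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- 2 * toy$x + rnorm(n, sd = 0.5)      # made-up linear relationship

k <- 10
fold <- sample(rep(1:k, length.out = n))     # assign each row to one of k folds

# For each fold: fit on the other k - 1 folds, then score the held-out fold
fold_mse <- sapply(1:k, function(j) {
  fit  <- lm(y ~ x, data = toy[fold != j, ])
  pred <- predict(fit, newdata = toy[fold == j, ])
  mean((toy$y[fold == j] - pred)^2)          # mean squared error on held-out rows
})

cv_mse <- mean(fold_mse)                     # estimate of out-of-sample error
```

Every row is held out exactly once, so `cv_mse` averages the error over predictions made on data the model never saw during fitting.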
Now, I doubt that it will come as a surprise, but I am rather a newbie at this whole R thing, and as I later found out, loops in R should be avoided at all costs. After hacking around with my code, I found that its critical path looked something like this:
total <- 0
for(i in 1:nrow(dataset)){
total <- total + sum( dataset[i,1:25]*coef )
}
Now this is a very simple loop, and it seemed to me somewhat less than obvious that it would beget a significant performance bottleneck. Ever so naturally, then, it did.
Ironically, the solution here is to use code more along the lines of the map-reduce paradigm, something I would have loved to do in the first place, were I not overcome by the cryptic nature of R’s documentation. After all, my favorite languages are all variants of Lisp, and I am no stranger to functional programming. After some digging, I stumbled across apply, which functions more or less along the lines of map in Scheme or Clojure. So I tried:
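# Weighted sum of the first 25 entries of one row
my_sum <- function(x){ sum( x[1:25]*coef ) }
# MARGIN = 1 applies my_sum to each row; apply requires this argument
sum( apply( dataset, 1, my_sum ) )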
In addition to being more elegant, this is much, much faster. What was taking hours now takes tens of seconds. Apparently, R has a fast backend implementation for this sort of thing. So, this post is dedicated as a warning to my fellow inexperienced users: avoid iterative loops in R!
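As a rough check, the sketch below runs the loop and the apply version on a small made-up dataset, and adds a third, fully vectorized variant based on matrix multiplication. The toy `dataset`, the random `coef` values and the `total_*` names are assumptions for illustration, not the data or code from the original analysis.

```r
# Toy data standing in for the post's dataset; all values here are made up.
set.seed(42)
dataset <- as.data.frame(matrix(rnorm(1000 * 30), nrow = 1000))  # 1000 rows, 30 columns
coef <- runif(25)                                                # 25 hypothetical coefficients

# 1. Row-by-row loop, as in the original critical path
total_loop <- 0
for (i in 1:nrow(dataset)) {
  total_loop <- total_loop + sum(dataset[i, 1:25] * coef)
}

# 2. apply over rows (MARGIN = 1)
my_sum <- function(x) sum(x[1:25] * coef)
total_apply <- sum(apply(dataset, 1, my_sum))

# 3. Fully vectorized: one matrix-vector product, then a single sum
total_matrix <- sum(as.matrix(dataset[, 1:25]) %*% coef)

# All three should agree up to floating-point rounding
print(all.equal(total_loop, total_apply))
print(all.equal(total_loop, total_matrix))
```

One caveat worth knowing: apply is itself implemented with a loop in R code, so the speedup probably comes less from avoiding iteration and more from converting the data frame to a matrix once, rather than indexing a data frame row by row. The matrix product in the third variant pushes the arithmetic down into compiled code and is usually the fastest of the three.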