Slowly Programming in R

Recently, I coded up a cross-validation function in R, and things were moving rather less quickly than I would have liked. (The purpose of cross-validation is to assess how well one’s statistical analysis will generalize to an independent data set.) Anyhow, I was implementing 10-fold cross-validation, and with a dataset containing around 100,000 observations, my code was taking hours to run. This was, of course, ridiculous.

Now, I doubt that it will come as a surprise, but I am rather a newbie at this whole R thing, and as I later found out, loops in R should be avoided at all costs. After hacking around with my code, I found that its critical path looked something like this:

# Add up, row by row, the dot product of the first 25 columns with coef
total <- 0
for (i in seq_len(nrow(dataset))) {
  total <- total + sum(dataset[i, 1:25] * coef)
}

Now this is a very simple loop, and it seemed to me somewhat less than obvious that it would beget a significant performance bottleneck. Ever so naturally, then, it did.

Ironically, the solution here is to use code more along the lines of the map-reduce paradigm, something I would have loved to do in the first place, had I not been overcome by the cryptic nature of R’s documentation. After all, my favorite languages are all variants of Lisp, and I am no stranger to functional programming. After some digging, I stumbled across apply, which more or less functions along the lines of map in Scheme or Clojure. So I tried:

my_sum <- function(x) { sum(x[1:25] * coef) }
# The second argument (MARGIN = 1) tells apply to call my_sum on each row
sum(apply(dataset, 1, my_sum))

In addition to being more elegant, this is much, much faster. What was taking hours now takes tens of seconds. Apparently, R has a fast backend implementation for this sort of thing. So, this post is dedicated as a warning to my fellow inexperienced users: avoid iterative loops in R!
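
For anyone who wants to try this at home, here is a minimal sketch on made-up data. The 1,000 rows, the 30 numeric columns, and the random values are all my assumptions, not the original dataset. It computes the same total three ways: with the loop, with apply over rows, and with a single matrix product. The matrix product is an extra variant that I did not use above; it is included only to show that the whole calculation can be written without any per-row call at all.

```r
set.seed(1)

# Synthetic stand-in for the real data (assumed shape: 1000 rows, 30 numeric columns)
n <- 1000
dataset <- as.data.frame(matrix(rnorm(n * 30), nrow = n))
coef <- runif(25)  # assumed: one coefficient per used column

# 1. The explicit loop over rows
total_loop <- 0
for (i in seq_len(nrow(dataset))) {
  total_loop <- total_loop + sum(dataset[i, 1:25] * coef)
}

# 2. apply over rows (MARGIN = 1), then sum the per-row results
my_sum <- function(x) { sum(x[1:25] * coef) }
total_apply <- sum(apply(dataset, 1, my_sum))

# 3. One matrix product: (n x 25) %*% (25 x 1), then sum the n entries
total_matrix <- sum(as.matrix(dataset[, 1:25]) %*% coef)

# The three totals should agree up to floating-point rounding
print(all.equal(total_loop, total_apply))
print(all.equal(total_apply, total_matrix))
```

Wrapping each of the three versions in system.time() is one way to compare them on your own machine. I would expect the gap to grow with the number of rows, but the exact numbers will depend on the data.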

