<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ethan Fast &#187; R Programming Language</title>
	<atom:link href="http://blog.ethanjfast.com/category/r-programming-language/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.ethanjfast.com</link>
	<description>Lambdas, Hacks, and Fiction</description>
	<lastBuildDate>Fri, 27 Aug 2010 12:50:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Slowly Programming in R</title>
		<link>http://blog.ethanjfast.com/2009/12/slowly-programming-in-r/</link>
		<comments>http://blog.ethanjfast.com/2009/12/slowly-programming-in-r/#comments</comments>
		<pubDate>Sat, 12 Dec 2009 13:43:23 +0000</pubDate>
		<dc:creator>Ethan</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[R Programming Language]]></category>

		<guid isPermaLink="false">http://blog.ethanjfast.com/?p=231</guid>
		<description><![CDATA[Recently, I coded up a cross validation function in R, and things were moving rather less quickly than I would have liked. (The purpose of c.v. is to assess how well one&#8217;s statistical analysis will generalize to an independent data set.)  Anyhow, I was implementing 10-fold cross validation, and with a dataset containing around 100,000 observations, my [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I coded up a <a href="http://en.wikipedia.org/wiki/Cross-validation_(statistics)">cross validation</a> function in R, and things were moving rather less quickly than I would have liked. (The purpose of c.v. is to assess how well one&#8217;s statistical analysis will generalize to an independent data set.)  Anyhow, I was implementing 10-fold cross validation, and with a dataset containing around 100,000 observations, my code was taking hours to run. This was, of course, ridiculous.</p>
<p>Now, I doubt that it will come as a surprise, but I am rather a newbie at this whole R thing, and as I later found out, loops in R should be avoided at all costs. After hacking around with my code, I found that its critical path looked something like this:<br />
<code><br />
total &lt;- 0<br />
for(i in 1:nrow(dataset)){<br />
total &lt;- total + sum( dataset[i,1:25]*coef )<br />
}<br />
</code><br />
Now this is a very simple loop, and it seemed to me somewhat less than obvious that it would beget a significant performance bottleneck. Ever so naturally, then, it did.</p>
<p>Ironically, the solution here is to use code more along the lines of the map-reduce paradigm, something I would have loved to do in the first place, were I not overcome by the cryptic nature of R&#8217;s documentation. After all, my favorite languages are all variants of Lisp, and I am no stranger to functional programming. After some digging, I stumbled across <em>apply</em>, which more or less functions along the lines of <em>map</em> in Scheme or Clojure. So I tried:<br />
<code><br />
my_sum &lt;- function(x){ sum( x[1:25]*coef ) }<br />
sum( apply( dataset, 1, my_sum ) )<br />
</code><br />
In addition to being more elegant, this is much, much faster. What was taking hours now takes tens of seconds. Apparently, R has a fast backend implementation for this sort of thing.  So, this post is dedicated as a warning to my fellow inexperienced users: avoid iterative loops in R!</p>
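<p>For anyone who wants to try the comparison, here is a minimal, self-contained sketch. The data are made up (a 1,000-row data frame of random numbers standing in for my real dataset, plus random coefficients), and the name <em>total_apply</em> is just for illustration. Note that <em>apply</em> takes a margin as its second argument, with 1 meaning &#8220;once per row&#8221;. As far as I can tell, one likely contributor to the speedup is that <em>apply</em> coerces the data frame to a matrix once up front, whereas the loop indexes a data frame row on every iteration:<br />
<code><br />
# Hypothetical stand-in data: 1,000 rows of 25 numeric columns<br />
set.seed(1)<br />
dataset &lt;- as.data.frame( matrix( rnorm(1000 * 25), ncol = 25 ) )<br />
coef &lt;- runif(25)<br />
<br />
# Loop version: indexes one data frame row per iteration<br />
total &lt;- 0<br />
for(i in 1:nrow(dataset)){<br />
total &lt;- total + sum( dataset[i,1:25]*coef )<br />
}<br />
<br />
# apply version: the margin 1 calls my_sum once per row<br />
my_sum &lt;- function(x){ sum( x[1:25]*coef ) }<br />
total_apply &lt;- sum( apply( dataset, 1, my_sum ) )<br />
<br />
# The two totals should agree up to floating-point rounding<br />
all.equal( total, total_apply )<br />
</code><br />
Wrapping either version in <em>system.time()</em> is an easy way to see the difference on your own machine; I would expect the gap to grow with the number of rows, though the exact timings will vary.</p>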
]]></content:encoded>
			<wfw:commentRss>http://blog.ethanjfast.com/2009/12/slowly-programming-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
