<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: On Parallelism</title>
	<atom:link href="http://blog.ethanjfast.com/2009/10/on-parallelism/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.ethanjfast.com/2009/10/on-parallelism/</link>
	<description>Lambdas, Hacks, and Fiction</description>
	<lastBuildDate>Tue, 09 Mar 2010 07:28:58 +0000</lastBuildDate>
	
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Jed Wesley-Smith</title>
		<link>http://blog.ethanjfast.com/2009/10/on-parallelism/comment-page-1/#comment-9</link>
		<dc:creator>Jed Wesley-Smith</dc:creator>
		<pubDate>Thu, 08 Oct 2009 00:38:51 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ethanjfast.com/?p=70#comment-9</guid>
		<description>I&#039;ve been playing around with some parallel list comprehension strategies for a bit, and there are a bunch of reasons why it can be a difficult problem. The atomicity chosen for parallel execution is a fairly critical issue – too small and the overhead of the decomposition and scheduling heavily outweighs the benefits, too big and you don&#039;t get much benefit from the parallelism.

Additionally, choosing data structures and algorithms that are parallel-friendly is important; Guy Steele&#039;s &lt;a href=&quot;http://vimeo.com/6624203&quot; title=&quot;foldr considered (slightly) harmful&quot; rel=&quot;nofollow&quot;&gt;excellent talk&lt;/a&gt; on this gives some perspective on how far we need to go.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been playing around with some parallel list comprehension strategies for a bit, and there are a bunch of reasons why it can be a difficult problem. The atomicity chosen for parallel execution is a fairly critical issue – too small and the overhead of the decomposition and scheduling heavily outweighs the benefits, too big and you don&#8217;t get much benefit from the parallelism.</p>
<p>Additionally, choosing data structures and algorithms that are parallel-friendly is important; Guy Steele&#8217;s <a href="http://vimeo.com/6624203" title="foldr considered (slightly) harmful" rel="nofollow">excellent talk</a> on this gives some perspective on how far we need to go.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zak</title>
		<link>http://blog.ethanjfast.com/2009/10/on-parallelism/comment-page-1/#comment-8</link>
		<dc:creator>Zak</dc:creator>
		<pubDate>Wed, 07 Oct 2009 17:43:59 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ethanjfast.com/?p=70#comment-8</guid>
		<description>I don&#039;t think threads are the big source of your overhead here. Using pmap sets up a sequence of futures and dereferences them n at a time, where n is the number of processors plus two. I&#039;m pretty sure creating the futures is what slows you down here, not spawning three threads.

According to its docstring, pmap is for a compute-intensive function and a relatively small sequence. I&#039;ve used it on a dual-core machine for running filters on a folder full of images and got about a 90% speedup.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t think threads are the big source of your overhead here. Using pmap sets up a sequence of futures and dereferences them n at a time, where n is the number of processors plus two. I&#8217;m pretty sure creating the futures is what slows you down here, not spawning three threads.</p>
<p>According to its docstring, pmap is for a compute-intensive function and a relatively small sequence. I&#8217;ve used it on a dual-core machine for running filters on a folder full of images and got about a 90% speedup.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ethan</title>
		<link>http://blog.ethanjfast.com/2009/10/on-parallelism/comment-page-1/#comment-7</link>
		<dc:creator>Ethan</dc:creator>
		<pubDate>Wed, 07 Oct 2009 16:57:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ethanjfast.com/?p=70#comment-7</guid>
		<description>@Everyone,

Thanks for the comments!

@Jeff specifically,

Yes, I am using it &quot;wrongly&quot; -- although I would prefer to look at it as inefficiently. I&#039;m fairly certain that with some recoding, a parallel version of the framework could do much better. Naturally, that&#039;s a bit harder than the naive &quot;switch out map with pmap&quot; approach... I simply wanted to see how well it would handle concurrency given minimal changes to the existing code structure.</description>
		<content:encoded><![CDATA[<p>@Everyone,</p>
<p>Thanks for the comments!</p>
<p>@Jeff specifically,</p>
<p>Yes, I am using it &#8220;wrongly&#8221; &#8212; although I would prefer to look at it as inefficiently. I&#8217;m fairly certain that with some recoding, a parallel version of the framework could do much better. Naturally, that&#8217;s a bit harder than the naive &#8220;switch out map with pmap&#8221; approach&#8230; I simply wanted to see how well it would handle concurrency given minimal changes to the existing code structure.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zach Dwiel</title>
		<link>http://blog.ethanjfast.com/2009/10/on-parallelism/comment-page-1/#comment-6</link>
		<dc:creator>Zach Dwiel</dc:creator>
		<pubDate>Wed, 07 Oct 2009 15:43:44 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ethanjfast.com/?p=70#comment-6</guid>
		<description>I&#039;d say you might try lowering the number of threads you are starting so that it is closer to the number of cores you are running them on.  Using too few or too many threads can both easily result in worse performance than a single thread.  Your experiment shows that you can&#039;t blindly use parallelism for gain, not that you can&#039;t use parallelism for gain.

If you search &quot;clojure pmap&quot; you&#039;ll find people in situations similar to yours.</description>
		<content:encoded><![CDATA[<p>I&#8217;d say you might try lowering the number of threads you are starting so that it is closer to the number of cores you are running them on.  Using too few or too many threads can both easily result in worse performance than a single thread.  Your experiment shows that you can&#8217;t blindly use parallelism for gain, not that you can&#8217;t use parallelism for gain.</p>
<p>If you search &#8220;clojure pmap&#8221; you&#8217;ll find people in situations similar to yours.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jan Rychter</title>
		<link>http://blog.ethanjfast.com/2009/10/on-parallelism/comment-page-1/#comment-4</link>
		<dc:creator>Jan Rychter</dc:creator>
		<pubDate>Wed, 07 Oct 2009 14:59:44 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ethanjfast.com/?p=70#comment-4</guid>
		<description>I don&#039;t know how much work the pick-one function really does -- from my experience, it needs to be significant before pmap makes sense. Even then, your gains on two cores might be small: by default, Java runs the GC in a separate thread, so your Clojure program will already get &gt;100% CPU utilization (I&#039;ve seen numbers like 137%). So there isn&#039;t that much to be gained. That said, I have never seen a *loss* in performance yet.

I&#039;m glad you are measuring wall time. Too many people forget that this is the real metric.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t know how much work the pick-one function really does &#8212; from my experience, it needs to be significant before pmap makes sense. Even then, your gains on two cores might be small: by default, Java runs the GC in a separate thread, so your Clojure program will already get &gt;100% CPU utilization (I&#8217;ve seen numbers like 137%). So there isn&#8217;t that much to be gained. That said, I have never seen a *loss* in performance yet.</p>
<p>I&#8217;m glad you are measuring wall time. Too many people forget that this is the real metric.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeff</title>
		<link>http://blog.ethanjfast.com/2009/10/on-parallelism/comment-page-1/#comment-3</link>
		<dc:creator>jeff</dc:creator>
		<pubDate>Wed, 07 Oct 2009 14:37:46 +0000</pubDate>
		<guid isPermaLink="false">http://blog.ethanjfast.com/?p=70#comment-3</guid>
		<description>Isn&#039;t this because you&#039;re using pmap wrongly?   From the documentation:

&lt;blockquote&gt;
Like map, except f is applied in parallel. Semi-lazy in that the parallel computation stays ahead of the consumption, but doesn&#039;t realize the entire result unless required. Only useful for computationally intensive functions where the time of f dominates the coordination overhead.
&lt;/blockquote&gt;

I used pmap for raytracing and used partition to group the map into larger work units that are sufficiently computationally intensive to make the effort of using a thread worthwhile.  Is it possible to try that for your example?</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t this because you&#8217;re using pmap wrongly?   From the documentation:</p>
<blockquote><p>
Like map, except f is applied in parallel. Semi-lazy in that the parallel computation stays ahead of the consumption, but doesn&#8217;t realize the entire result unless required. Only useful for computationally intensive functions where the time of f dominates the coordination overhead.
</p></blockquote>
<p>I used pmap for raytracing and used partition to group the map into larger work units that are sufficiently computationally intensive to make the effort of using a thread worthwhile.  Is it possible to try that for your example?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
