I’ve been spending a bit of time this weekend playing with Haskell, and I came across an interesting problem while writing a few functions that, overtly, would seem quite simple. Notably, Haskell is a lazy language, storing promises of later evaluation in what are called thunks.
Having some reasonable amount of experience in Clojure, lazy evaluation did not disconcert me. However, consider the following code:
foldl (+) 0 [0..5000000]
In a result that was initially somewhat surprising to me, running this produces a stack overflow. I later found out the reason; namely, all the thunk promises are being stored on the stack, building upon each other and taking up a great deal of space. Conceptually, it might look something like:
(...(((((0 + 1) + 2) + 3) + 4) + ...) + ... 5000000)
For something as simple as addition, it is far more efficient to force evaluation as foldl builds upon itself. In this specific case, Haskell has us covered with foldl’ which is a strict version of fold. The following code runs just fine:
foldl' (+) 0 [0..5000000]
One might think of is as executing something like this:
foldl' (+) 0 [1..5000000]
foldl' (+) 0 [1..5000000]
foldl' (+) 1 [2..5000000]
foldl' (+) 3 [3..5000000]
... (eventually)
foldl' (+) 12500002500000 []
Here, the expensive saving of thunks is unnecessary. To see how this might be possible, take a look at seq, a function which forces the evaluation of its first argument. All told, this served as a nice and innocuous reminder that I should be more careful with the implications of using a lazy language.
2 Comments
Turn optimization on, the optimizer will remove the lazy thunk creation in this case and turn it into a loop.
This page from the Haskell wiki may provide some additional insight: Foldr Foldl Foldl’
One Trackback
Social comments and analytics for this post…
This post was mentioned on Reddit by klodolph: Turn optimization on. This fixes the problem, and you don’t have to change the code….