## Monday, December 3, 2007

### Holy shmoly Haskell doesn't smoke Python away (that much)

Recently, there has been some talk about how Haskell smokes Python and Ruby away on the naive Fibonacci algorithm. While it is obvious that Haskell will beat both, since the code is compiled rather than interpreted, I decided to see if I got the same results as reported on the Haskell hacking blog, and also to see how Python handles parallel processing. Note that I will refer to the Haskell hacking blog simply as the Haskell blog from now on.

Since I have no experience in Ruby, I will only compare Python and Haskell. So, here is the code I used to test the two.
Python:

```python
from time import time

def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

if __name__ == "__main__":
    start_time = time()
    for i in range(36):
        print "n=%d => %d" % (i, fib(i))
    print "Time elapsed: %f" % (time() - start_time)
```

As we can see, this code is pretty much the same as the one reported in the Haskell blog.

Haskell:

```haskell
import Control.Monad
import Text.Printf

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

main = forM_ [0..35] $ \i ->
    printf "n=%d => %d\n" i (fib i)
```

This is also the same as in the blog, the only difference being that I import what is needed to make it run (for some reason, the import statements aren't included in the Haskell blog). Since I have no idea how to measure time in Haskell (any hint on how to do that is appreciated), I used the ptime command for Windows (available at http://www.pc-tools.net/win32/ptime/).

So, here are the results of the test:

Python 2.5: 26.07 s
Haskell (GHC 6.8.1): 3.873 s (mean of 5 runs)

So, the results for Python are pretty much the same as reported in the Haskell blog (25.16 s is reported there), but the ones for Haskell are not: the Haskell blog reports 0.48 s, which is some eight times faster than what I got on my computer. What is the reason for this? I have no idea. I compiled the program using "ghc fib.hs"; maybe they used some code optimizations? Anyway, the Haskell advantage seems to be melting, although Haskell is still some 6-7 times faster. However, the Haskell blog proceeds to use parallel processing in Haskell, but uses no such thing in Python. So, I decided to see if Python can take advantage of my two cores to shake some time off those 26 seconds. I used the Parallel Python package (available at http://www.parallelpython.com/) to make use of both cores. Here is the modified Python program:

```python
from time import time
import pp

def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

if __name__ == "__main__":
    start_time = time()
    ppservers = ()
    job_server = pp.Server(ppservers=ppservers)
    print "Starting pp with", job_server.get_ncpus(), "workers"
    jobs = [(input, job_server.submit(fib, (input,), (), ())) for input in range(36)]
    for input, job in jobs:
        print "n=%d => %d" % (input, job())
    print "Time elapsed: %f" % (time() - start_time)
```

The result: 17.86 s
So, Python went from 26.07 s to 17.86 s, a 31.5% reduction in execution time compared to a single core. Unfortunately, I had no success in running the parallel Haskell code reported on the Haskell blog (for some reason, I always got a "`pseq` not in scope" error while compiling). However, I can use the numbers given on the Haskell blog, where Haskell went from 0.48 s to 0.42 s on two cores, a 12.5% reduction in execution time. Although I admit it takes slightly more intervention from the programmer to make the parallelism work in Python, the gain in speed is also more significant. As a final note, let us compare 3.873 s (Haskell) and 17.86 s (parallel Python): the ratio is around 4.6, let's say 5. So, Haskell doesn't smoke Python away THAT much.
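As a sanity check, the percentages and the ratio above can be reproduced in a few lines of Python (all timings are simply the numbers reported in this post and on the Haskell blog):

```python
# Timings in seconds, as reported in this post and on the Haskell blog
python_serial = 26.07
python_parallel = 17.86
haskell_unoptimised = 3.873    # my compile, plain "ghc fib.hs"
haskell_blog_serial = 0.48     # reported on the Haskell blog
haskell_blog_parallel = 0.42   # reported on the Haskell blog

# Reduction in execution time from parallelising the Python version
py_reduction = (python_serial - python_parallel) / python_serial * 100
print("Python reduction: %.1f%%" % py_reduction)    # about 31.5%

# Reduction reported for Haskell on two cores
hs_reduction = (haskell_blog_serial - haskell_blog_parallel) / haskell_blog_serial * 100
print("Haskell reduction: %.1f%%" % hs_reduction)   # 12.5%

# Ratio between my (unoptimised) Haskell build and the parallel Python run
print("Ratio: %.2f" % (python_parallel / haskell_unoptimised))  # about 4.61
```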

Don Stewart said...

That code is compiled without optimisations:

ghc -O2

Or:

ghc -O2 -optc-O

The followup post explored the use of `par` and `pseq` some more, and corrected
a bug in the original parallelisation strategy.

Don Stewart said...

And you parallelise the code differently, I think? (Spawning one thread for each top level fib call?).

Can you clarify if each recursive fib call is parallelised, or just the initial call for some value of N?

Doing just the top level parallelisation is much easier, but is a different program to the one presented. (Good to see its possible in python though!).

And regarding "pseq: not in scope" - you have to have ghc 6.8.1 to use that.

Filox said...

Don:
you are right, this is just top-level parallelization. It is different from the Haskell program, but my goal was just to show that some parallelization is also possible in Python... I will try to write a program that parallelizes in a way similar to the Haskell one when I get some free time.
I did use ghc 6.8.1; there was an error in the code that was later corrected. Again, when I get some time I'll run the corrected code.

Don Stewart said...

@filox

Thanks for the clarification.

The main problem to address though is that you're testing for speed, yet didn't compile the Haskell with the optimiser on.
Without optimisations, I get similar numbers to you:

```
$ ghc A.hs -o A
$ time ./A
...
n=35 => 9227465
./A  3.76s user 0.09s system 99% cpu 3.861 total
```

However, compiling with -O2, as all production Haskell does:

```
$ ghc -O2 A.hs -o B -no-recomp
$ ./B
...
n=35 => 9227465
./B  0.48s user 0.01s system 99% cpu 0.492 total
```

We get the same numbers as in my original post: still a 50x speedup over serial Python, and 36x over the parallelised version.

If we then go ahead and parallelise the Haskell code at the top level -- here's one parallel top level:

```haskell
main = mapM_ putStr $
    parMap rnf (\i -> printf "n=%d => %d\n" i (fib i)) [0..35]
```

We get good multicore speedups again:

```
$ time ./A +RTS -N2
...
n=35 => 9227465
./A +RTS -N2  0.28s user 0.00s system 158% cpu 0.174 total
```

I'd argue that the result is much the same as I originally stated: 50x speedups over Python, much more so than the 6x you report, and remaining easier to parallelise.

stephan said...

But with Python you wouldn't go for a recursive approach like in Haskell; that's just crazy and creates a senselessly big stack.

You either do:

```python
def fib(n):
    if n < 2:
        return n
    else:
        a = 0
        b = 1
        x = 0
        for i in xrange(2, n + 1):  # up to n inclusive, so fib(2) is computed too
            x = a + b
            a = b
            b = x
        return x

for i in xrange(36):
    print "n=%d => %d" % (i, fib(i))
```

Or you would cache:

```python
fib_dict = {}

def fib(n):
    if n > 1:
        try:
            return fib_dict[n]
        except KeyError:  # not yet cached, so compute and store it
            x = fib(n-1) + fib(n-2)
            fib_dict[n] = x
            return x
    else:
        return n

for i in range(36):
    print "n=%d => %d" % (i, fib(i))
```

Both versions will easily compete with or even outperform the parallel Haskell version.

Filox said...

@stephan
The whole point was to compare the same program (approach) in different languages... That is why I used the naive Fibonacci instead of some iterative approach. I could have easily used the formula for the n-th Fibonacci number directly if I wanted fast code.
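For the curious, the direct formula alluded to here is Binet's formula; a minimal sketch in Python (floating point, so it is only exact up to roughly n = 70):

```python
import math

def fib_binet(n):
    # Binet's formula: F(n) = (phi^n - psi^n) / sqrt(5),
    # where phi is the golden ratio and psi is its conjugate.
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    psi = (1 - sqrt5) / 2
    return int(round((phi ** n - psi ** n) / sqrt5))

print(fib_binet(35))  # 9227465, same as the naive recursive version
```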