leonardo - More on making Python math faster
Subject:More on making Python math faster
Time:06:04 pm
I have written more versions of the benchmark from this page:

"Making Python math 196* faster with shedskin":
http://ianozsvald.com/2008/11/17/making-python-math-196-faster-with-shedskin/

The zip with all the code:
http://www.fantascienza.net/leonardo/js/bpnn.zip
Timings smaller dataset, best of 3:
  C:                  0.15 s
  SS:                 0.29 s
  D to C:             0.33 s
  D:                  0.41 s
  SS:                 0.92 s  (with wrap around)
  Py 2.6 + Psyco:     4.84 s
  Py 2.6:            16.10 s

Timings larger dataset, best of 3:
  C:                 27.5 s
  SS:                43.4 s
  D to C:            76.9 s
  D:                 94   s
  SS:               199   s (with wrap around)
  Py 2.6 + Psyco:   untested
  Py 2.6:           untested
My CPU is Core2 at 2 GHz, WinXP 32 bit.
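
"Best of 3" presumably means each program was run three times and the fastest run was kept, which reduces noise from other processes. Here is a minimal sketch of that kind of measurement in modern Python 3 (not the Python 2.6 used for the timings above). The `workload` function is a hypothetical stand-in for the real benchmark, and `best_of` is a hypothetical helper; neither comes from bpnn.zip.

```python
import timeit


def workload(n=20000):
    """Hypothetical stand-in for the benchmark: a small floating point loop."""
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / (i * i)
    return total


def best_of(func, repeats=3):
    """Call func `repeats` times and return the smallest wall-clock time in seconds."""
    # number=1 means each measurement is a single call to func.
    times = timeit.repeat(func, number=1, repeat=repeats)
    return min(times)


if __name__ == "__main__":
    print("best of 3: %.4f s" % best_of(workload))
```

Taking the minimum rather than the mean is a common choice for this kind of benchmark, because slower runs usually reflect interference from the system rather than the program itself.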

Compilers used:
- G++ MinGW based on GCC V.4.2.1.
- DMD V.1.036.
- Python V.2.6.0.
- Psyco V.1.6.0 final.
- ShedSkin V.0.0.29.

Compilation arguments:
- ShedSkin: -w, plus arguments in the make file: -O3 -s
- DMD: -O -release -inline
- GCC:
gcc -Wall -O3 -s -fomit-frame-pointer -fprofile-generate bpnn_c.c -o bpnn_c
Followed by a run, followed by:
gcc -Wall -O3 -s -fomit-frame-pointer -fprofile-use bpnn_c.c -o bpnn_c

Notes:
- As you can see, C is faster than SS this time.
- I think that in this program it's much better to load the data from disk instead of embedding it as a module/code. This would also speed up the compilation of the ShedSkin version a lot (although ShedSkin may benefit from the embedded data by optimizing the code more, even if the compilation times become long).
- After many tests, I think that in this program SS (C++) is faster than D because the DMD compiler doesn't optimize loops containing floating point operations as well as GCC does.
- This seems to be a numerically unstable program: even if SS merely exchanges the order of the operands in a floating point sum, the final results come out different.
- This time I had to sweat to create the C version, even though it's quite a small program. And in the end I am not sure this C code is bug-free, while I think the D code doesn't have more bugs than the original Python version.
- At first sight D and C look similar, but they are really different languages. D has several features added/changed that make it qualitatively different.
- I translated the Python code to D (using my D libraries) in just a few minutes, something like 15-20 minutes, and the translation was mostly painless and sometimes almost mechanical. Translating the D code to C took me many hours. Translating Python => C may require something like 20-30 times the time you need to translate Python => D + my libs. And this is despite my using a fairly rigorous method for the translation, and in the end I am still not sure the C code is bug-free. This is an enormous difference.
- To translate code I first write some tests, possibly automatic, then I slowly change the original source code (Python, D, C) to look more and more like the destination language (D, C, asm), running the tests after each small change to see if I have introduced bugs. At the end I translate the code to the target language in the most mechanical way. The problem with this approach here is that the bpnn code prints floating point numbers, so creating the test is a little more complex: if the code is numerically unstable, it's very easy to see changes even if no true bugs have been introduced. And finally the Python, C and D versions use different random generators, so their results differ even if the translation has not added bugs.
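
Two of the notes above can be illustrated with a short sketch in modern Python 3: that reordering the operands of a floating point sum can change the result, and that a translation test therefore has to compare printed floats with a tolerance rather than as exact text. The names `sum_in_order` and `outputs_match`, the sample values and the tolerance defaults are all hypothetical; this is not the test code I actually used.

```python
import math
import re

# Matches plain decimal numbers such as 10, 0.25, -3.5e-07.
FLOAT_RE = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def sum_in_order(values):
    """Add the values left to right, as a plain loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def outputs_match(text_a, text_b, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two program outputs.

    The non-numeric text must be identical; the numbers only have to
    agree within the given (assumed) tolerances.
    """
    if FLOAT_RE.sub("#", text_a) != FLOAT_RE.sub("#", text_b):
        return False
    nums_a = FLOAT_RE.findall(text_a)
    nums_b = FLOAT_RE.findall(text_b)
    if len(nums_a) != len(nums_b):
        return False
    return all(
        math.isclose(float(a), float(b), rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(nums_a, nums_b)
    )


if __name__ == "__main__":
    values = [1e16, 1.0, -1e16, 1.0]
    # Same numbers, different order of addition.
    print(sum_in_order(values), sum_in_order(sorted(values)))
    # math.fsum tracks the lost low-order bits and gives the exact sum.
    print(math.fsum(values))

    original = "epoch 10 error 0.123456789"
    translated = "epoch 10 error 0.123456790"
    print(outputs_match(original, translated))
```

A tolerance only helps when both versions consume the same random numbers. With different random generators, as here, the outputs diverge well beyond any reasonable tolerance, so the inputs would have to be fixed (for example by loading the same initial weights from a file) before such a comparison means anything.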

---------------

Update Jul 17 2009: I have found a way to compile with LDC on Linux (Pubuntu) with Link-Time Optimization, and to internalize the main correctly.
Timings larger dataset, best of 3:
  D to C:            26.09 s

The compilation becomes a bit more complex:
ldc -O5 -release -inline -output-bc bpnn_d_to_c.d

opt -std-compile-opts bpnn_d_to_c.bc > bpnn_d_to_c_opt.bc

llvm-ld -native -ltango-base-ldc -ltango-user-ldc -ldl -lm -lpthread -internalize-public-api-list=_Dmain -o=bpnn_d_to_c bpnn_d_to_c_opt.bc