"Making Python math 196* faster with shedskin":
The zip with all the code:
Timings smaller dataset, best of 3:
  C:                 0.15 s
  SS:                0.29 s
  D to C:            0.33 s
  D:                 0.41 s
  SS:                0.92 s (with wrap around)
  Py 2.6 + Psyco:    4.84 s
  Py 2.6:           16.10 s

Timings larger dataset, best of 3:
  C:                27.5 s
  SS:               43.4 s
  D to C:           76.9 s
  D:                94 s
  SS:              199 s (with wrap around)
  Py 2.6 + Psyco:  untested
  Py 2.6:          untested

My CPU is Core2 at 2 GHz, WinXP 32 bit.
- G++ MinGW based on GCC V.4.2.1.
- DMD V.1.036.
- Python V.2.6.0.
- Psyco V.1.6.0 final.
- ShedSkin V.0.0.29.
- ShedSkin: -w, plus arguments in the make file: -O3 -s
- DMD: -O -release -inline
gcc -Wall -O3 -s -fomit-frame-pointer -fprofile-generate bpnn_c.c -o bpnn_c
Followed by a run, followed by:
gcc -Wall -O3 -s -fomit-frame-pointer -fprofile-use bpnn_c.c -o bpnn_c
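For readers who want to repeat the "best of 3" measurements, here is a minimal sketch of such a timing harness. It is not the script used for the numbers above. It is written for a current Python 3 (not the Python 2.6 of the benchmark), the helper name best_of is made up for illustration, and the command passed to it is a placeholder.

```python
import subprocess
import sys
import time


def best_of(cmd, runs=3):
    """Run `cmd` `runs` times and return the smallest wall-clock time in seconds.

    Taking the minimum (rather than the mean) reduces the influence of
    background activity on the machine.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so printing cost on the terminal
        # does not dominate the measurement.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder command: the Python interpreter doing nothing.
# For a real measurement, pass the compiled program instead,
# for example ["./bpnn_c"] (assumed name, taken from the gcc line above).
elapsed = best_of([sys.executable, "-c", "pass"], runs=3)
print("best of 3: %.3f s" % elapsed)
```

The measured time includes process start-up, which is negligible for runs that take several seconds but not for very short ones.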
- As you can see C is faster than SS this time.
- I think in this program it's much better to load the data from disk instead of importing it as a module/code. This would also speed up the compilation of the ShedSkin code a lot (though ShedSkin may benefit from having the data in the code, because it can optimize more, even if the compilation times become large).
- In this program after many tests I think that SS (C++) is faster than D because the DMD compiler doesn't optimize the loops with floating point operations inside as well as GCC.
- This seems to be a numerically unstable program, so if SS merely exchanges the order of operands in a floating point sum, the final results become different.
- This time I have sweated to create the C version, despite it being quite a small program. And in the end I am not sure this C code is bug-free, while I think the D code doesn't have more bugs than the original Python version.
- At first sight D and C look similar, but they are really different languages. D has several features added/changed that make it qualitatively different.
- I have translated the Python code to D (using my D libraries) in just a few minutes, something like 15-20 minutes, and the translation was mostly painless and sometimes almost mechanical. Translating the D code to C took me many hours. Translating Python => C may require something like 20-30 times the time you need to translate Python => D + my libs. And this despite the fact that I have used a rigorous enough method to perform the translation, and that in the end I am still not sure the C code is bug-free. This is an enormous difference.
- To translate code I first write some tests, possibly automatic, then I slowly change the original source code (Python, D, C) to look more and more like the destination language (D, C, asm), running the tests after each small change to see if I have introduced bugs. At the end I translate the original code in the most mechanical way to the target language. The problem with this approach here is threefold. The bpnn code prints floating point numbers, so creating the tests is a little more complex. It computes with floating point numbers, so if the code is numerically unstable it's very easy to see changes in the output even if no true bugs are introduced. And finally the Python, C and D versions use different random generators, so the results differ even if the translation has not added bugs to the code.
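The floating point issues mentioned in the notes above can be illustrated with a small sketch. It is not part of the bpnn code: the function names are invented for this example, the tolerances are arbitrary, and the Lcg class is only one possible way to get the same "random" sequence in Python, D and C (it uses the well-known 32-bit linear congruential constants 1664525 and 1013904223).

```python
import math


def sum_forward(xs):
    """Sum the values left to right."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_backward(xs):
    """Sum the same values right to left."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total


def outputs_match(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two sequences of floats up to a tolerance (values are assumptions)."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


class Lcg:
    """Tiny 32-bit linear congruential generator, easy to port to D and C.

    Hypothetical helper: using the same generator in every version makes
    the outputs of the translations comparable.
    """

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_float(self):
        # The state stays in 32 bits, so the result is in [0, 1).
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state / 4294967296.0


values = [0.1, 0.2, 0.3]
a = sum_forward(values)
b = sum_backward(values)
print(a == b)                   # False: the order of operands changes the result
print(outputs_match([a], [b]))  # True: equal within the chosen tolerance

rng = Lcg(42)
print([rng.next_float() for _ in range(3)])
```

A tolerance-based comparison only helps while the differences stay small. In a numerically unstable program the rounding differences can grow until no reasonable tolerance hides them, which is the difficulty described above.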
Update Jul 17 2009: I have found a way to compile with LDC on Linux (Pubuntu) with Link-Time Optimization, and to internalize the main correctly.
Timings larger dataset, best of 3: D to C: 26.09 s
The compilation becomes a bit more complex:
ldc -O5 -release -inline -output-bc bpnn_d_to_c.d
opt -std-compile-opts bpnn_d_to_c.bc > bpnn_d_to_c_opt.bc
llvm-ld -native -ltango-base-ldc -ltango-user-ldc -ldl -lm -lpthread -internalize-public-api-list=_Dmain -o=bpnn_d_to_c bpnn_d_to_c_opt.bc