leonardo @ 2008-11-20 13:37:00
More Performance Python
I have written more versions of the benchmark from this page:
PerformancePython: A comparison of weave with NumPy, Pyrex, Psyco, Fortran and C++ for solving Laplace's equation:
http://scipy.org/PerformancePython
The zip with all the code:
http://www.fantascienza.net/leonardo/js/laplace.zip
Contents of the zip:
- perfpy: original code from the scipy site, plus a copy of the HTML page.
- laplace_ss1.py: plain Python version, fit for ShedSkin.
- laplace_ss2.py: refined version for ShedSkin, moved sublists out of the inner loop.
- laplace_cpp.cpp: C++ version from the scipy site. This version is already the best for the C++ compiler. Lower level optimizations make the code slower.
- laplace_d1.d: direct translation of the C++ code to the D 1 language. The D (DMD) compiler isn't as good as the C++ compiler, and the code is slower.
- laplace_d2.d: refined D 1 version, the inner loop is written in a lower level style to help the D compiler.
- laplace_psyco1.py: version fit for Psyco, like the laplace_ss2.py code.
- laplace_psyco2.py: refined version for Psyco, used a 1D array.array("d"), plus manual management of its 2D nature.
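The two refinements named in the list (sublists moved out of the inner loop, and a 1D array.array("d") indexed by hand as a 2D grid) can be sketched as below. This is a minimal illustration and not the code from the zip: the function names, the make_grid helper and its boundary values are hypothetical, and the update formula is the in-place sweep of the scipy PerformancePython page.

```python
from array import array
from math import sqrt


def time_step_lists(u, dx2, dy2):
    """One in-place sweep over a list of lists; returns the error norm."""
    nx, ny = len(u), len(u[0])
    dnr_inv = 0.5 / (dx2 + dy2)
    err = 0.0
    for i in range(1, nx - 1):
        # Look up the three rows once per outer iteration,
        # instead of indexing u[i-1], u[i], u[i+1] for every j.
        up, row, down = u[i - 1], u[i], u[i + 1]
        for j in range(1, ny - 1):
            old = row[j]
            new = ((up[j] + down[j]) * dy2 +
                   (row[j - 1] + row[j + 1]) * dx2) * dnr_inv
            row[j] = new
            diff = new - old
            err += diff * diff
    return sqrt(err)


def time_step_flat(u, nx, ny, dx2, dy2):
    """The same sweep over a flat array('d') stored row by row."""
    dnr_inv = 0.5 / (dx2 + dy2)
    err = 0.0
    for i in range(1, nx - 1):
        base = i * ny  # offset of the first element of row i
        for j in range(1, ny - 1):
            k = base + j  # flat index of cell (i, j)
            old = u[k]
            new = ((u[k - ny] + u[k + ny]) * dy2 +
                   (u[k - 1] + u[k + 1]) * dx2) * dnr_inv
            u[k] = new
            diff = new - old
            err += diff * diff
    return sqrt(err)


def make_grid(nx, ny):
    """Hypothetical test grid: zeros, with the first row fixed at 1.0."""
    grid = [[0.0] * ny for _ in range(nx)]
    grid[0] = [1.0] * ny
    return grid


if __name__ == "__main__":
    nx, ny = 6, 5
    grid = make_grid(nx, ny)
    flat = array("d", [v for row in grid for v in row])
    print(time_step_lists(grid, 0.01, 0.04))
    print(time_step_flat(flat, nx, ny, 0.01, 0.04))
```

Both functions perform the same floating point operations in the same order, so they should give identical results; only the way the grid is stored and indexed differs.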
Timings nx=ny=500, n_iter=1000, eps=1.0e-16, warm timings, best of 3:
  Python D:    459    s  (163    X)  estimated (code of Psyco 2)
  Python A:    289    s  (102    X)  estimated (code of SS 1)
  Python C:    210    s  ( 74.7  X)  estimated (code of Psyco 1)
  Python B:    208    s  ( 74.0  X)  estimated (code of SS 2)
  Psyco 1:      41.5  s  ( 14.8  X)
  Psyco 2:      37.47 s  ( 13.3  X)
  D 1:           3.95 s  (  1.40 X)
  D 2:           3.13 s  (  1.11 X)
  ShedSkin 1:    2.92 s  (  1.04 X)
  ShedSkin 2:    2.89 s  (  1.03 X)
  C++:           2.81 s  (  1    X)
The second version for D is fast enough. The first version for SS is very fast and easy to write: it's the same as the original Python version.
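The "X" column is each time divided by the C++ time. A minimal sketch of a "best of 3" timer and of that ratio follows. It is not the script used for the numbers above: best_of and slowdown_table are hypothetical names, and time.perf_counter assumes a current Python 3 rather than the Python 2.6 used for the benchmark.

```python
import time


def best_of(func, repeats=3):
    """Return the smallest wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def slowdown_table(timings, baseline):
    """Map each name to (seconds, seconds divided by the baseline's seconds)."""
    ref = timings[baseline]
    return {name: (secs, secs / ref) for name, secs in timings.items()}


if __name__ == "__main__":
    # A few of the measured times from the table, in seconds.
    timings = {"C++": 2.81, "ShedSkin 2": 2.89, "D 2": 3.13, "Psyco 2": 37.47}
    for name, (secs, ratio) in slowdown_table(timings, "C++").items():
        print("%-11s %6.2f s  (%.2f X)" % (name + ":", secs, ratio))
```

Taking the minimum over a few runs reduces the effect of other processes running on the machine during a measurement.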
My CPU is a Core2 at 2 GHz, with WinXP 32 bit.
Compilers used:
- G++ MinGW based on GCC V.4.2.1.
- DMD V.1.036.
- Python V.2.6.0.
- Psyco V.1.6.0 final.
- ShedSkin V.0.0.29.
Compilation arguments:
- ShedSkin: -w, plus arguments in the make: -O3 -s -fomit-frame-pointer
- G++: -O3 -s -fomit-frame-pointer
- DMD: -O -release -inline
------------------------
Asm of the inner loop of LaplaceSolver.timeStep() generated by MinGW:
    .p2align 4,,7
L38:
    fldl    -8(%edx,%eax,8)
    fldl    -8(%ebx,%eax,8)
    faddl   -8(%ecx,%eax,8)
    fmul    %st(4), %st
    fldl    -16(%edx,%eax,8)
    faddl   (%edx,%eax,8)
    fmul    %st(6), %st
    faddp   %st, %st(1)
    fmul    %st(3), %st
    fstl    -8(%edx,%eax,8)
    addl    $1, %eax
    fsubp   %st, %st(1)
    cmpl    %esi, %eax
    fmul    %st(0), %st
    faddp   %st, %st(1)
    jne     L38
Asm of the inner loop of LaplaceSolver.timeStep() generated by DMD:
LA9:
    fld     qword ptr [ESI]
    fstp    qword ptr 038h[ESP]
    fld     qword ptr [EAX]
    fadd    qword ptr 0[EBP]
    fmul    qword ptr 028h[ESP]
    fld     qword ptr [EBX]
    fadd    qword ptr [EDI]
    fmul    qword ptr 020h[ESP]
    faddp   ST(1),ST
    fmul    qword ptr 030h[ESP]
    fstp    qword ptr [ESI]
    fld     qword ptr [ESI]
    fsub    qword ptr 038h[ESP]
    fstp    qword ptr 040h[ESP]
    fld     qword ptr 040h[ESP]
    fmul    ST,ST(0)
    fadd    qword ptr 048h[ESP]
    fstp    qword ptr 048h[ESP]
    add     ESI,8
    add     EBP,8
    add     EAX,8
    add     EDI,8
    add     EBX,8
    inc     EDX
    cmp     010h[ESP],EDX
    jg      LA9