leonardo - More Performance Python
Subject:More Performance Python
Time:01:37 pm
I have written more versions of the benchmark from this page:

PerformancePython: A comparison of weave with NumPy, Pyrex, Psyco, Fortran and C++ for solving Laplace's equation:
http://scipy.org/PerformancePython

The zip with all the code:
http://www.fantascienza.net/leonardo/js/laplace.zip

Contents of the zip:
- perfpy: original code from the scipy site, plus a copy of the HTML page.
- laplace_ss1.py: plain Python version, fit for ShedSkin.
- laplace_ss2.py: refined version for ShedSkin, moved sublists out of the inner loop.
- laplace_cpp.cpp: C++ version from the scipy site. This version is already the fastest form for the C++ compiler; lower-level manual optimizations make the code slower.
- laplace_d1.d: direct translation of the C++ version to the D 1 language. The D (DMD) compiler doesn't optimize as well as the C++ compiler, so the code is slower.
- laplace_d2.d: refined D 1 version; the inner loop is written in a lower-level style to help the D compiler.
- laplace_psyco1.py: version fit for Psyco, like the laplace_ss2.py code.
- laplace_psyco2.py: refined version for Psyco; it uses a 1D array.array("d"), plus manual management of its 2D nature.
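
The two refinements named in the list (sublists moved out of the inner loop, and a flat 1D array.array("d") indexed by hand) can be sketched as below. This is only an illustration, not the code from the zip: the function names are hypothetical, and the per-point update is assumed to be the usual one from the scipy benchmark (average of the four neighbours weighted by dx2 and dy2), so check it against the real files.

```python
from array import array


def time_step_rows(u, dx2, dy2):
    """One sweep over a list of lists, with the three rows
    fetched once per outer iteration (the laplace_ss2.py idea)."""
    nx, ny = len(u), len(u[0])
    dnr_inv = 0.5 / (dx2 + dy2)
    err = 0.0
    for i in range(1, nx - 1):
        up, row, down = u[i - 1], u[i], u[i + 1]  # hoisted sublists
        for j in range(1, ny - 1):
            tmp = row[j]
            row[j] = ((up[j] + down[j]) * dy2 +
                      (row[j - 1] + row[j + 1]) * dx2) * dnr_inv
            diff = row[j] - tmp
            err += diff * diff
    return err ** 0.5


def time_step_flat(u, nx, ny, dx2, dy2):
    """The same sweep on a flat array('d') in row-major order:
    element (i, j) lives at index i*ny + j (the laplace_psyco2.py idea)."""
    dnr_inv = 0.5 / (dx2 + dy2)
    err = 0.0
    for i in range(1, nx - 1):
        base = i * ny
        for j in range(1, ny - 1):
            k = base + j
            tmp = u[k]
            u[k] = ((u[k - ny] + u[k + ny]) * dy2 +
                    (u[k - 1] + u[k + 1]) * dx2) * dnr_inv
            diff = u[k] - tmp
            err += diff * diff
    return err ** 0.5


# Small made-up grid, just to show that both layouts give the same sweep.
rows = [[float((i * 7 + j * 3) % 5) for j in range(5)] for i in range(6)]
flat = array("d", [v for r in rows for v in r])
print(time_step_rows(rows, 0.01, 0.04) == time_step_flat(flat, 6, 5, 0.01, 0.04))
```

Both functions do the same arithmetic in the same order, so they should return the same error value; only the way the neighbours are addressed changes.
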

Timings with nx=ny=500, n_iter=1000, eps=1.0e-16, warm timings, best of 3 (in parentheses, how many times slower than the C++ version):
  Python D:  459    s (163    X) estimated (code of Psyco 2)
  Python A:  289    s (102    X) estimated (code of SS 1)
  Python C:  210    s ( 74.7  X) estimated (code of Psyco 1)
  Python B:  208    s ( 74.0  X) estimated (code of SS 2)  
  Psyco 1:    41.5  s ( 14.8  X)
  Psyco 2:    37.47 s ( 13.3  X)
  D 1:         3.95 s (  1.40 X)
  D 2:         3.13 s (  1.11 X)
  ShedSkin 1:  2.92 s (  1.04 X)
  ShedSkin 2:  2.89 s (  1.03 X)
  C++:         2.81 s (  1    X)
The second version for D is fast enough. The first version for SS is very fast and easy to write: it's the same as the original Python version.
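
For reference, a "warm, best of 3" timing and the slowdown column of the table can be computed with something like the sketch below. It is not the harness I used: best_of and relative are hypothetical names, "warm" is taken here to mean one untimed run first, and time.perf_counter needs Python 3.3 or later (on Python 2.6 the timeit module would do the same job).

```python
import time


def best_of(func, repeats=3):
    """Smallest wall-clock time in seconds over `repeats` calls of func()."""
    func()  # untimed warm-up run (assumed meaning of "warm timings")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def relative(seconds, baseline):
    """How many times slower `seconds` is than the baseline time."""
    return seconds / baseline


# Slowdown of Psyco 1 against C++, using the times from the table.
print(round(relative(41.5, 2.81), 1))  # 14.8
```
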

My CPU is a Core 2 at 2 GHz, running WinXP 32 bit.

Compilers used:
- G++ MinGW based on GCC V.4.2.1.
- DMD V.1.036.
- Python V.2.6.0.
- Psyco V.1.6.0 final.
- ShedSkin V.0.0.29.

Compilation arguments:
- ShedSkin: -w, plus these arguments in the makefile: -O3 -s -fomit-frame-pointer
- G++: -O3 -s -fomit-frame-pointer
- DMD: -O -release -inline

------------------------

Asm of the inner loop of LaplaceSolver.timeStep() generated by MinGW:
	.p2align 4,,7
L38:
	fldl	-8(%edx,%eax,8)
	fldl	-8(%ebx,%eax,8)
	faddl	-8(%ecx,%eax,8)
	fmul	%st(4), %st
	fldl	-16(%edx,%eax,8)
	faddl	(%edx,%eax,8)
	fmul	%st(6), %st
	faddp	%st, %st(1)
	fmul	%st(3), %st
	fstl	-8(%edx,%eax,8)
	addl	$1, %eax
	fsubp	%st, %st(1)
	cmpl	%esi, %eax
	fmul	%st(0), %st
	faddp	%st, %st(1)
	jne	L38

Asm of the inner loop of LaplaceSolver.timeStep() generated by DMD:
LA9:		fld	qword ptr [ESI]
		fstp	qword ptr 038h[ESP]
		fld	qword ptr [EAX]
		fadd	qword ptr 0[EBP]
		fmul	qword ptr 028h[ESP]
		fld	qword ptr [EBX]
		fadd	qword ptr [EDI]
		fmul	qword ptr 020h[ESP]
		faddp	ST(1),ST
		fmul	qword ptr 030h[ESP]
		fstp	qword ptr [ESI]
		fld	qword ptr [ESI]
		fsub	qword ptr 038h[ESP]
		fstp	qword ptr 040h[ESP]
		fld	qword ptr 040h[ESP]
		fmul	ST,ST(0)
		fadd	qword ptr 048h[ESP]
		fstp	qword ptr 048h[ESP]
		add	ESI,8
		add	EBP,8
		add	EAX,8
		add	EDI,8
		add	EBX,8
		inc	EDX
		cmp	010h[ESP],EDX
		jg	LA9

(Anonymous)
Subject:Thanks for doing shedskin!
Time:2009-10-30 09:01 pm (UTC)
I was just getting ready to do it myself when I came across your page. I'd like to find/think of some more scientifically oriented examples to benchmark. Perhaps I'll take some from Numerical Recipes, or Langtangen's book.

Great job, Ariel
