| This is a small 3D rendering benchmark, ao bench, by Syoyo Fujita: http://lucille.atso-net.jp/aobench/
I have improved the C and Python versions, and I have created versions for ShedSkin, D, Java, etc: http://www.fantascienza.net/leonardo/js/ao_bench.zip
Notes:
- This time the timings for D are good.
- I have not tried the LDC D compiler, but other people may want to.
- In Python I have manually inlined one function (vdot).
- ao2_py uses both the multiprocessing module of Python 2.6 and Psyco.
- The ShedSkin version is slow.
- Compiling ao_py with version 0.1 of ShedSkin is a bit boring and slow. You need to use -i -w and then wait for it to perform the maximum of 30 iterations. Then for max performance you have to manually modify the CCFLAGS inside the makefile before compiling the C++ code.
- To create the ao2_py version I just had to turn the render function into a pure function (well, it uses some global data, but doesn't change it), and the multiprocessing is done by a single line of code: Pool().map(render, xrange(HEIGHT), chunksize=2)
- I'd like to see how well the Psyco-processing version goes with four cores. (If you want to run Python on a 4-core 64-bit CPU you may be tempted to use a 64-bit operating system, because sometimes 64-bit code is faster. But on 64 bits you can't use Psyco, so the end result is often a strong slowdown of your Python code. Using a 64-bit OS is good if you need a lot of RAM.)
- The Java version (AO.java) is a port of the original Processing version. It's quite naive (although I have removed some useless array allocations, which don't slow down the code much), so it's probably easy to speed it up.
- The ao2_d D version is naive and very slow; it's a very close translation of the Java version. It shows that code written in Java-like style can be very slow in D, so Java programmers coming to D have to be careful. For example, current D compilers aren't able to inline virtual calls as HotSpot is sometimes able to do.
- I have converted the Java code into an applet (http://www.fantascienza.net/leonardo/blog_pics/ao_bench/ao_benchmark.html ); it runs about as fast as the console Java code. On my PC the Processing version (on Syoyo's site) runs in about 8.7 seconds, while this naive pure Java applet needs 6.46 seconds. I don't know why Processing is so much slower. The code is almost the same.
- The ao3_d D version is derived from the ao2_d version, but I have declared some object creations as 'scope' to reduce heap allocations. The resulting code is significantly faster than ao2_d (but still much slower than the optimized D code). You have to be careful when declaring 'scope', because it may lead to bugs if the created objects then escape the scope.
- I have done a few more tests. On the same PC a 128x128 benchmark with JavaScript takes about 22.36 s with Firefox 3.03 and about 3 seconds with Chrome build 21 running through Wine. I have also seen that on Ubuntu 8.10, compiling with gcc V.4.3.2, the C code needs just 2.54 s (and 2.8 s using the standard drand48); this timing is similar to the C timing on Syoyo's site. I don't know why MinGW on Windows gives so much slower code. I have not tested LDC on Linux.
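The "pure render function plus one Pool().map line" idea from the notes can be sketched as follows. This is only a minimal Python 3 illustration, not the ao2_py code: the tiny image size and the placeholder shading inside render() are assumptions made up for the sketch, and range replaces the xrange of Python 2.

```python
from multiprocessing import Pool

WIDTH = HEIGHT = 8  # hypothetical tiny image, only for this sketch


def render(y):
    """Pure function: builds one image row from its index and read-only globals."""
    # Placeholder shading, NOT the real ambient-occlusion computation.
    return [(x * y) % 256 for x in range(WIDTH)]


def render_serial():
    """Reference version: render every row in the current process."""
    return [render(y) for y in range(HEIGHT)]


def render_parallel():
    """Same rows, computed by a pool of worker processes (one per core by default)."""
    with Pool() as pool:
        # chunksize=2 hands the rows to the workers two at a time.
        return pool.map(render, range(HEIGHT), chunksize=2)
```

Because render() never modifies shared state, render_parallel() should return the same rows as render_serial(). On platforms that start workers by spawning a fresh interpreter, call render_parallel() from under an `if __name__ == "__main__":` guard.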
Timings, best of 3, seconds:
ao_d, float: 3.67
ao_c with gcc-llvm, float: 3.72
ao_c with gcc-llvm, double: 3.83
ao_c with gcc, double: 3.99
ao_c with gcc, float: 4.04
ao_d, double: 4.10
AO.java, float, naive: 6.35
ao2_py with Psyco: 16.72
ao3_d, float, naive: 24.09
ao_py with Psyco: 29.75
ao2_d, float, naive: 31.62
ao_py with ShedSkin: 48.58
ao2_py without Psyco: 70.6
ao_py without Psyco: 138.46
Timings on Pubuntu:
ao1_d: 2.95
ao_c, gcc, float: 3.82
ao3_d: 8.78 (BUG)
ao2_d: 16.52 (BUG)
Timings on Pubuntu with LTO+internalizing:
ao1_d: 2.87
Parameters used:
WIDTH = HEIGHT = 256
NSUBSAMPLES = 2
NAO_SAMPLES = 8
--------------
CPU: Intel Core 2, 2 GHz (2 cores)
D code compiled with:
DMD v1.041
-O -release -inline
C code compiled with:
gcc: V. 4.3.3-dw2-tdm-1 (GCC)
LLVM: gcc version 4.2.1 (Based on Apple Inc. build 5636)
For both:
-Wall -O3 -s -fomit-frame-pointer -msse3 -march=core2
Python:
ActivePython 2.6.1.1
(r261:67515, Dec 5 2008, 13:58:38)
[MSC v.1500 32 bit (Intel)] on win32
Psyco for Python 2.6, V.1.6.0 final 0
ShedSkin V. 0.1
ss -i -w ao_ss.py
using gcc version 4.3.3-dw2-tdm-1 (GCC)
CCFLAGS=-O3 -s -fomit-frame-pointer -msse2 ...
Javac 1.6.0_12
Java SE runtime build 1.6.0_07-b06
To compile it this way (LTO+internalizing) you need three more complex commands:
ldc -O5 -release -inline -output-bc ao1_d.d
opt -std-compile-opts ao1_d.bc > ao1_d_opt.bc
llvm-ld -native -ltango-base-ldc -ltango-user-ldc -ldl -lm -lpthread -internalize-public-api-list=_Dmain -o=ao1_d ao1_d_opt.bc
---------------------
Update 1, Mar 26 2009: removed the ao_ss version because I've found that ShedSkin is able to slowly compile ao_py too after all. Added the ao2_py version that uses the multiprocessing module. A D version that uses threads can be created.
Update 2, Mar 27 2009: added a naive Java version (AO.java) adapted from the original Processing version, plus a D translation (ao2_d.d) of this Java version to show how Java-style code can be slow in D.
Update 3, Mar 27 2009: improved the Java version a bit. Added a Java applet version of the Java code.
Update 4, Mar 28 2009: small changes in the ao2_d version, and added the ao3_d version, plus a few timings on Linux.
Update 5, Jun 21 2009: fixed ao1_d for Tango too. On Tango rand() returns a number in (0..maxint). ao2_d and ao3_d have a similar problem.
Update 6, Jul 17 2009: I have found a way to use Link-Time Optimization with LDC, and to perform a correct internalize of the main. |
| This interesting blog page "Refactoring Methods with Recursive Combinators" shows Ruby implementations of a recursive combinator: http://github.com/raganwald/homoiconic/tree/master/2008-11-23%2Frecursive_combinators.md
I like the idea of putting a blog under version control, I just hope the usability will improve a little more for blog readers (removing some noise from the page).
Here you can find my Python and D versions: http://www.fantascienza.net/leonardo/js/multirec.zip
The translation to Python was painless and not too difficult, so Python is up to the task:
def divide_and_conquer(value, isdivisible, conquer, divide, recombine):
    if isdivisible(value):
        return recombine(divide_and_conquer(sub_value, isdivisible, conquer, divide, recombine)
                         for sub_value in divide(value))
    else:
        return conquer(value)

identity = lambda x: x

def rotate2(square):
    def divide(square):
        half_len = len(square) // 2
        def sub_square(nrow, ncol):
            return [row[ncol: ncol + half_len] for row in square[nrow: nrow + half_len]]
        upper_left = sub_square(0, 0)
        lower_left = sub_square(half_len, 0)
        upper_right = sub_square(0, half_len)
        lower_right = sub_square(half_len, half_len)
        return [upper_left, lower_left, upper_right, lower_right]
    # Tuple parameter unpacking: Python 2 syntax only
    def recombine((upper_left, lower_left, upper_right, lower_right)):
        return [l + r for l,r in zip(upper_right, lower_right)] + \
               [l + r for l,r in zip(upper_left, lower_left)]
    return divide_and_conquer(square,
                              isdivisible=lambda value: len(value) > 1,
                              conquer=identity,
                              divide=divide,
                              recombine=recombine)
I think this Python code is also a little more readable than the original Ruby one (and probably faster too).
This code is a cute exercise, but it's less readable than the original version (that doesn't use a combinator), longer, more difficult to debug and modify, and slower. And using multiple threads doesn't help much in normal Python code.
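To see the combinator at work, here is a small self-contained sketch ported to Python 3 (an assumption of mine: the code in the zip targets Python 2, where tuple parameters are still allowed). The names sum_squares and rotate are illustrative, and rotate assumes a square whose side is a power of two.

```python
def divide_and_conquer(value, isdivisible, conquer, divide, recombine):
    """Generic recursive combinator: split, recurse on the parts, recombine."""
    if isdivisible(value):
        return recombine(
            divide_and_conquer(sub, isdivisible, conquer, divide, recombine)
            for sub in divide(value)
        )
    return conquer(value)


def identity(x):
    return x


def isiterable(value):
    try:
        iter(value)
    except TypeError:
        return False
    return True


def sum_squares(value):
    """Sum of the squares of all numbers in an arbitrarily nested iterable."""
    return divide_and_conquer(value, isiterable, lambda x: x * x, identity, sum)


def rotate(square):
    """Rotate a 2^n x 2^n list of lists by a quarter turn, counterclockwise."""
    def divide(sq):
        half = len(sq) // 2

        def sub_square(nrow, ncol):
            return [row[ncol:ncol + half] for row in sq[nrow:nrow + half]]

        # Order: upper left, lower left, upper right, lower right.
        return [sub_square(0, 0), sub_square(half, 0),
                sub_square(0, half), sub_square(half, half)]

    def recombine(parts):
        # Python 3 has no tuple parameters, so unpack explicitly.
        upper_left, lower_left, upper_right, lower_right = parts
        return ([l + r for l, r in zip(upper_right, lower_right)] +
                [l + r for l, r in zip(upper_left, lower_left)])

    return divide_and_conquer(square, lambda sq: len(sq) > 1,
                              identity, divide, recombine)


print(sum_squares([1, 2, 3, [[4, 5], 6], [[[7]]]]))  # 140
print(rotate([[1, 2], [3, 4]]))                      # [[2, 4], [1, 3]]
```

The same divide_and_conquer() serves both functions; only the four small callables change.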
Then I tried to translate the code to the D language, which is a statically typed language that is more flexible than C or C++, but not as flexible as, for example, Scala or Haskell.
First of all I translated the original functions (as function templates). It's doable, and the result even looks nice enough, especially if you use my dlibs:
BaseType!(TyIter) sum_squares3(TyIter)(TyIter value) {
    static if (IsIterable!(TyIter)) {
        BaseType1!(TyIter) x;
        return sum(select(sum_squares3(x), x, value));
    } else
        return value * value;
}

T[][] rotate1(T)(T[][] square) {
    if (len(square) > 1) {
        int half_len = len(square) / 2;
        T[][] sub_square(int nrow, int ncol) {
            T[] row;
            return select(row[ncol .. ncol + half_len], row, square[nrow .. nrow + half_len]);
        }
        auto upper_left = rotate1(sub_square(0, 0));
        auto lower_left = rotate1(sub_square(half_len, 0));
        auto upper_right = rotate1(sub_square(0, half_len));
        auto lower_right = rotate1(sub_square(half_len, half_len));
        T[][] lr;
        return select(lr[0] ~ lr[1], lr, azip(upper_right, lower_right)) ~
               select(lr[0] ~ lr[1], lr, azip(upper_left, lower_left));
    } else
        return square;
}
Note the static if (IsIterable!(TyIter)): it's necessary because as sum_squares3 drills down, the type of value changes. So you need a static if to manage the change in a static language like D.
Note that the D version of sum_squares3() isn't as flexible as the Python version. In normal D you can't define an array with mixed elements:
[iter([1, {2:0, 3:0}]), [set([4, 5]), 6], [[[7]]]]
Or even an array with a mixed number of levels:
[1, 2, 3, [[4, 5], 6], [[[7]]]]
You are only allowed a fixed number of levels, with subarrays of different lengths:
[[1, 2], [3], [4, 5], [6], [7]]
Then I tried to translate the multirec combinator itself. But I think here the static type system of D shows the limits of its flexibility. The translation was a pain. This is the result:
ReturnType!({ return F2.init(Init!(T)().init); }) divide_and_conquer(T, F1, F2, F3, F4)
                                                                    (T value, F1 isdivisible,
                                                                     F2 conquer, F3 divide,
                                                                     F4 recombine) {
    if (isdivisible(value)) {
        T[] aux;
        foreach (sub_value; divide(value))
            aux ~= divide_and_conquer(sub_value, isdivisible, conquer, divide, recombine);
        return recombine(aux);
    } else
        return conquer(value);
}

struct _identity { static T opCall(T)(T x) { return x; } }
_identity identity;

T[][] rotate2(T)(T[][] square) {
    T[][][] divide(T[][] square) {
        int half_len = len(square) / 2;
        T[][] sub_square(int nrow, int ncol) {
            T[] row;
            return select(row[ncol .. ncol + half_len], row, square[nrow .. nrow + half_len]);
        }
        auto upper_left = sub_square(0, 0);
        auto lower_left = sub_square(half_len, 0);
        auto upper_right = sub_square(0, half_len);
        auto lower_right = sub_square(half_len, half_len);
        return [upper_left, lower_left, upper_right, lower_right][];
    }
    T[][] recombine(T[][][] parts) {
        auto upper_left = parts[0];
        auto lower_left = parts[1];
        auto upper_right = parts[2];
        auto lower_right = parts[3];
        T[][] lr;
        return select(lr[0] ~ lr[1], lr, azip(upper_right, lower_right)) ~
               select(lr[0] ~ lr[1], lr, azip(upper_left, lower_left));
    }
    return divide_and_conquer(square,
                              (T[][] square){return len(square) > 1;},
                              identity,
                              &divide,
                              &recombine);
}
Note the complex return type, which infers the type from a little function built on the fly: ReturnType!({ return F2.init(Init!(T)().init); }) In D V.2 you can just replace that with "auto", simplifying the code by using type inference on the return type.
Init!(T)().init replaces T.init; it's a trick I invented to work around a bug of the DMD compiler, which gives a wrong result if T is a static array.
divide_and_conquer() in D doesn't currently work for a sum_squares4() function, because that requires isdivisible to be a compile-time template (used as a function, and passed by alias to the function template divide_and_conquer()), instead of being passed as a delegate (or function pointer) as it is now. You would call it like this:
divide_and_conquer2!(IsIterable)(seq, sqr, identity, sumb)
Where sqr and sumb are something like:
struct _sqr { static typeof(T.init * T.init) opCall(T)(T x) { return x * x; } }
_sqr sqr;
struct _sumb { static typeof(sum(Init!(T)().init)) opCall(T)(T seq) { return sum(seq); } }
_sumb sumb;
Probably there's a way to make this work, but after some tries I have given up. I think that such working code can be written, but you end up with two different versions of divide_and_conquer(), so if you want a single function you probably need a language with a more flexible type system than the current D1 one (I presume Haskell is up to the task).
Finally I have tried to adapt the Python code to ShedSkin Python, I have had to pull out nested functions and to replace a generator expression with a list comp. I think that currently you can't write sum_squares4() because there's no way to write that isiterable() function shown in the Python code:
def isiterable(value):
    try:
        iter(value)
    except TypeError:
        return False
    else:
        return True
(And even once writing such a function becomes possible, I don't know if the type system of ShedSkin is able to compile a divide_and_conquer() that becomes polymorphic.) So currently ShedSkin seems quite a bit less flexible than the D language. |
| I have written more versions of the benchmark from this page:
"Making Python math 196* faster with shedskin": http://ianozsvald.com/2008/11/17/making-python-math-196-faster-with-shedskin/
The zip with all the code: http://www.fantascienza.net/leonardo/js/bpnn.zip
Timings smaller dataset, best of 3:
C: 0.15 s
SS: 0.29 s
D to C: 0.33 s
D: 0.41 s
SS: 0.92 s (with wrap around)
Py 2.6 + Psyco: 4.84 s
Py 2.6: 16.10 s
Timings larger dataset, best of 3:
C: 27.5 s
SS: 43.4 s
D to C: 76.9 s
D: 94 s
SS: 199 s (with wrap around)
Py 2.6 + Psyco: untested
Py 2.6: untested
My CPU is Core2 at 2 GHz, WinXP 32 bit.
Compilers used:
- G++ MinGW based on GCC V.4.2.1.
- DMD V.1.036.
- Python V.2.6.0.
- Psyco V.1.6.0 final.
- ShedSkin V.0.0.29.
Compilation arguments:
- ShedSkin: -w, plus arguments in the make file: -O3 -s
- DMD: -O -release -inline
- GCC: gcc -Wall -O3 -s -fomit-frame-pointer -fprofile-generate bpnn_c.c -o bpnn_c
  Followed by a run, followed by: gcc -Wall -O3 -s -fomit-frame-pointer -fprofile-use bpnn_c.c -o bpnn_c
Notes:
- As you can see C is faster than SS this time.
- I think in this program it's much better to load the data from disk, instead of loading it as a module/code. It will also speed up the compilation of the ShedSkin code a lot (but ShedSkin may benefit from it by optimizing the code more, even if the compilation times become large).
- After many tests I think that in this program SS (C++) is faster than D because the DMD compiler doesn't optimize loops with floating point operations inside as well as GCC does.
- This seems a numerically unstable program, so even if SS just exchanges the order of operands in a floating point sum, the final results become different.
- This time I sweated to create the C version, even though it's quite a small program. And in the end I am not sure this C code is bug-free, while I think the D code doesn't have more bugs than the original Python version.
- At first sight D and C look similar, but they are really different languages. D has several added/changed features that make it qualitatively different.
- I translated the Python code to D (using my D libraries) in just a few minutes, something like 15-20 minutes, and the translation was mostly painless and sometimes almost mechanical. Translating the D code to C took many hours. Translating Python => C may require something like 20-30 times the time you need to translate Python => D + my libs. And this even though I used a rigorous enough method to perform the translation, and even though at the end I am not sure the C code is bug-free. This is an enormous difference.
- To translate code I first write some tests, automatic if possible, then I slowly change the original source code (Python, D, C) to look more and more like the destination language (D, C, asm), running the tests after each small change to see if I have introduced bugs. At the end I translate the original code to the target language in the most mechanical way. The problem with this approach is that the bpnn code prints floating point numbers, so creating the tests is a little more complex. If the code is numerically unstable it's very easy to see changes in the output even if no true bugs are introduced. And finally the Python, C and D versions use different random generators, so their results differ even if the translation has not added bugs to the code.
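One way to make such translation tests less brittle is to compare the printed numbers with a tolerance instead of comparing the output text exactly. The following Python 3 sketch shows the idea; the function name outputs_match and the default tolerances are my own assumptions, not something taken from the bpnn code. It does not address the different random generators: for that both programs would have to be fed the same pseudo-random numbers.

```python
import math
import re

# Matches integers and floats, with optional sign and exponent.
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def outputs_match(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """True if two program outputs agree, comparing numbers approximately."""
    # The non-numeric "skeleton" of the text must be identical.
    if _NUMBER.sub("#", expected).split() != _NUMBER.sub("#", actual).split():
        return False
    nums_a = [float(tok) for tok in _NUMBER.findall(expected)]
    nums_b = [float(tok) for tok in _NUMBER.findall(actual)]
    # Each pair of numbers must be close within the given tolerances.
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(nums_a, nums_b))
```

With a check like this, a reordering of floating point operations that changes only the last digits passes the test, while a real change in the results or in the structure of the output still fails it. For a numerically unstable program the tolerances may need to be loosened, at the price of hiding small real bugs.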
---------------
Update Jul 17 2009: I have found a way to compile with LDC on Linux (Pubuntu) with Link-Time Optimization, and to internalize the main correctly.
Timings larger dataset, best of 3:
D to C: 26.09 s
The compilation becomes a bit more complex:
ldc -O5 -release -inline -output-bc bpnn_d_to_c.d
opt -std-compile-opts bpnn_d_to_c.bc > bpnn_d_to_c_opt.bc
llvm-ld -native -ltango-base-ldc -ltango-user-ldc -ldl -lm -lpthread -internalize-public-api-list=_Dmain -o=bpnn_d_to_c bpnn_d_to_c_opt.bc |
| I have written more versions of the benchmark from this page:
PerformancePython: A comparison of weave with NumPy, Pyrex, Psyco, Fortran and C++ for solving Laplace's equation: http://scipy.org/PerformancePython
The zip with all the code: http://www.fantascienza.net/leonardo/js/laplace.zip
Contents of the zip:
- perfpy: original code from the scipy site, plus a copy of the HTML page.
- laplace_ss1.py: plain Python version, fit for ShedSkin.
- laplace_ss2.py: refined version for ShedSkin, with the sublists moved out of the inner loop.
- laplace_cpp.cpp: C++ version from the scipy site. This version is already the best for the C++ compiler. Lower level optimizations make the code slower.
- laplace_d1.d: direct translation of the C++ code to the D 1 language. The D (DMD) compiler isn't as good as the C++ compiler, and the code is slower.
- laplace_d2.d: refined D 1 version, the inner loop is written in a lower level style to help the D compiler.
- laplace_psyco1.py: version fit for Psyco, like the laplace_ss2.py code.
- laplace_psyco2.py: refined version for Psyco, using a 1D array.array("d") plus manual management of its 2D nature.
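The "1D array.array("d") plus manual management of its 2D nature" idea can be sketched like this in Python 3. This is not the code from the zip: the function names are hypothetical, and the update formula is my reading of the time step from the scipy page (each interior point becomes a weighted average of its four neighbours, and the squared changes are accumulated into an error).

```python
from array import array


def time_step_nested(u, dx, dy):
    """One sweep over a grid stored as a list of lists; returns the error."""
    nx, ny = len(u), len(u[0])
    dx2, dy2 = dx * dx, dy * dy
    dnr_inv = 0.5 / (dx2 + dy2)
    err = 0.0
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            old = u[i][j]
            u[i][j] = ((u[i - 1][j] + u[i + 1][j]) * dy2 +
                       (u[i][j - 1] + u[i][j + 1]) * dx2) * dnr_inv
            diff = u[i][j] - old
            err += diff * diff
    return err ** 0.5


def time_step_flat(u, nx, ny, dx, dy):
    """Same sweep on a row-major 1D array: element (i, j) lives at i * ny + j."""
    dx2, dy2 = dx * dx, dy * dy
    dnr_inv = 0.5 / (dx2 + dy2)
    err = 0.0
    for i in range(1, nx - 1):
        base = i * ny  # index of the first element of row i
        for k in range(base + 1, base + ny - 1):
            old = u[k]
            # k - ny and k + ny are the neighbours in the rows above and below.
            u[k] = ((u[k - ny] + u[k + ny]) * dy2 +
                    (u[k - 1] + u[k + 1]) * dx2) * dnr_inv
            diff = u[k] - old
            err += diff * diff
    return err ** 0.5
```

A flat grid can be built with array("d", [0.0]) * (nx * ny). The flat version replaces the double indexing with a single index computed once per row, which is the kind of change that tends to help a JIT like Psyco.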
Timings nx=ny=500, n_iter=1000, eps=1.0e-16, warm timings, best of 3:
Python D: 459 s (163 X) estimated (code of Psyco 2)
Python A: 289 s (102 X) estimated (code of SS 1)
Python C: 210 s ( 74.7 X) estimated (code of Psyco 1)
Python B: 208 s ( 74.0 X) estimated (code of SS 2)
Psyco 1: 41.5 s ( 14.8 X)
Psyco 2: 37.47 s ( 13.3 X)
D 1: 3.95 s ( 1.40 X)
D 2: 3.13 s ( 1.11 X)
ShedSkin 1: 2.92 s ( 1.04 X)
ShedSkin 2: 2.89 s ( 1.03 X)
C++: 2.81 s ( 1 X)
The second version for D is fast enough. The first version for SS is very fast and easy to write, because it's the same as the original Python version.
My CPU is Core2 at 2 GHz, WinXP 32 bit.
Compilers used:
- G++ MinGW based on GCC V.4.2.1.
- DMD V.1.036.
- Python V.2.6.0.
- Psyco V.1.6.0 final.
- ShedSkin V.0.0.29.
Compilation arguments:
- ShedSkin: -w, plus arguments in the make file: -O3 -s -fomit-frame-pointer
- G++: -O3 -s -fomit-frame-pointer
- DMD: -O -release -inline
------------------------
Asm of the inner loop of LaplaceSolver.timeStep() generated by MinGW:
.p2align 4,,7
L38:
fldl -8(%edx,%eax,8)
fldl -8(%ebx,%eax,8)
faddl -8(%ecx,%eax,8)
fmul %st(4), %st
fldl -16(%edx,%eax,8)
faddl (%edx,%eax,8)
fmul %st(6), %st
faddp %st, %st(1)
fmul %st(3), %st
fstl -8(%edx,%eax,8)
addl $1, %eax
fsubp %st, %st(1)
cmpl %esi, %eax
fmul %st(0), %st
faddp %st, %st(1)
jne L38
Asm of the inner loop of LaplaceSolver.timeStep() generated by DMD:
LA9: fld qword ptr [ESI]
fstp qword ptr 038h[ESP]
fld qword ptr [EAX]
fadd qword ptr 0[EBP]
fmul qword ptr 028h[ESP]
fld qword ptr [EBX]
fadd qword ptr [EDI]
fmul qword ptr 020h[ESP]
faddp ST(1),ST
fmul qword ptr 030h[ESP]
fstp qword ptr [ESI]
fld qword ptr [ESI]
fsub qword ptr 038h[ESP]
fstp qword ptr 040h[ESP]
fld qword ptr 040h[ESP]
fmul ST,ST(0)
fadd qword ptr 048h[ESP]
fstp qword ptr 048h[ESP]
add ESI,8
add EBP,8
add EAX,8
add EDI,8
add EBX,8
inc EDX
cmp 010h[ESP],EDX
jg LA9
|
| When you find a bug in a compiler (like the DMD compiler) and you submit it, you are usually encouraged to give the smallest and simplest program (often just about 5 lines long) that shows the same problem. This helps to locate the problem and to remove it faster. In the same way microbenchmarks can be useful too, even though they are so small that they can't really represent what goes on in normal sized programs. They can show you in the simplest way what is best to improve.
This tiny benchmark focuses on GC performance and shows results similar to the "binary trees" benchmark I have shown here a few weeks ago, but it's much shorter, so it may be more useful. This code isn't meant to represent normal programs with thousands of classes, etc, it just shows something quite limited. This is an adaptation of the "Object Test" that you can find used and discussed here:
http://www.twistedmatrix.com/users/glyph/rant/python-vs-java.html
http://blog.snaplogic.org/?p=55
http://programming.reddit.com/info/24ynh/comments
Note that I test the GC of Phobos only, I don't use Tango yet.
Object Test timings (on PIII @ 500 MHz), best of 3 runs (seconds, approximate Mbytes memory used), n=1_000, m=10_000:
              seconds    MB
DMD class:      18.95   1.7
GDC class:      17.91   1.8
DMD struct:     11.77   1.7
GDC struct:     12.31   1.8
Python:         37.10   3.1  (ShedSkin version)
Psyco:          15.68   3.5
ShedSkin:        6.35   1.6
Java (1):        2.19   7.3
Java (2):        2.10   7.8
See below for the sources. You can see that Java is about 9 (~= 18.95 / 2.10) times faster than the program produced by DMD. Even Psyco (the JIT for the turtle-slow language Python) gives better performance (and the Python GC is simple, it's just reference counting plus cycle detection). You can also see that Java (as usual) uses much more RAM (7.8/1.7 ~= 4.6).
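For reference, a "best of 3 runs" measurement like the ones in the table can be sketched as follows in Python 3. The helper best_of() and the allocation counter are my own additions for illustration (the counter only serves as a sanity check and adds a little overhead), so the timings of this sketch are not comparable with the table.

```python
import time


class ObjectTest:
    next = None


def object_test(n, m):
    """Builds n linked chains of m + 1 objects; returns how many were allocated."""
    allocated = 0
    for _ in range(n):
        root = ObjectTest()
        allocated += 1
        for _ in range(m):
            # The previous node becomes garbage as soon as root moves on.
            root.next = ObjectTest()
            root = root.next
            allocated += 1
    return allocated


def best_of(func, runs=3):
    """Runs func several times and returns the smallest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best
```

For example, best_of(lambda: object_test(1000, 10000)) would time the same n and m used above. Taking the minimum instead of the mean reduces the influence of other processes running on the machine.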
Note that with some manual GC optimization the running time for the DMD class benchmark can become about one-two seconds smaller, but the memory used can be about 4 MB:
import std.gc;

class ObjectTest { ObjectTest next; }

void main() {
    for (uint k = 0; k < 50; k++) {
        std.gc.disable();
        for (uint i = 0; i < 20; i++) {
            auto root = new ObjectTest();
            for (uint j = 0; j < 10_000; j++) {
                root.next = new ObjectTest();
                root = root.next;
            }
        }
        std.gc.fullCollect();
    }
}
-------------------
COMPILED/RUN WITH:
gcc version 3.4.5 (mingw special) (gdc 0.24, using dmd 1.020)
DMD v1.024
javac 1.6.0_03 java version "1.6.0_03"
Python 2.5.1
ShedSkin 0.0.25 (shedskin.sourceforge.net , a Python to C++ compiler)
-------------------
COMPILATION/RUNNING PARAMETERS:
GDC used as: gdc -O3 -s -frelease -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops -march=pentiumpro obj_test_class.d -o obj_test_class_gdc
DMD used as: dmd -O -release -inline obj_test_class.d -ofobj_test_class_dmd.exe
Java compiled with: javac ObjectTest.java
Java (1) run with: java ObjectTest
Java (2) run with: java -Xms20m -Xbatch ObjectTest
SS and Psyco compiled and run without flags.
-------------------
SOURCES:
// obj_test_class.d
class ObjectTest { ObjectTest next; }

void main() {
    for (uint i = 0; i < 1_000; i++) {
        auto root = new ObjectTest();
        for (uint j = 0; j < 10_000; j++) {
            root.next = new ObjectTest();
            root = root.next;
        }
    }
}
-------------------
// obj_test_struct.d
struct ObjectTest { ObjectTest* next; }

void main() {
    for (uint i = 0; i < 1_000; i++) {
        auto root = new ObjectTest();
        for (uint j = 0; j < 10_000; j++) {
            root.next = new ObjectTest();
            root = root.next;
        }
    }
}
-------------------
// ObjectTest.java
public class ObjectTest {
    public ObjectTest next;

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            ObjectTest root = new ObjectTest();
            for (int j = 0; j < 10000; j++) {
                root.next = new ObjectTest();
                root = root.next;
            }
        }
    }
}
Note that the HotSpot Java compiler FAQ warns against using code like that: http://java.sun.com/docs/hotspot/HotSpotFAQ.html
"if you insist on using/writing microbenchmarks like this, you can work around the problem by moving the body of main to a new method and calling it once from main to give the compiler a chance to compile the code, then calling it again in the timing bracket to see how fast HotSpot is."
-------------------
# obj_test.py
from psyco.classes import __metaclass__

class ObjectTest: pass

def main():
    for i in xrange(1000): # N
        root = ObjectTest()
        for j in xrange(10000): # M
            root.next = ObjectTest()
            root = root.next

import psyco; psyco.full()
main()
-------------------
# obj_test_ss.py
class ObjectTest: pass

def main():
    for i in xrange(1000): # N
        root = ObjectTest()
        for j in xrange(10000): # M
            root.next = ObjectTest()
            root = root.next

main()
|
| I have done (again) a speed test for string hashing in some languages (this program tests short strings, of length about 6). The results are quite interesting:
dict_speed timing results (PIII 500 MHz) for each language/situation:
SS: 10.4 s
SS -DNOWRAP: not important for this test
SS with SuperFastHash: 10.4 s
Python: 7.09 s
Psyco: 4.74 s
Boo: 3.43 s
C: 3.15 s
C with SuperFastHash: 3.03 s
D: 2.23 s (Not a real hash!)
All the testing source code can be found here.
SuperFastHash can be found here: http://www.azillionmonkeys.com/qed/hash.html
Take a look at my post on Boo too. |