| leonardo @ 2009-08-05 17:33:00 |
| Current location: | programming |
| Entry tags: | benchmark, c++, d language, g++, ldc, llvm-g++, programming |
Sphereflake ray-tracing benchmark in C++ and D
Timings of a small "Sphereflake" ray-tracing benchmark (a copy of this HTML page is also in the references directory of the zip):
http://ompf.org/ray/sphereflake/

The results of the D version compiled with LDC are good. I have also created a faster D version that writes its output in binary P5 PGM format (each pixel is written as a single byte, which speeds up the output) and that precomputes the transcendental values (G++ 4.3.3 is able to precompute them, while the current LDC isn't).
A possible way to speed up the code further: with lvl=8 it creates 5_380_840 visible spheres and as many node_t, but it needs only 597_871 bounding spheres (the leaves don't need bounding spheres, and most of the nodes of a tree are leaves).
So it may be useful to split the array of node_t into two arrays: a much larger one that contains just the visible spheres and the skip pointer, and a smaller one that contains the bounding spheres (and maybe another skip pointer). On the other hand, the CPU cache then has to manage two arrays, so the program may end up slower.
With this idea, at lvl=8 the memory used drops from the current 369 MB to 205 MB (or 228 MB if doubles are aligned to 8 bytes), which lowers the CPU cache traffic.
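My PC has 2 GB of RAM, so I can't run it with lvl=9. But using the split layout and also removing the radii of the visible spheres, it needs less than 2 GB (48_427_561 visible spheres with 3 doubles plus 8 bytes each, and 5_380_840 bounding spheres with 4 doubles plus 8 bytes each):
( 48427561 * (8 + 3 * 8) + 5380840 * (8 + 4 * 8) ) / (1024*1024) = 1683.15 MB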
Timings on Windows, w=h=1024, lvl=6, best of 3, seconds:
  C++:  4.60 (claiming 4.6 MB) (+4 bytes padding)
  C++:  4.60 (claiming 4.6 MB)
  C++:  4.63 (claiming 4.6 MB) (+4 bytes padding, LLVM-G++)
  C++:  4.64 (claiming 4.6 MB) (LLVM-G++)
  C++:  4.68 (claiming 4.6 MB) (+4 bytes padding, PGO)
  D:   11.78 (claiming 4.6 MB) (DMD v1.043, with "ref", fast version)
  D:   17.34 (claiming 4.6 MB) (DMD v1.043, with "ref")
  D:   17.34 (claiming 4.6 MB) (DMD v1.043, with "ref", no GC)
  D:   28.24 (claiming 4.6 MB) (DMD v2.031, with no "ref")
  D:   28.27 (claiming 4.6 MB) (DMD v2.031, with no "ref", gs)
  D:   29.78 (claiming 4.6 MB) (DMD v1.043, with no "ref")
  D:   29.79 (claiming 4.6 MB) (DMD v1.046, with no "ref")
(These DMD-Windows timings are not reliable; I have timed it as low as 12.5 seconds.)

Timings on Pubuntu, w=h=1024, lvl=6, best of 3, seconds:
(66_430 spheres, WITH_SHADOWS=true, FASTER_LDC=true)
  D:    4.81 (claiming 4.6 MB) (with fast output)
  C++:  4.86 (claiming 4.6 MB) (+4 bytes padding)
  D:    4.91 (claiming 4.6 MB) (fast version)
  D:    4.93 (claiming 4.6 MB)
  C++:  5.75 (claiming 4.3 MB)

Timings on Pubuntu, w=h=1024, lvl=8, best of 3, seconds:
(5_380_840 spheres)
  C++: 25.00 (claiming 369 MB) (+4 bytes padding)
  D:   25.27 (claiming 369 MB) (fast version)
  D:   25.62 (claiming 369 MB) (with "ref")
Key:
no GC = garbage collector disabled for the whole run
+4 bytes padding = 4 bytes of padding added to the Node struct
PGO = profile-guided optimization
gs = all global variables are annotated with __gshared
Compiler arguments:
g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math
llvm-g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math
dmd -O -release -inline
ldc -O5 -release -inline
All the tested code:
http://www.fantascienza.net/leonard