leonardo (leonardo_m) wrote,
@ 2009-08-05 17:33:00
Current location: programming
Entry tags: benchmark, c++, d language, g++, ldc, llvm-g++, programming

Sphereflake ray-tracing benchmark in C++ and D
Timings of a small ray-tracing "Sphereflake" benchmark (a copy of this HTML page is also in the references directory of the zip):
http://ompf.org/ray/sphereflake/



The D-LDC results of this benchmark are good. I have also created a faster D version where the output is in P5 PGM format (each pixel is written as a single byte, which speeds up the output) and where the transcendental values are precomputed (G++ 4.3.3 is able to precompute them, while the current LDC isn't).
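
To illustrate the output format, here is a minimal C++ sketch of a binary PGM (P5) writer. The function name write_pgm_p5 and the pixel buffer layout are assumptions made for this example, not the code in the zip. The point is only that each pixel costs one raw byte instead of a formatted decimal number, as it would in the text-based P2 variant.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes an 8-bit grayscale image as binary PGM (P5).
// pixels holds w*h gray values in row-major order.
void write_pgm_p5(std::ostream& out, int w, int h,
                  const std::vector<unsigned char>& pixels) {
    assert(w > 0 && h > 0);
    assert(pixels.size() ==
           static_cast<std::size_t>(w) * static_cast<std::size_t>(h));
    // Text header: magic number, width, height, maximum gray value.
    out << "P5\n" << w << ' ' << h << "\n255\n";
    // Raster: one raw byte per pixel, written in a single call.
    out.write(reinterpret_cast<const char*>(pixels.data()),
              static_cast<std::streamsize>(pixels.size()));
}
```

When the target is a file, the stream should be opened in binary mode (for example std::ofstream with std::ios::binary), otherwise some platforms translate newline bytes inside the raster.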

A possible way to further speed up the code: when lvl=8 it creates 5_380_840 visible spheres and node_t, but it needs only 597_871 bounding spheres (the leaves don't need such bounding spheres and most of the nodes of a tree are leaves).
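
The two counts quoted above fit a simple pattern, sketched here in C++. It assumes (inferred from the numbers, not read from the benchmark source) that every non-leaf sphere has 9 children, so a flake of depth lvl has (9^lvl - 1) / 8 spheres. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Total spheres in a flake of depth lvl, assuming 9 children per sphere:
// 1 + 9 + 9^2 + ... + 9^(lvl-1) = (9^lvl - 1) / 8.
std::uint64_t sphere_count(int lvl) {
    assert(lvl >= 1 && lvl <= 19);  // 9^19 still fits in 64 bits
    std::uint64_t pow9 = 1;
    for (int i = 0; i < lvl; ++i) pow9 *= 9;
    return (pow9 - 1) / 8;
}

// Spheres that have children and therefore need a bounding sphere:
// exactly the spheres of a flake one level shallower.
std::uint64_t inner_count(int lvl) {
    return lvl > 1 ? sphere_count(lvl - 1) : 0;
}
```

Under that assumption sphere_count(8) is 5_380_840 and inner_count(8) is 597_871, so roughly 8 out of every 9 nodes are leaves.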

So it may be useful to split the array of node_t into two arrays: a much larger one that contains just the visible spheres and the skip pointer, and a smaller one that contains the bounding spheres (and maybe another skip pointer). On the other hand, the CPU cache then has to manage two arrays, so the program may end up slower.

With that idea, at lvl=8 the memory used becomes 205 MB (or 228 MB if doubles are aligned to 8 bytes) instead of the current 369 MB, which lowers the CPU cache traffic.
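
These figures can be reproduced with a short C++ calculation. The per-element sizes are assumptions inferred from the quoted totals (a sphere as 4 doubles = 32 bytes, a 4-byte skip index, and a current node made of two spheres plus the skip index, padded to 72 bytes); the real structs in the zip may differ. MB here means 1024*1024 bytes.

```cpp
#include <cassert>
#include <cstdint>

constexpr double MiB = 1024.0 * 1024.0;

// Assumed sizes in bytes (not taken from the original source).
constexpr std::uint64_t SPHERE = 4 * sizeof(double);  // center x, y, z + radius
constexpr std::uint64_t SKIP = 4;                     // 32-bit skip index

// Current layout: one array of nodes, each holding a bounding sphere,
// a visible sphere and a skip index, plus 4 bytes of padding.
double current_mb(std::uint64_t nodes) {
    return nodes * (2 * SPHERE + SKIP + 4) / MiB;
}

// Split layout: every node keeps its visible sphere and a skip index;
// only inner nodes get an entry (bounding sphere + skip index) in the
// second array. pad is the padding per entry in bytes (0, or 4 when
// entries are aligned to 8 bytes).
double split_mb(std::uint64_t nodes, std::uint64_t inner, std::uint64_t pad) {
    std::uint64_t entry = SPHERE + SKIP + pad;
    return (nodes + inner) * entry / MiB;
}
```

With 5_380_840 nodes and 597_871 inner nodes, current_mb gives about 369, split_mb with pad=0 about 205, and split_mb with pad=4 about 228.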

I have 2 GB of RAM on my PC, so I can't run it with lvl=9. But just removing the radii of the visible spheres would bring it under 2 GB of RAM:
( 48427561 * (8 + 3 * 8) + 5380840 * (8 + 4 * 8) ) / (1024*1024) = 1683.15 MB

Timings on Windows, w=h=1024, lvl=6, best of 3, seconds:
  C++:  4.60  (claiming 4.6 MB) (+4 bytes padding)
  C++:  4.60  (claiming 4.6 MB)
  C++:  4.63  (claiming 4.6 MB) (+4 bytes padding, LLVM-G++)
  C++:  4.64  (claiming 4.6 MB) (LLVM-G++)
  C++:  4.68  (claiming 4.6 MB) (+4 bytes padding, PGO)
  D:   11.78  (claiming 4.6 MB) (DMD v1.043, with "ref", fast version)
  D:   17.34  (claiming 4.6 MB) (DMD v1.043, with "ref")
  D:   17.34  (claiming 4.6 MB) (DMD v1.043, with "ref", no GC)
  D:   28.24  (claiming 4.6 MB) (DMD v2.031, with no "ref")
  D:   28.27  (claiming 4.6 MB) (DMD v2.031, with no "ref", gs)
  D:   29.78  (claiming 4.6 MB) (DMD v1.043, with no "ref")
  D:   29.79  (claiming 4.6 MB) (DMD v1.046, with no "ref")
(Those DMD-Windows timings are not reliable: I have timed it as low as 12.5 seconds.)


Timings on Pubuntu, w=h=1024, lvl=6, best of 3, seconds:
(66_430 spheres, WITH_SHADOWS=true, FASTER_LDC=true)
  D:    4.81  (claiming 4.6 MB) (with fast output)
  C++:  4.86  (claiming 4.6 MB) (+4 bytes padding)
  D:    4.91  (claiming 4.6 MB) (fast version)
  D:    4.93  (claiming 4.6 MB)
  C++:  5.75  (claiming 4.3 MB)

Timings on Pubuntu, w=h=1024, lvl=8, best of 3, seconds:
(5_380_840 spheres)
  C++: 25.00  (claiming 369 MB) (+4 bytes padding)
  D:   25.27  (claiming 369 MB) (fast version)
  D:   25.62  (claiming 369 MB) (with "ref")

Key:
no GC = Garbage collector disabled for the whole run
+4 bytes padding = 4 bytes of padding added to the Node struct
PGO = profile-guided optimization
gs = all global variables are annotated with __gshared

Args:
g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math
llvm-g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math
dmd -O -release -inline
ldc -O5 -release -inline


All the tested code:
http://www.fantascienza.net/leonardo/js/sphereflake.zip





(Anonymous)
2009-08-06 02:56 pm UTC
Is it hard to make a Java version of the benchmark? I'm not that interested in Java, but I'd like to know how much the JVM can slow things down.



leonardo_m
2009-08-06 03:04 pm UTC
Why do you think it's hard to translate this code into Java?
(I think the array of nodes has to become an array of references, so its repeated traversal may be slower compared to the traversal of the struct array.)



(Anonymous)
2009-08-06 03:52 pm UTC
If it's not hard, the Java results could be included in the benchmark. :)



leonardo_m
2009-08-06 04:20 pm UTC
Please, write the Java version, and I'll add the timings.

(In the meantime I have seen that the fast D version doesn't work correctly on Windows because of the damned newline problem: putchar() doesn't print binary data correctly there, since stdout is in text mode and each '\n' byte is expanded to "\r\n". I'll fix the code, but there is no simple solution in D with Phobos on Windows.)
