In a few speed tests with a friend I have seen that, in a simple program that creates an (unsorted) associative array, C# 3.0 on .NET is quite a bit faster than the AAs built into the D language (despite C# running on a virtual machine). Probably D's AAs and GC are still less refined and will need more tuning. So I've done more tests to see whether the garbage collector is partly responsible, and I've just disabled it (the D GC API is very similar to Python's):
import std.string: toString;
import std.stdio: writefln;
import std.conv, std.c.time, std.gc;

// Processor time in seconds, via the C library clock()
double clock() {
    auto t = std.c.time.clock();
    if (t == -1)
        return 0.0;
    else
        return cast(double)(t) / cast(double)std.c.time.CLOCKS_PER_SEC;
}

// Usage: AA_speed_test n flag   (flag "0" disables the GC)
void main(string[] argv) {
    if (argv.length == 3) {
        uint n = std.conv.toUint(argv[1]);
        bool disabled = argv[2] == "0";
        if (disabled)
            std.gc.disable();
        writefln("n=", n, disabled ? ", GC disabled" : ", GC enabled");
        auto t = clock();
        uint[string] aa;
        for (uint i; i < n; i++)
            aa["hello_" ~ toString(i)] = i; // toString() allocates a new string on the GC heap
        writefln(clock() - t);
        if (n <= 100) writefln(aa);
    }
}
The timing results on my very old PC, all timings in seconds. The first value is the one printed by the code; the second one is given by a timing program that measures the whole running time, including the final memory deallocation:

>elaps AA_speed_test.exe 2000000 0
n=2000000, GC disabled
15.031
17.88elapsed

>elaps AA_speed_test.exe 2000000 1
n=2000000, GC enabled
30.394
0:32.50elapsed

Such tests use about 4/5 of the physical RAM (about 200 MB). I have done other tests with different n, and it seems the GC produces a superlinear slowdown. With the GC disabled the program requires about 20% more RAM (the difference is probably the memory of the strings generated by toString()). As you can see, without the GC the program runs about two times faster, probably reaching about the speed of C# (but there is probably a way to disable the C# GC too).

D has the advantage of being a multi-level language, so you can often "go down one level". In the following code I have modified the central loop so that it uses sprintf():
import std.string: toString;
import std.stdio: writefln;
import std.conv, std.c.time, std.gc;
import std.c.stdio: sprintf;

// Processor time in seconds, via the C library clock()
double clock() {
    auto t = std.c.time.clock();
    if (t == -1)
        return 0.0;
    else
        return cast(double)(t) / cast(double)std.c.time.CLOCKS_PER_SEC;
}

void main(string[] argv) {
    if (argv.length == 3) {
        uint n = std.conv.toUint(argv[1]);
        bool disabled = argv[2] == "0";
        if (disabled)
            std.gc.disable();
        writefln("n=", n, disabled ? ", GC disabled" : ", GC enabled");
        auto t = clock();
        uint[string] aa;
        // 6 chars of "hello_" + up to 10 digits of a uint + the '\0' = 17
        char[17] key;
        for (uint i; i < n; i++) {
            // Format into the stack buffer: no GC allocation here
            auto nc = sprintf(key.ptr, "hello_%u", i);
            // Only the .dup of the key allocates on the GC heap
            aa[key[0 .. nc].dup] = i;
        }
        writefln(clock() - t);
        if (n <= 100) writefln(aa);
    }
}
That avoids the heap allocations done by toString(); you can see it because with the GC disabled the total allocated memory no longer increases. The timings:

>elaps AA_speed_testC.exe 2000000 0
n=2000000, GC disabled
13.429
0:15.46elapsed

>elaps AA_speed_testC.exe 2000000 1
n=2000000, GC enabled
24.986
0:27.06elapsed

So even though it allocates less RAM, the GC shows the same bad behavior. Using other low-level tricks you can go even faster; you can replace the middle of the program with this code:
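    uint[string] aa;
    char[17] key;             // 6 prefix chars + up to 10 digits + '\0'
    key[0 .. 6] = "hello_";   // write the prefix once, outside the loop
    for (uint i; i < n; i++) {
        // sprintf() only writes the digits (and the '\0') after the prefix
        auto nc = sprintf(key.ptr + 6, "%u", i);
        aa[key[0 .. 6 + nc].dup] = i;
    }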
With that, sprintf() just converts the number i (and adds the terminating \0) without touching the "hello_" prefix, which I then copy anyway with the slice (using dup to produce a true copy: slices in D 1.x are just lightweight interval references, and they can't be extended). The new timings:

>elaps AA_speed_testC2.exe 2000000 0
n=2000000, GC disabled
12.658
0:14.69elapsed

>elaps AA_speed_testC2.exe 2000000 1
n=2000000, GC enabled
24.205
0:26.24elapsed

When you want, D allows you to program as in C too (or even in assembly), but when you do it you have to watch for the usual traps and bugs typical of C programming.

D's GC and AAs need more love and tuning; you can see it with the following two tiny tests too, which compare two quite similar programs that use associative arrays, Python (+ Psyco) against D:
def main():
    d = {}
    for i in xrange(500000):
        d["hello_" + str(i)] = i

import psyco; psyco.full()
main()
import std.string;

void main() {
    int[string] d;
    for (int i; i < 500_000; i++)
        d["hello_" ~ toString(i)] = i;
}
On my PC the Psyco version is faster (note that Python integers are multiprecision, and the Python dict is dynamically typed too).
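For comparison on the Python side, here is a minimal sketch of the same GC on/off experiment in Python 3, assuming `range` in place of Python 2's `xrange` and no Psyco (which only supported 32-bit Python 2). It uses the standard `gc` module that the D GC API resembles. One caveat: `gc.disable()` only turns off CPython's cyclic collector, while reference counting keeps freeing objects, so the effect is not directly comparable to disabling D's tracing collector. The helper names `build_dict` and `timed_build` are just illustrative.

```python
import gc
import time


def build_dict(n):
    """Build {"hello_0": 0, "hello_1": 1, ...} like the tests above."""
    d = {}
    for i in range(n):
        d["hello_" + str(i)] = i
    return d


def timed_build(n, gc_enabled):
    """Return (elapsed_seconds, dict), toggling only the cyclic collector."""
    was_enabled = gc.isenabled()
    if gc_enabled:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        d = build_dict(n)
        elapsed = time.perf_counter() - start
    finally:
        # Restore the collector to the state it had before the call
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return elapsed, d


def main(n=500_000):
    for enabled in (False, True):
        elapsed, d = timed_build(n, enabled)
        state = "GC enabled" if enabled else "GC disabled"
        print(f"n={n}, {state}: {elapsed:.3f} s, {len(d)} keys")


if __name__ == "__main__":
    main()
```

The printed times depend on the machine and interpreter version, so they are not directly comparable with the D figures.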