leonardo ([info]leonardo_m) wrote,
@ 2009-02-07 21:45:00
Previous Entry  Add to memories!  Tell a Friend  Next Entry
Entry tags:benchmark, c, d language, programming, python

Multiprecision integers - part 2
This time I benchmark small sums among multiprecision integers (MPI). This kind of operation is interesting because it shows how well such multiprecision integers can replace normal integer operations, to avoid integer overflows but keep running efficiency as much as possible. Among the ones I have tested the Python integral numbers seem the most fit for this purpose. Probably the libs I have tested aren't much fit for this. Probably the results for Python3 are worse (even when Psyco will be available for it).

TIMINGS, N = 1_000_000 * 10, best of 3:

#program, seconds run:
#2  D:        0.027 s (LDC)
#1  Python:   0.34  s (+ Psyco)
#5  C:        1.04  s (+ GMP)
#3  Python:  11.7   s (+ Psyco + GMPY)
#4  D:       21.2   s (LDC + Tango)
#6  C++:              (+ GMP)
#7  Java:

PROGRAM OUTPUT: 45000000

-----------------

UPDATE Feb 11 2009:

Indeed, the LLVM backend is able to optimize away the loops of
 the D program #2, this is a version with printing:

import tango.stdc.stdio: printf;
void main() {
    int n = 0;
    for (int j; j < 1_000_000; j++)
        for (int i; i < 10; i++)
            n += i;
    printf("%d\n", n);
}

Compiling with -O2 -inline -release, such extreme optimization
 isn't performed, this is the asm generated:

	subl	$12, %esp
	xorl	%eax, %eax
	movl	%eax, %ecx
	.align	16
.LBB1_1:	# forcond
	cmpl	$999999, %eax
	jg	.LBB1_6	# endfor
.LBB1_2:
	xorl	%edx, %edx
	.align	16
.LBB1_3:	# forcond2
	cmpl	$9, %edx
	jg	.LBB1_5	# forinc
.LBB1_4:	# forbody3
	addl	%edx, %ecx
	incl	%edx
	jmp	.LBB1_3	# forcond2
.LBB1_5:	# forinc
	incl	%eax
	jmp	.LBB1_1	# forcond
.LBB1_6:	# endfor
	movl	%ecx, 4(%esp)
	movl	$.str1, (%esp)
	call	printf

But the total running time is minimal anway, 0.027 s or less, such
 timign is lost in noise.

In all the other programs the compilers/interpreters can't optimize
 away the loops, mostly because they use multiprecision integers.

-----------------

PROGRAMS:

# program #1, Python with Psyco
import psyco
def main():
    n = 0
    for j in xrange(1000000):
        for i in xrange(10):
            n += i
    return n   
psyco.bind(main)
main()



// program #2, D
void main() {
    int n = 0;
    for (int j; j < 1_000_000; j++)
        for (int i; i < 10; i++)
            n += i;
}



# program #3, Python with Psyco, gmpy
import psyco
from gmpy import mpz
def main():
    n = mpz(0)
    for j in xrange(1000000):
        for i in xrange(10):
            n += i
    return n
psyco.bind(main)
main()



// program #4, D with BigInt
import tango.math.BigInt: BigInt;
void main() {
    BigInt n = "0";
    for (int j; j < 1_000_000; j++)
        for (int i; i < 10; i++)
            n += i;
}



// program #5, C with GMP
#include "stdio.h"
#include "gmp.h"
int main(void) {
    mpz_t n;
    int i, j;
    mpz_init_set_si(n, 0);
    for (j = 0; j < 1000000; j++)
        for (i = 0; i < 10; i++)
            mpz_add_ui(n, n, i);
    mpz_out_str(0, 10, n);
    putchar('\n');   
    return 0;
}



// program #6, C++ with GMP (not tested)
#include <iostream>
#include <gmpxx.h>
int main() {
    mpz_class n;
    int i, j;
    n = 0;
    for (j = 0; j < 1000000; j++)
        for (i = 0; i < 10; i++)
            n += i;   
    std::cout << n << "\n";
    return 0;
}



// program #7, Java (not tested)
import java.math.BigInteger;
public class bigint_small_sum_7 {
    public static void main(String[] args) {
        BigInteger n = BigInteger.ZERO;
        for (int j = 0; j < 1000000; j++)
            for (int i = 0; i < 10; i++)
                n = n.add(BigInteger.valueOf(i));
    }
}

-----------------

CPU: Pentium3 @ 500 MHz, 256 MB RAM
OS: Ubuntu 8.10.

Timings are "real" seconds found with the 'time' command
(the best from 3+ consecutive runs).

VM/compiler/libs used:
- Python V.2.5.2
- Psyco V.1.6.0
- GMPY V.1.02
- GMP V.4.2.4
- gcc V.4.3.2 (Ubuntu 4.3.2-1ubuntu12)
- ldc rev.939, on dmd v1.039 + llvm 2.4 (Feb 4 2009)
- java 1.6.0_10

Code compiled with:
ldc -O3 -release -inline bigint_small_sum_2.d
ldc -O3 -release -inline bigint_small_sum_4.d  
gcc -O3 -s bigint_small_sum_5.c -o bigint_small_sum_5 -lgmp
g++ bigint_small_sum_6.cpp -o bigint_small_sum_6 -lgmpxx -lgmp
javac bigint_small_sum_7.java



(Read 4 comments) - (Post a new comment)

This benchmark is bad
(Anonymous)
2009-02-09 10:30 pm UTC (link)
The following haskell program counts numbers from 1 to infinity and does it in 0 second. It will beat any of yours implementations. I won :)

let a = sum [1..]

(It would be "int main () { int i=0; while (true) {i++} } " in C)

The variable "a" really contains the sum, just with one small issue -- if you would like to print it (btw. same as in programs 1 and 2) it will take a time. An average compiler can optimize the cycles if they are not needed.

See http://shootout.alioth.debian.org/ for better ones.

(Reply to this) (Thread)

Re: This benchmark is bad
[info]leonardo_m
2009-02-10 12:55 am UTC (link)
This is a very simple benchmark, and I agree it's not much good.

But I think it's good enough for its purpose, and has already uncovered a possible bug. You have to understand that the purpose of this code isn't to test Haskell, but mostly to test the multiprecision ints of the Tango lib of the D language. For such language this benchmark has a meaning. "Winning" is meaningless here. Of course your C code is wrong, because it's an infinite loop, while the Haskell code probably doesn't go into an infinite loop. And C has fixed-width numbers, etc.

I have tested my code with a printing statement, and it seems no compilers among the ones I have tested perform those optimizations in this code, because most of those benchmarks use multiprecision integers. The benchmark 2 may indeed be wrong because the compiler may optimize the loop away. (From testing it seems it doesn't happen here, maybe because there are two loops).

Benchmarks are very tricky things, it's usually easy to spot mistakes, errors, etc. And most they lack a statistical analysis of the results, that is necessary in all scientific publications.

Regarding the Shootout, I can indeed translate the pidigits benchmark in D with the BigInt and take a look at the results. But it mixes all operations with small and large numbers. While the purpose of my benchmark was to test just the + among small numbers, in multiprecision.

(Reply to this) (Parent)(Thread)

Re: This benchmark is bad
(Anonymous)
2009-02-10 05:18 pm UTC (link)
Please repeat first two benchmarks with putting at least print(main()) to end.

(Reply to this) (Parent)(Thread)

Re: This benchmark is bad
[info]leonardo_m
2009-02-10 06:29 pm UTC (link)
Yes, I'll soon time some of those things again with a printing, etc.

(Reply to this) (Parent)


(Read 4 comments) - (Post a new comment)

Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…