leonardo
View:Recent Entries.
View:Archive.
View:Friends.
View:User Info.
View:Website (My Website).
You're looking at the latest 3 entries.

Tags:, , , ,
Subject:Pythagorean triples
Time:06:05 pm
A little puzzle:
Given an array of n integers, find all Pythagorean triples in the array, that is, three elements such that a^2 + b^2 = c^2. Do this in O(n^2) time.

A simple solution is to test the equation with a hash of the squares on every unordered (a^2,b^2) pair (see the first Python version below).

First I have written a Python solution that uses a set, then I have translated it to D using my set module, then using D associative arrays, then I have translated the code to C adapting a simple hash code to integer keys. Then I have simplified the C code, removing the values from the hash.

That's the faster general solution, but the code can be improved if the problem is made less general. So I have tried to count the number of distinct Pythagorean triples among the integers [1,10_000].
So I have tried to simplify the hash some more. I have tried a really simple hash, and while counting the collision frequency I have found a very simple hash configuration where every number is present at its place in the array, or it's absent. Then as usual I have backported that code to Psyco and D.

Click to see the source code
Click here )

Using MinGW 4.2.1 I have compiled that C code with:
gcc -O3 -s -march=pentiumpro -fprofile-generate triples.c -o triples
Followed by a run, followed by:
gcc -O3 -s -march=pentiumpro -fprofile-use triples.c -o triples

The general D version works with any group of input numbers (not tested below).

The D1 version is really close to the C version. The D2 version is faster and higher-level, it uses my d libs:
www.fantascienza.net/leonardo/so/libs_d.zip

The timing results (Pentium3 500 MHz):
Timings triples, N = 10_000:
  D general: 13.38 s
  Psyco:      6.89 s   (4.12 X)   (1.87 X)
  D1:         4.02 s   (2.40 X)   (1.09 X)
  D2:         3.68 s   (2.20 X)   (1    X)
  C:          1.67 s   (1    X)   (0.45 X)

Triples in [1, 10000): 12467
Triples in [1, 20000): 27171
Triples in [1, 40000): 58728
Triples in [1, 80000): 394138

The timings are quite good for Psyco but quite bad for D: the C-like D version is 2.4 times slower than the C version, and the Psyco version is just 1.7 times slower than the C-like D version (and the Psyco version is 1.9 times slower than the faster D version). This means that with this tiny program there's not that much purpose in using D instead of Psyco.
comments: 1 comment or Leave a comment Add to Memories Tell a Friend

Tags:, , ,
Subject:Euler problem 14
Time:05:23 pm
The Euler problem n.14:

The following iterative sequence is defined for the set of positive integers:
n -> n/2 (n is even)
n -> 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence of 10 terms:
13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain of Collatz numbers?


My Python solution, that uses memoization of results to speed up computation:
import array

def main(m):
  lens = array.array('H', [0]) * m
  max_steps = max_steps_pos = 1
  for i in xrange(1, m):
    n = i
    n_steps = 1
    while n != 1:
      if n < i:
        n_steps += lens[n] - 1
        break
      n = 3 * n + 1 if n & 1 else n // 2
      n_steps += 1
    if n_steps > max_steps:
      max_steps = n_steps
      max_steps_pos = i
    lens[i] = n_steps
  return max_steps, max_steps_pos

import psyco; psyco.bind(main)
print "max_steps, max_steps_pos:", main(1000000)

Psyco is often able to manage array.array very efficiently.
'H' represents unsigned shorts [0 .. 2^16-1], because the chains are never too much long. This allows to reduce memory used and improve CPU cache efficiency (mostly in the D version). In Python you can use 'l' that is safer and doesn't slow down the code much, but uses twice the memory (so you can manage smaller problems).

The D version is essentially the same, the only significant difference is that the array isn't pre-initialized:
import std.stdio, std.c.stdlib;

void main() {
  const uint m = 1_000_000;
  auto lens = (cast(ushort*)malloc(m * ushort.sizeof))[0 .. m];
  uint max_steps = 1;
  uint max_steps_pos = 1;

  for (uint i = 1; i < m; i++) {
    uint n = i;
    uint n_steps = 1;

    while (n != 1) {
      if (n < i) {
        n_steps += lens[n] - 1;
        break;
      }
      n = n & 1 ? 3 * n + 1 : n / 2;
      n_steps++;
    }
    if (n_steps > max_steps) {
      max_steps = n_steps;
      max_steps_pos = i;
    }
    lens[i] = n_steps;
  }
  writefln("max_steps, max_steps_pos: ", max_steps, " ", max_steps_pos);
}


Timings:
m = 1_000_000:
D/C:    0.29 s  ( 1   X)
Psyco:  1.60 s  ( 5.5 X) with array 'H'
Python: 19.20 s (66   X) with list

A C version is very similar to that D version, and the running time (with GCC 4.2.1) is the same, but the executable is almost 20 times smaller.

The D code scans m = 100_000_000 in 25 s finding a chain of length 758 on n= 83_706_505.

An Haskell version:
http://www.haskell.org/haskellwiki/Euler_problems/11_to_20#Problem_14

To improve cache efficiency I have tried to store half of the values, but the code is slower (it allows to scan twice the values with the same max memory).
comments: Leave a comment Add to Memories Tell a Friend

Tags:,
Subject:Speed of old and new style classes with Psyco and CPython
Time:01:26 pm
In the Voidspace Techie Blog there is a small discussion about the relative speed of IronPython old and new style classes, compared to CPython ones:
http://www.voidspace.org.uk/python/weblog/arch_d7_2006_12_02.shtml#e571

I have modified the test to compare CPython V.2.5 with and without Psyco:
N = 100000
print "Timing results (seconds) for N = %d:\n" % N

if 1:
    from psyco.classes import __metaclass__
    print "With PsycoMetaclasses:"
else:
    print "Without PsycoMetaclasses:"
from time import clock

class OldExample:
    def __init__(self):
        self.x = None
        self.y = None
        self.z = None
    def __call__(self):
        x = self.x
        y = self.y
        z = self.z

def oldStyleTiming():
    start = clock()
    for i in xrange(N):
        OldExample()()
    return round(clock() - start, 2)


class NewExample(object):
    def __init__(self):
        self.x = None
        self.y = None
        self.z = None
    def __call__(self):
        x = self.x
        y = self.y
        z = self.z

def newStyleTiming():
    start = clock()
    for i in xrange(N):
        NewExample()()
    return round(clock() - start, 2)

print "  Without Psyco:"
print '    Old Style classes:', oldStyleTiming()
print '    New Style classes:', newStyleTiming()

if 0:
    print "  With Psyco (psyco.full()):"
    import psyco
    psyco.full()
else:
    print "  With Psyco (4 bind):"
    import psyco
    psyco.bind(OldExample)
    psyco.bind(NewExample)
    psyco.bind(newStyleTiming)
    psyco.bind(oldStyleTiming)

print '    Old Style classes:', oldStyleTiming()
print '    New Style classes:', newStyleTiming()




Timing results (seconds) for N = 100000:

Without PsycoMetaclasses:
  Without Psyco:
    Old Style classes: 1.48
    New Style classes: 1.44
  With Psyco (psyco.full()):
    Old Style classes: 1.93
    New Style classes: 0.95
  With Psyco (4 bind):
    Old Style classes: 2.16
    New Style classes: 1.11

With PsycoMetaclasses:
  Without Psyco:
    Old Style classes: 1.72
    New Style classes: 1.4
  With Psyco (psyco.full()):
    Old Style classes: 1.21
    New Style classes: 0.97
  With Psyco (4 bind):
    Old Style classes: 0.77
    New Style classes: 1.1

Summary: for this little test the faster combination is with Psyco (used with binds) with PsycoMetaclasses and old-style classes.
comments: Leave a comment Add to Memories Tell a Friend

leonardo
View:Recent Entries.
View:Archive.
View:Friends.
View:User Info.
View:Website (My Website).
You're looking at the latest 3 entries.