?

Log in

No account? Create an account

Ant System Optimization for TSP - leonardo
View:Recent Entries.
View:Archive.
View:Friends.
View:Profile.
View:Website (My Website).

Tags:, , , ,
Security:
Subject:Ant System Optimization for TSP
Time:10:49 pm
Recently Eric Rollins in a series of blog posts has shown various implementations of an Ant-system algorithm to solve the TSP (Travelling salesman problem), this is the page of the Python+Psyco version:

http://eric_rollins.home.mindspring.com/pythonAnt.html
http://proliferationofniches.blogspot.com/2008/07/ant-colony-optimization-for-tsp-in.html

The Python code shown there isn't much pythonic, so I have improved that code, later I have created a faster Psyco version, I have translated to code to the D and I have created a faster D version that has allowed me to improve the fast Psyco version too. Then I have slowly converted the fast D version to more and more C-looking code, and I have converted it to C, to see the performance differences across the various versions, with interesting results.

Here you can find all the Python/D/C code of this problem:
http://www.fantascienza.net/leonardo/js/ant_tsp.zip

The zip file contains:
- ant1.py: first Python version, it's essentially a more pythonic version of the original.
- ant2.py: second Python version, with few small changes, optimized for speed.
- ant_1d.d: first D version, the closer to the first Python version.
- ant_2d.d: second D version, optimized for speed. Not many optimizations. Quite similar to the second Python version.
- ant_3d.d: third D version, the closer to C style, that I have used to create the C version (I have slowly converted the 2d version to C).
- ant_c.c: C version, the faster one.
- ant_orig.py: the original Python code by Eric Rollins, as reference.
- notes.txt: a document with the comments you are reading.

The exe files are compiled for Windows.

This article of mine helps to program in a more pythonic style:
http://www.fantascienza.net/leonardo/ar/python_best_practices.html

To compile the D code I have used the DMD compiler (or you can use the GDC compiler on Linux):
http://www.digitalmars.com/d/download.html
I have used one of the lasts DMD (V.1.029) but not the last one because this produces smaller executables).
My D code uses my d libs too, that allow a style of programming quite similar to Python:
http://www.fantascienza.net/leonardo/so/libs_d.zip
I have compiled the D code with:
-O -release -inline

To run Python I have used Python V.2.5.1, plus Psyco V.1.5.2.

To compile the C code I have used MinGW version 4.2.1-dw2, release 3.9, 5/19/2008, found here:
http://nuwen.net/mingw.html
To compile the C version I have used profile-guided optimization:
gcc -Wall -O3 -s -fprofile-generate ant_c.c -o ant_c
followed by a run, followed by:
gcc -Wall -O3 -s -fprofile-use ant_c.c -o ant_c

All tests are performed on a Dual Core, 2 GHz, with 1 MB L2 cache, 2 GB RAM, with 32-bit Win (but the CPU is 64 bit).
The C and D versions are compiled to 32 bit code.
I have not tried yet to port the D code to multithreading to test how it performs on the 2-core CPU (I think the purpose of this whole translation exercise by Eric Rollins was to test parallel code, so my versions miss all this point).
All the optimizations I have performed are small-scale, I have not changed the algorithm at all from the original Python version.
Note that all this Python code of mine is strictly optimized for Psyco, I haven't tried to optimize it for pure Python.
I have used the same parameters:
  seed = 1
  boost = 5
  iter = 1000
  numCities = 200
  maxDistance = 100
  cityDistanceSeed = 1

Timings, best of 3:
  Python orig: 6.59 s
  Python V.1:  6.13 s
  Python V.2:  3.65 s
  D V.1:       2.68 s
  D V.2:       0.63 s
  D V.3:       0.67 s
  C:           0.46 s

Timings, best of 3, iter = 4000, numCities = 800:
  D V.2:      44.6  s
  C:          36.6  s

Lines of code:
  Psyco original: 91
  Psyco v.1:      77
  Psyco v.2:      85
  D v.1:          91
  D v.2:          97
  D v.3:         152  
  C:             154

File/executable sizes:
  4.166 ant1.py
  4.375 ant2.py
155.164 ant_1d.exe
153.116 ant_2d.exe
 92.188 ant_3d.exe
  8.733 ant_c.exe
  4.841 ant_orig.py 
It seems the second D version is about 2.3-2.4 times faster than the ML version, the C version is faster still.

The first D version looks a lot like Python, thanks to the D language itself and my libs, it contains lines like:
foreach (city; xrange(len(cities)))

while in the second D version I have used a normal for(), that's a bit faster:
for (int city; city < cities.length; city++)

In D version 2.x you can do the following, that's as fast as a traditional for:
foreach (city; 0 .. cities.length)

The first D version is a translation of the Python code, it uses some constructs like xmap(), etc that are a bit slower than low-level C-like code, but they allow to shorten code, to program faster, and to put less bugs in the code.
To speed up the code, and produce the second D version I haven't translated the whole D code to C-style code, but I have profiled the code (with -profile) and I have changed only few things in critical spots, speeding up the whole program a lot, and keeping most of high-level style code. This a compromise that I like (part of the speedup comes from using a boolean array to represent the set, I have copied this from the ML version).

I have seen some interesting things:

It's quite useful to translate programs from different languages, for example once I have created the optimized D version I can see that most or all the optimizations good for D are good for Psyco too. And finding those optimizations when I program in Python seem difficult still for my brain.

The original Python program has some problems, performance problems too, but spotting them isn't easy, even using a profiler.

The original Python version (maybe coming from the ML version) has a type signature inside a comment over every function, like:
# Matrix * Path -> double
def pathLength(cities, path): ...
This kind of annotation is never done in the Python world, but I have seen that such annotations are useful to translate the code to a statically typed language like D. ShedSkin is a Python to C++ compiler that as side effect spits out a Python code with similar annotations that can be useful.

It's very easy to translate this original Python code to the Python-looking D code, the resulting D code (the first D version) is as long as the Python code, and I have used only 10-15 minutes to perform such translation of about 90 lines of code and refine it. This is quite positive, and the (first) D version is twice faster than the Psyco code. During such translation the only part of the Python code that has required me a bit of time is the following (note that the Python code is optimized for Psyco, so inside the sum() there isn't a lazy generator, that's slower with Psyco):
def wrappedPath(path):
    return path[1:] + [path[0]]

def pathLength(cities, path):
    return sum([cities[r][c] for r, c in zip(path, wrappedPath(path))])

def updatePher(pher, path, boost):
    for r, c in zip(path, wrappedPath(path)):
        pher[r][c] += boost
That I have translated in two different ways until I have translated it to this D code:
double pathLength(Matrix cities, Path path) {
    return sum( xmap((int r, int c) {return cities[r][c];}, path, leftRotate(path)) );
}

void updatePher(Matrix pher, Path path, int boost) {
    foreach (r, c; xzip(path, leftRotate(path)))
        pher[r][c] += boost;
}
xmap an xzip are lazy, like imap and izip of Python, they use less memory, so they sometimes are faster too.

The original Python version of this part was:
def wrappedPath(path):
    return path[1:] + [path[0]]

def zip(list1, list2):
    res = []
    for i in range(len(list1)):
        res.append((list1[i], list2[i]))
    return res

# Matrix * Path -> double
def pathLength(cities, path):
    pairs = zip(path, wrappedPath(path))
    return sum([cities[r][c] for (r,c) in pairs])

# Matrix * Path * int -> unit
def updatePher(pher, path, boost):
    pairs = zip(path, wrappedPath(path))
    for (r,c) in pairs:
        pher[r][c] = pher[r][c] + boost

So thanks to the functional libs translating the Python code to D is easy and fast.

Then with the built-in DMD profiler I have profiled the first D version, and in about 15-20 minutes I have created the second D version that is much faster (later, when I have slowly translated the D version to C I have improved the second D version a bit more).

The translation to C (done on the D version, testing if the result was correct each small change, and testing the performance too at the same time, the result is the third D version, that looks like C, and shows that you can program in D almost as C) was much slower and more painful for me, it has required me about 200+ minutes. I have had to remove all the functional stuff, plus many other nice things present in D. At the end (probably mostly thanks to the GCC compiler, whose optimization capabilities sometimes look awesome) the C version is faster than the second D version, but I think for this specific program doing the translation to C isn't worth the efforts required to me.

Even if the D language looks like C, it's much more handily, if you take a look at my translation timings, you can see that while translating that Python code to D was a breeze, translating it to C was much slower and painful. Just the D garbage collector changes the situation a lot. The D code is often slower than the C one, but for many situations it's not more than 2 times slower than C, and the coding speed can be much higher, much safer and quite more fun.
Here I have collected some simple situations where the D code is too much slow:
http://www.fantascienza.net/leonardo/js/slow_d.zip

In C the code of those two small functions is (I have managed the matrices as a single 1-D array, managing the lines manually, but a more standard way can be used):
double pathLength(double* cities, int cities_side, int* path, int path_len) {
    double tot = 0.0;
    int i;
    for (i = 0; i < path_len-1; i++)
        tot += cities[path[i] * cities_side + path[i+1]];
    tot += cities[path[path_len-1] * cities_side + path[0]];
    return tot;
}

void updatePher(double* pher, int pher_side, int* path, int path_len, int boost) {
    int i;
    for (i = 0; i < path_len-1; i++)
        pher[path[i] * pher_side + path[i+1]] += boost;
    pher[path[path_len-1] * pher_side + path[0]] += boost;
}
If you like computer languages there are two other languages to test:
- The Genie language, that I think it looks very interesting:
http://live.gnome.org/Genie
Here you can download the Vala compiler that's able to compile Genie code too (I have never used Genie so far):
http://freshmeat.net/projects/vala/?branch_id=65428&release_id=278956
- ShedSkin, able to compile a subset of Python to C++, so probably the Python code can be used with small changes (expecially my second version):
http://code.google.com/p/shedskin/
comments: Leave a comment Previous Entry Share Next Entry

Ant System Optimization for TSP - leonardo
View:Recent Entries.
View:Archive.
View:Friends.
View:Profile.
View:Website (My Website).