?

Log in

No account? Create an account

Algorithms, Pascal, Python, Java, D and more - leonardo
View:Recent Entries.
View:Archive.
View:Friends.
View:Profile.
View:Website (My Website).

Security:
Subject:Algorithms, Pascal, Python, Java, D and more
Time:09:36 am
In the past Pascal was often used to teach the basics of programming, while today Java is often used for such purpose, or sometimes Python. Books about algorithmics too have to use something to represent algorithms and ideas. Such formalism has to be readable, unambiguous, and it must be easy to translate/implement it in various languages, even many years after the publication of the book (and in 15-20 years programming languages die or change a lot).
A book about algorithms has to take a bit of care of efficiency too, avoiding unnecessary inefficiencies. Helping the job of the programmer that later has to write efficient code.

Python is good to represent algorithms (for example in a book) because:
- Significant indentations is both unambigous and very clean. It avoids useless noise.
- The code is uncluttered and readable.
- It has handy built-ins and data structures. Their semantic is clean, so it's not too much difficult to understand what it does to later translate it to other languages.
- There are many available modules, in its standard library too, that help shorten code.
- Dynamic typing allows to define generic functions/methods, that naturally work on different kinds of input. So such functions/methods can be used for many different purposes with little or no change.

But Python is also bad to represent/write algorithms because:
- Arguments of functions/methods don't have a static type that can be seen in the code. So when you read an algorithm written in Python (and/or you want to translate it to other languages; and sooner or later lot of code gets translated, because often new languages come in existence and old languages die) you may need to follow the types into the code across functions. This is a boring and error-prone work. (ShedSkin is a Python ==> C++ translator that is able to infer types, and it outputs a type-annotated Python code that can be used to help such translation work to other statically typed languages. Another simpler solution (it can be just 50 lines of Python long) is to use a profiler that outputs annotated code of the measured argument types).
- In Python code you can add notes and comments to tell the types, but such information may go out of sync compared to the working code. And it's not good. Python doctests can be useful to understand what kind of arguments function/methods need.
- Generally Python gives a lot of freedom to the programmer, so you need a lot of self-discipline to program it, to write good code. Otherwise the resulting code is a mess. I have seen a lot of bad Python code in the last years.
- The current CPython interpreter optimizes nearly nothing. So if you want to write efficient code you may need to micro-optimize the code (for example in the core of a very hot loop) manually. This is slow, error-prone, and the resulting code is less readable than equal code written in D, where you can rely on the compiler to optimize code. (Note that ShedSkin code gets compiled by a compiler, so it gives you part of such freedom).

This is an example of Python (for Psyco) code that I have had to optimize manually (this code is untested and it may contain bugs, it's just for show):
len_circles = len(circles)
for i in xrange(len_circles-1):
    circle1 = circles[i]
    circle1_x = circle1.x
    circle1_y = circle1.y
    circle1_radius_padded = circle1.radius + padding
    for j in xrange(i+1, len_circles):
        circle2 = circles[j]

        #inlined for speed
        #d_d = sqr_distance(circle1_x, circle1_y, circle2.x, circle2.y)
        dx = circle2.x - circle1_x
        dy = circle2.y - circle1_y
        d_d = dx*dx + dy*dy

        r = circle1_radius_padded + circle2.radius
        if d_d < (r * r - 0.01):
            dx = circle2.x - circle1_x
            dy = circle2.y - circle1_y
            d = sqrt(d_d)
            aux = (r - d) * 0.5
            vx = (dx / d) * aux
            vy = (dy / d) * aux
Note the variables moved out of the inner loop, the variables that represent attributes like x and y, the manual inlining of the squared distance, and the fact I've had to use a squared distance to gain speed high enough for this tiny real-time simulation.

See the whole program here, the n.2 inside the zip:
http://www.fantascienza.net/leonardo/js/circle_packing.zip

This is the same code written in a more pythonic way (in D you can write code like this about as efficient as the first one, but the D compiler isn't able to avoid computing some square roots), but this code is too much slow (with Psyco) for the purposes of the realtime simulation (this code is untested and it may contain bugs, it's just for show):
for i, c1 in enumerate(circles[:-1]) :
    for c2 in circles[i+1:]
        if c1.intersects(c2, 0.01):
            d = c1.distance(c2)
            r = c1.radius + c2.radius + padding
            dv = c1.center - c2.center # vector op
            v = (dv / d) * (r - d) * 0.5 # vector op
Or even (in Chapel language you may be able to write code like this about as efficient as the first version. Chapel compiler too is probably unable to avoid some square roots):
for c1, c2 in xuptriangle_pairs(circles):
    if c1.intersects(c2, 0.01):
        d = c1.distance(c2)
        r = c1.radius + c2.radius + padding
        dv = c1.center - c2.center # vector op
        v = (dv / d) * (r - d) * 0.5 # vector op

Why Pascal and Java aren't very good to represent algorithms:
- Pascal forces you to assign concrete types to arguments and variables, and this limits the flexibility a lot. Algorithms in a book (or an algorithm course) may enjoy being more flexible. For example for a 'summing' function I don't want the first argument to be an array of ten long integers. And I don't even want it to be a T[] (in D it's a dynamic array of T items). What I want is an iterable that can be scanned in a forward way (with foreach), and its items must have a zero and a plus (+) operator.
- In Java and C# using lot of interfaces you can probably do that. But for higher performance you may want to do such things with native data too (but a good enough compiler can solve this problem, and allow to use all kinds of data in an uniform way).
- D templates help to write generic functions, that work on native data too. But when using functional-style programming they require you to instantiate the functions to pass delegates. And at the moment in D it's not handy to assert such smart constraints to the input types (even if D2 adds "template constraints"). So in D you in summing function you may end up using T[] as the type of the first argument. This is at the same time too much tight (because a linked list like List!(T) may be good enough) and too litte tight (because T may not support a sum, see T===string.
- If used widely C++0x "concepts" seem a solution to such weakness of C++/D2, but I'd like them to have a simpler and cleaner syntax.
- Haskell Type Classes are probably enough to solve the problem I am talking about here, but I'd like to have a similar feature in a more traditional language that allows mutability and imperative programming too (maybe the Scala language allows this, I don't know yet).
comments: Leave a comment Previous Entry Share Next Entry

Algorithms, Pascal, Python, Java, D and more - leonardo
View:Recent Entries.
View:Archive.
View:Friends.
View:Profile.
View:Website (My Website).