
Subject:Graph Walk
Time:01:48 am
John MacCormick's book "Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers" contains a simple simulation of a graph of Web pages, used to implement a basic page ranking. This graph is modified from Figure 3-7 (from: http://press.princeton.edu/releases/m9528_illus.pdf ):

I have implemented the algorithm in the D language. The simulation is just a random walk on the graph; at each step there is also a small probability (0.15) of jumping to a random node. There are various ways to implement this walk, and I have chosen a fast one:
// graph_walk1.d
import std.stdio, std.random, std.algorithm, std.math, std.typetuple;

alias NodeIndex = int;

enum nVisits = 20_000_000;
enum nUnroll = 8;
static assert(nVisits % nUnroll == 0);
enum restartProbability = 0.15;
enum nMaxOutArcs = 5;
enum nNodes = 16;

template Iota(int stop) {
    static if (stop <= 0)
        alias TypeTuple!() Iota;
    else
        alias TypeTuple!(Iota!(stop - 1), stop - 1) Iota;
}

struct Node {
    immutable uint nOutArcs;
    immutable uint[nMaxOutArcs] outbound;
    uint hitCount = 0;
}

void main() {
    auto rng = Xorshift(1);
    Node[nNodes] graph = [{/* 0*/ 3, [13, 9, 8]},
                          {/* 1*/ 1, [2]},
                          {/* 2*/ 2, [0, 8]},
                          {/* 3*/ 1, [1]},
                          {/* 4*/ 2, [1, 2]},
                          {/* 5*/ 2, [2, 8]},
                          {/* 6*/ 3, [3, 4, 5]},
                          {/* 7*/ 1, [6]},
                          {/* 8*/ 1, [15]},
                          {/* 9*/ 5, [14, 12, 11, 10, 8]},
                          {/*10*/ 1, [15]},
                          {/*11*/ 1, [15]},
                          {/*12*/ 1, [15]},
                          {/*13*/ 1, [9]},
                          {/*14*/ 1, [15]},
                          {/*15*/ 1, [7]}];

    uint position = uniform(0, graph.length, rng);

    foreach (immutable _; 0 .. nVisits / nUnroll)
        foreach (immutable __; Iota!nUnroll) {
            graph[position].hitCount++;
            if (graph[position].nOutArcs == 0 || uniform(0.0, 1.0, rng) < restartProbability)
                position = uniform(0, graph.length, rng);
            else
                position = graph[position].outbound[uniform(0, graph[position].nOutArcs, rng)];
        }

    foreach (immutable i, immutable node; graph)
        writefln("%2d %2.0f", i, round(100.0 * node.hitCount / cast(real)nVisits));
}

The program performs twenty million jumps on the graph, which is represented with an adjacency array. I have used a fixed-size 2D matrix that contains some holes: it has five columns, but only node 9 has five outbound arcs. Each Node also stores its real number of outbound arcs, plus a counter of the visits.

There is a more compact representation for this graph: a single linear array with no holes that contains the indexes of the outbound arcs, plus a little extra array that contains the start indexes of the nodes and the visit counts. But this representation is more complex, and I prefer the simpler code.
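As a sketch, in C (the helper names are mine, not from the book or the post), the compact representation could look like this for the same 16-node graph: one flat array of arcs, plus an offsets array whose consecutive entries delimit each node's outbound arcs (so the arc counts come for free):

```c
#include <assert.h>

/* All outbound arcs of the 16 nodes, packed into one flat array. */
static const unsigned arcs[] = {
    13, 9, 8,           /* node 0  */
    2,                  /* node 1  */
    0, 8,               /* node 2  */
    1,                  /* node 3  */
    1, 2,               /* node 4  */
    2, 8,               /* node 5  */
    3, 4, 5,            /* node 6  */
    6,                  /* node 7  */
    15,                 /* node 8  */
    14, 12, 11, 10, 8,  /* node 9  */
    15, 15, 15,         /* nodes 10-12 */
    9,                  /* node 13 */
    15,                 /* node 14 */
    7                   /* node 15 */
};

/* offsets[n] .. offsets[n+1] delimit node n's arcs; 17 entries for 16 nodes. */
static const unsigned offsets[] =
    {0, 3, 4, 6, 7, 9, 11, 14, 15, 16, 21, 22, 23, 24, 25, 26, 27};

/* Number of outbound arcs of a node. */
static unsigned outDegree(unsigned node) {
    return offsets[node + 1] - offsets[node];
}

/* The k-th outbound arc of a node. */
static unsigned arc(unsigned node, unsigned k) {
    return arcs[offsets[node] + k];
}
```

The visit counters would live in a parallel array of 16 entries, as in the second D version below.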

The Iota!nUnroll is used to unroll the loop 8 times (a foreach over a TypeTuple is expanded at compile time), to speed up the walk a little.

To improve the performance of the code I have created a second version of the D program. It uses a similar data structure, but the counts are kept in a separate array.

This program performs the random walks jumping between the nodes using goto, so the state is kept in the program counter. The finite state machine is implemented with code generated at compile-time from the Graph data structure:

// graph_walk2.d
import std.stdio, std.string, std.random, std.algorithm, std.math;

alias NodeIndex = int;

enum nVisits = 20_000_000;
enum restartProbability = 0.15;
enum nMaxOutArcs = 5;
enum nNodes = 16;

struct Node {
    immutable uint nOutArcs;
    immutable int[nMaxOutArcs] outbound;
}

string genSwitchCode(in int nMax) /*pure*/ {
    string switchCode = "switch (position) {\n";
    foreach (immutable i; 0 .. nMax)
        switchCode ~= format("    case %d: goto NODE%d;\n", i, i);
    switchCode ~= "    default: assert(0);\n";
    switchCode ~= "}\n";
    return switchCode;
}

string genNodeCode(in int nMax, in string switchCode) /*pure*/ {
    string nodeCode;
    foreach (immutable i; 0 .. nMax)
        nodeCode ~= format("
            NODE%d:
            hits[%d]++;
            steps++;
            if (steps >= nVisits) goto END;
            if (uniform(0.0, 1.0, rng) < restartProbability)
                position = uniform(0, graph.length, rng);
            else
                position = graph[%d].outbound[uniform(0, graph[%d].nOutArcs, rng)];
            %s", i, i, i, i, switchCode);
    return nodeCode;
}

void main() {
    auto rng = Xorshift(1);

    static enum Node[nNodes] graph = [
        {/* 0*/ 3, [13, 9, 8]},
        {/* 1*/ 1, [2]},
        {/* 2*/ 2, [0, 8]},
        {/* 3*/ 1, [1]},
        {/* 4*/ 2, [1, 2]},
        {/* 5*/ 2, [2, 8]},
        {/* 6*/ 3, [3, 4, 5]},
        {/* 7*/ 1, [6]},
        {/* 8*/ 1, [15]},
        {/* 9*/ 5, [14, 12, 11, 10, 8]},
        {/*10*/ 1, [15]},
        {/*11*/ 1, [15]},
        {/*12*/ 1, [15]},
        {/*13*/ 1, [9]},
        {/*14*/ 1, [15]},
        {/*15*/ 1, [7]}];
    uint[graph.length] hits;
    uint steps = 0;

    uint position = uniform(0, graph.length, rng);

    enum switchCode = genSwitchCode(graph.length);
    enum nodeCode = genNodeCode(graph.length, switchCode);

    mixin(switchCode); // Jump to the initial node.
    mixin(nodeCode);   // The whole finite state machine.

    END:
    foreach (immutable i, immutable h; hits)
        writefln("%2d %2.0f", i, round(100.0 * h / cast(real)nVisits));
}

genSwitchCode() generates a simple jump table like this:

switch (position) {
    case 0: goto NODE0;
    case 1: goto NODE1;
    case 2: goto NODE2;
    case 3: goto NODE3;
    case 4: goto NODE4;
    case 5: goto NODE5;
    case 6: goto NODE6;
    case 7: goto NODE7;
    case 8: goto NODE8;
    case 9: goto NODE9;
    case 10: goto NODE10;
    case 11: goto NODE11;
    case 12: goto NODE12;
    case 13: goto NODE13;
    case 14: goto NODE14;
    case 15: goto NODE15;
    default: assert(0);
}

genNodeCode() generates the whole finite state machine; each generated node looks like this:

NODE0:
hits[0]++;
steps++;
if (steps >= nVisits) goto END;
if (uniform(0.0, 1.0, rng) < restartProbability)
    position = uniform(0, graph.length, rng);
else
    position = graph[0].outbound[uniform(0, graph[0].nOutArcs, rng)];
switch (position) {
    case 0: goto NODE0;
    case 1: goto NODE1;
    case 2: goto NODE2;
    case 3: goto NODE3;
    case 4: goto NODE4;
    case 5: goto NODE5;
    case 6: goto NODE6;
    case 7: goto NODE7;
    case 8: goto NODE8;
    case 9: goto NODE9;
    case 10: goto NODE10;
    case 11: goto NODE11;
    case 12: goto NODE12;
    case 13: goto NODE13;
    case 14: goto NODE14;
    case 15: goto NODE15;
    default: assert(0);
}

The LLVM back-end of the LDC2 compiler seems able to recognize this kind of jump table and replaces it with an indirect goto (a jmp *%eax instruction), so this machine is quick. Most of the time is spent generating random numbers.

To see if the computed gotos of GNU C (http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Labels-as-Values.html ) lead to better performance, I have translated the D code to GNU-C99:

// graph_walk3.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define nVisits 20000000
#define restartProbability 0.15
#define nMaxOutArcs 5
#define nNodes 16

inline static double uniform01() {
    return rand() / (double)RAND_MAX;
}

inline static int uniform(const int max) {
    int k = (int)(max * (rand() / (double)RAND_MAX));
    return (k == max) ? k - 1 : k;
}

typedef struct {
    unsigned int nOutArcs;
    void *outbound[nMaxOutArcs];
} Node;

#define NODE_BLOCK(N)                                                  \
    do {                                                               \
        NODE##N:                                                       \
            hits[N]++;                                                 \
            steps++;                                                   \
            if (__builtin_expect(steps >= nVisits, 0)) goto END;       \
            if (__builtin_expect(uniform01() < restartProbability, 0)) \
                goto *NodeAddresses[uniform(nNodes)];                  \
            else                                                       \
                goto *graph[N].outbound[uniform(graph[N].nOutArcs)];   \
    } while (0)

int main() {
    static const void *NodeAddresses[] = {
        &&NODE0,  &&NODE1,  &&NODE2,  &&NODE3,
        &&NODE4,  &&NODE5,  &&NODE6,  &&NODE7,
        &&NODE8,  &&NODE9,  &&NODE10, &&NODE11,
        &&NODE12, &&NODE13, &&NODE14, &&NODE15};

    static const Node graph[nNodes] = {
        /* 0*/ {3, {&&NODE13, &&NODE9, &&NODE8}},
        /* 1*/ {1, {&&NODE2}},
        /* 2*/ {2, {&&NODE0, &&NODE8}},
        /* 3*/ {1, {&&NODE1}},
        /* 4*/ {2, {&&NODE1, &&NODE2}},
        /* 5*/ {2, {&&NODE2, &&NODE8}},
        /* 6*/ {3, {&&NODE3, &&NODE4, &&NODE5}},
        /* 7*/ {1, {&&NODE6}},
        /* 8*/ {1, {&&NODE15}},
        /* 9*/ {5, {&&NODE14, &&NODE12, &&NODE11, &&NODE10, &&NODE8}},
        /*10*/ {1, {&&NODE15}},
        /*11*/ {1, {&&NODE15}},
        /*12*/ {1, {&&NODE15}},
        /*13*/ {1, {&&NODE9}},
        /*14*/ {1, {&&NODE15}},
        /*15*/ {1, {&&NODE7}}};

    unsigned int hits[nNodes] = {0};
    unsigned int steps = 0;
    goto *NodeAddresses[uniform(nNodes)];

    NODE_BLOCK(0);  NODE_BLOCK(1);  NODE_BLOCK(2);  NODE_BLOCK(3);
    NODE_BLOCK(4);  NODE_BLOCK(5);  NODE_BLOCK(6);  NODE_BLOCK(7);
    NODE_BLOCK(8);  NODE_BLOCK(9);  NODE_BLOCK(10); NODE_BLOCK(11);
    NODE_BLOCK(12); NODE_BLOCK(13); NODE_BLOCK(14); NODE_BLOCK(15);

    END:
    for (int i = 0; i < nNodes; i++)
        printf("%2d %2.1f\n", i, 100.0 * hits[i] / (double)nVisits);

    return 0;
}

I have used C macros to keep the C code about as short as the D code.

The "do { } while (0)" is a useful trick that makes the macro expansion a single statement, so every macro call requires a terminating semicolon and behaves like a function call.
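A classic illustration of why the trick matters (a hypothetical SWAP macro, not from the program above): wrapped in do/while(0) the expansion is one statement, so the macro can be used as the body of an if that has an else attached; with a bare {...} block the semicolon after the call would terminate the if early and the else would not compile.

```c
/* One statement after expansion, so `SWAP(x, y);` is safe inside if/else. */
#define SWAP(a, b) do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)

/* Puts the smaller of the two values first. */
static void orderPair(int *p, int *q) {
    if (*p > *q)
        SWAP(*p, *q);
    else
        ;  /* with a plain {...} macro, this else would be a syntax error */
}
```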

The "NODE##N:" token-pasting syntax generates the numbered jump labels from the NODE_BLOCK() macro argument N.

Now there is no need for jump tables: there are just gotos to the stored label addresses. I think this is one of the fastest ways to implement a hard-coded finite state machine.

GCC 4.8.0 generates code like this for each node:

    addl $1, %ebx
    addl $1, 20(%esp)
    cmpl $19999999, %ebx
    ja L15
    call _rand
    movsd LC1, %xmm1
    cvtsi2sd %eax, %xmm0
    mulsd LC0, %xmm0
    comisd %xmm0, %xmm1
    ja L125
    call _rand
    cvtsi2sd %eax, %xmm0
    xorl %eax, %eax
    mulsd LC0, %xmm0
    cvttsd2si %xmm0, %edx
    cmpl $1, %edx
    cmovne %edx, %eax
    movl _graph.4128+28(,%eax,4), %eax
    jmp *%eax

The Intel compiler ICC 13.0.1 instead splits the code into a fast zone and a slow zone. I have not timed this, but I think it could be a little more efficient:
    incl      %ebx
    incl      40(%rsp)
    cmpl      $20000000, %ebx
    jae       ..B1.108      # Prob 4%
    call      uniform01()
    comisd    .L_2il0floatpacket.20(%rip), %xmm0
    jbe       ..B1.139      # Prob 18%
    movl      $1, %edi
    call      uniform(int)
    movslq    %eax, %rax
    movq      488+graph.351.0.3(,%rax,8), %rax
    jmp       *%rax

    movl      $16, %edi
    call      uniform(int)
    movslq    %eax, %rax
    movq      NodeAddresses.351.0.3(,%rax,8), %rax
    jmp       *%rax

Run-time for nVisits = 20_000_000:
  graph_walk1: 1.35 (nUnroll = 8)
  graph_walk2: 1.14
  graph_walk3: 1.08

Compilation arguments used:
GCC: -Wall -Wextra -std=c99 -mfpmath=sse -march=native -Ofast -flto -s graph_walk3.c -o graph_walk3
LDC2 (ldmd2): -O -release -inline -noboundscheck

GCC V.4.8.0
LDC2 V.0.12.1 based on DMD v2.063.2 and LLVM 3.3.1.

The code shown in this post:

Subject:Coin game probabilities
Time:01:53 pm
A simple programming problem, solved in Matlab in the linked article:

There is a table composed of four squares in a circle. When you are on the last square and you move one step forward, you go back to the first square.

Starting at the first square, flip two coins at each move. If the coins are:
- Head Head: move one square forward;
- Tail Tail: move two squares forward;
- Head Tail or Tail Head: go to the first square.

With such rules, what is the probability of being on each of the four squares after twenty moves?

A first simple way to find the probabilities numerically is by simulation, in D language:

import std.stdio, std.random, std.algorithm, std.traits,
       std.bigint, std.exception;

enum uint nSquares = 4;

uint move(uint pos, ref Xorshift rng) {
    final switch (rng.front % nSquares) {
        case 0, 1: // Move one square forward.
            pos = (pos + 1) % nSquares;
            break;
        case 2: // Move two squares forward.
            pos = (pos + 2) % nSquares;
            break;
        case 3: // Go to the first square.
            pos = 0;
            break;
    }
    rng.popFront;
    return pos;
}

void main() {
    enum nTurns = 20,
         nTries = 100_000_000;

    auto rng = Xorshift(unpredictableSeed);
    enforce(isUnsigned!(typeof(rng.front)) &&
            (rng.front.max.BigInt + 1) % nSquares == 0,
            "move() needs to use std.random.uniform().");

    uint[nSquares] finalPosCounts;
    foreach (immutable _; 0 .. nTries) {
        uint pos = 0; // Starting position.
        foreach (immutable __; 0 .. nTurns)
            pos = move(pos, rng);
        finalPosCounts[pos]++;
    }

    finalPosCounts[].map!(x => x / cast(real)nTries).writeln;
}

I have used rng.front % nSquares instead of std.random.uniform() because, compiled with the LDC2 compiler, it's almost twice as fast.
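For reference, a 32-bit xorshift step is just three shift-xor pairs, which is why drawing bits this way is so cheap. This C sketch uses the common 13/17/5 shift triple (an assumption on my part: the Phobos Xorshift may use different constants). The % nSquares draw is unbiased here because nSquares = 4 divides 2^32 exactly, which is what the enforce() in the program checks.

```c
#include <stdint.h>

/* One step of a 32-bit xorshift generator (Marsaglia's 13/17/5 triple).
   Full period 2^32 - 1 over nonzero states. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```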

If I compile that first program with:
ldmd2 -O -release -inline -noboundscheck

LDC2 produces the following good 32 bit asm from the two loops of the main():

    xorl    %esi, %esi
    xorl    %ecx, %ecx
    .align  16, 0x90
    movl    %edi, %edx
    movl    %esi, %edi
    movl    %edx, %ebx
    andl    $3, %ebx
    xorl    %esi, %esi
    cmpl    $3, %ebx
    je  LBB1_7
    cmpl    $2, %ebx
    je  LBB1_13
    cmpl    $1, %ebx
    ja  LBB1_14
    incl    %edi
    jmp LBB1_6
    .align  16, 0x90
    addl    $2, %edi
    andl    $3, %edi
    movl    %edi, %esi
    movl    48(%esp), %ebx
    movl    52(%esp), %edi
    movl    %edi, 48(%esp)
    movl    56(%esp), %edi
    movl    %edi, 52(%esp)
    movl    %ebx, %edi
    shll    $11, %edi
    xorl    %ebx, %edi
    movl    %edx, 56(%esp)
    movl    %edx, %ebx
    shrl    $19, %ebx
    xorl    %edx, %ebx
    xorl    %edi, %ebx
    shrl    $8, %edi
    xorl    %ebx, %edi
    movl    %edi, 60(%esp)
    incl    %ecx
    cmpl    $20, %ecx
    jl  LBB1_2
    incl    32(%esp,%esi,4)
    incl    %eax
    cmpl    $100000000, %eax
    jl  LBB1_1

A different algorithm, explained in the linked article, is to raise the transition probability matrix to a power (here the matrix multiplication is implemented in a slow, high-level way):

import std.stdio, std.range, std.array, std.numeric, std.algorithm;

T[][] mul(T)(in T[][] A, in T[][] B) {
    const Bt = B[0].length.iota.map!(i => B.transversal(i).array).array;
    return A.map!(a => Bt.map!(b => a.dotProduct(b)).array).array;
}

void main() {
    real[][] M = [[1, 2, 1, 0], [1, 0, 2, 1], [2, 0, 0, 2], [3, 1, 0, 0]];
    foreach (row; M)
        row[] *= 0.25;
    auto P = M.mul(M);
    foreach (immutable _; 2 .. 20)
        P = P.mul(M);
    writeln(P[0]); // Distribution starting from the first square.
}


One run of the simulation (nTries = 100_000_000, runtime less than 21 seconds) gives:
[0.386108, 0.234545, 0.213841, 0.165506]

The second program gives a similar output:
[0.386203, 0.234484, 0.213797, 0.165516]
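As a cross-check, the same matrix iteration is easy to redo in C (the function name is mine): starting with all the probability mass on the first square, multiply the distribution by the transition matrix M above (already scaled by 0.25) twenty times.

```c
enum { nSq = 4 };

/* Transition matrix of the game: row = current square, column = next square. */
static const double M[nSq][nSq] = {
    {0.25, 0.50, 0.25, 0.00},
    {0.25, 0.00, 0.50, 0.25},
    {0.50, 0.00, 0.00, 0.50},
    {0.75, 0.25, 0.00, 0.00},
};

/* Writes into out[] the distribution after nMoves moves from square 0. */
static void coinGameDistribution(double out[nSq], int nMoves) {
    double p[nSq] = {1.0, 0.0, 0.0, 0.0};  /* start on the first square */
    for (int step = 0; step < nMoves; step++) {
        double q[nSq] = {0.0, 0.0, 0.0, 0.0};
        for (int i = 0; i < nSq; i++)       /* q = p * M (row vector) */
            for (int j = 0; j < nSq; j++)
                q[j] += p[i] * M[i][j];
        for (int j = 0; j < nSq; j++)
            p[j] = q[j];
    }
    for (int j = 0; j < nSq; j++)
        out[j] = p[j];
}
```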

Running the first program with ulong nTries = 10_000_000_000 takes less than an hour and gives:
[0.386202, 0.234477, 0.213796, 0.165525]

The code:

Subject:Better C++/D code
Time:03:50 pm
When I read "The Little Schemer" I did countless small coding exercises using recursion:

But later, when I started learning Haskell, I saw that explicit recursion is uncommon in good Haskell code. Most of the time in Haskell you use map, folds, and other higher order functions instead of recursion. (Sometimes explicit recursion is used in Haskell as an optimization.)

C and C++ code contains a lot of 'for' and 'while' loops that are sometimes hard to read and understand.

Videos and slides of the talks of the GoingNative 2013 conference have recently been released. Among the interesting talks, the one I have appreciated most is "C++ Seasoning" by Sean Parent (and from the comments I see that others agree with me, and even ask the speaker to write a book on the topic):

I suggest starting with the updated slides:

The talk gives some suggestions for C++ code:
- No Raw Loops
- No Raw Synchronization Primitives
- No Raw Pointers

The first part suggests reducing the number of loops in C++ code, replacing them with algorithms (often STL algorithms), a little like it's done in Haskell.

C++ is an abstraction over C, and the heavy usage of STL-like algorithms is kind of a new language that abstracts over raw loop-heavy C++ code. Writing code like that in C++ takes time and some brain power, but the resulting code is good, short, simpler to keep bug-free, and often efficient too.

In the D language I suggest using std.algorithm for similar purposes. I also suggest making all variables const/immutable unless they need to mutate, or unless something in D or in Phobos makes that impossible (like ranges, which often can't be const).

Walter Bright calls a variant of this coding style, which relies heavily on ranges, "component programming". I have shown several examples of such linear-flow programming in past blog posts (today it is becoming very common: LINQ in C#, streams in Java 8, Python itertools, pipelines in F#, and so on).

Here is another simple example, the Euler Problem number 50 (http://projecteuler.net/problem=50 ), the problem:
The prime 41, can be written as the sum of six consecutive primes: 41 = 2 + 3 + 5 + 7 + 11 + 13
This is the longest sum of consecutive primes that adds to a prime below one-hundred.
The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.
Which prime, below one-million, can be written as the sum of the most consecutive primes?
This is a simple imperative solution in D (modified and ported from Python code found here: http://blog.dreamshire.com/2009/04/12/project-euler-problem-50-solution/ ):
import std.stdio, std.algorithm, std.range, std.functional;

int euler50(in int top) {
    /*const*/ auto primes = top.sieve.assumeSorted;

    int[] prime_sum = [0];
    int tot = 0, count = 0;
    while (tot < top) {
        tot += primes[count];
        prime_sum ~= tot;
        count++;
    }

    int terms = 1;
    int max_prime = -1;
    foreach (immutable i; 0 .. count)
        foreach (immutable j; i + terms .. count) {
            immutable n = prime_sum[j] - prime_sum[i];
            if (j - i > terms && primes.contains(n)) {
                terms = j - i;
                max_prime = n;
            }
        }

    return max_prime;
}

void main() {
    1_000_000.euler50.writeln; // 997651
}
The idea is to compute a cumulative sum of the prime numbers, then consider the N^2 subintervals: subtracting the cumulative value at the start of an interval from the one at its end gives the sum of its items in O(1). The contains() function is a linear search, so this code is roughly O(n^3), but for the given problem it's fast enough, finding the correct result of 997651 in less than half a second.
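The cumulative-sum trick in isolation, as a C sketch (the helper names are mine): prefixSums[k] holds the sum of the first k items, so the sum of any interval is one subtraction.

```c
#include <stddef.h>

/* prefixSums must have room for n + 1 entries:
   prefixSums[0] = 0, prefixSums[k] = items[0] + ... + items[k - 1]. */
static void buildPrefixSums(const int *items, size_t n, int *prefixSums) {
    prefixSums[0] = 0;
    for (size_t k = 0; k < n; k++)
        prefixSums[k + 1] = prefixSums[k] + items[k];
}

/* Sum of items[i .. j - 1], in O(1) after the O(n) precomputation. */
static int rangeSum(const int *prefixSums, size_t i, size_t j) {
    return prefixSums[j] - prefixSums[i];
}
```

With the first six primes this recovers the 41 = 2 + 3 + 5 + 7 + 11 + 13 of the problem statement in a single subtraction.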

A very simple sieve implementation used in that euler50() function could be:

uint[] sieve(in uint limit) {
    if (limit < 2)
        return [];
    auto composite = new bool[limit];

    foreach (immutable uint n; 2 .. cast(uint)(limit ^^ 0.5) + 1)
        if (!composite[n])
            for (uint k = n * n; k < limit; k += n)
                composite[k] = true;

    return iota(2, limit).filter!(i => !composite[i]).array;
}

If you take a look at the euler50() function you see some general patterns. The cumulative sum of an array can be written as a general function (even better is to write it as a lazy range that works on any input range):
T[] accumulate(T)(in T[] items) pure nothrow {
    auto result = new T[items.length];
    if (!items.empty) {
        result[0] = items[0];
        foreach (immutable i; 1 .. items.length)
            result[i] = result[i - 1] + items[i];
    }
    return result;
}

The iteration on all the ordered index pairs can be done with a handy "Pairwise" range (imports not included):
struct Pairwise(Range) {
    alias R = Unqual!Range;
    alias Pair = Tuple!(ForeachType!R, "a", ForeachType!R, "b");
    R _input;
    size_t i, j;

    this(Range r_) {
        this._input = r_;
        j = 1;
    }

    @property bool empty() {
        return j >= _input.length;
    }

    @property Pair front() {
        return Pair(_input[i], _input[j]);
    }

    void popFront() {
        if (j >= _input.length - 1) {
            i++;
            j = i + 1;
        } else {
            j++;
        }
    }
}

Pairwise!Range pairwise(Range)(Range r)
if (isRandomAccessRange!Range && hasLength!Range && !isNarrowString!Range) {
    return typeof(return)(r);
}

Then what's left is a search among all the subsequences, throwing away the ones that are too long and the ones that don't sum to a prime number. Also, the pre-computed array of primes is sorted, so it's better to use a binary search. Now we can organize the whole computation in a few simple chains:
int euler50b(in int top) {
    auto primes = top.sieve.assumeSorted;
    auto primeSum = primes.release.accumulate;

    return primeSum
           .until!(p => p > top)
           .walkLength
           .iota
           .pairwise
           .map!(ij => tuple(ij[1] - ij[0], primeSum[ij[1]] - primeSum[ij[0]]))
           .filter!(dn => primes.contains(dn[1]))
           .reduce!((a, b) => b[0] > a[0] ? b : a)[1];
}

This code works and runs in about 0.3-0.4 seconds, so it's still fast enough.
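The contains() call on the sorted primes is the binary search step mentioned above; in C it is the classic half-interval search (a sketch, with a name of my choosing):

```c
#include <stddef.h>

/* Returns 1 if key is present in the ascending-sorted array a[0 .. n-1]. */
static int containsSorted(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        const size_t mid = lo + (hi - lo) / 2;  /* avoids overflow of lo+hi */
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return 1;
    }
    return 0;
}
```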

This euler50b() function has several problems:
- A range like pairwise() is not yet present in std.range;
- Something like accumulate() (with a given reducing function) is missing in std.algorithm;
- In D there is no syntax to destructure (unpack) tuples, so you have to use things like: ij => tuple(ij[1] - ij[0] ...
- std.algorithm.max is not yet accepting a key function.
- Currently "primes" can't be immutable or even const because you can't call the "release" method on a const SortedRange;
- Currently "primeSum" can't be immutable otherwise reduce() doesn't work.
- Currently "iota" is not pure nor nothrow.

Here is how the main function could look, once D and Phobos become more refined:
int euler50c(in int top) pure nothrow {
    immutable primes = top.sieve.assumeSorted;
    immutable primeSum = primes.accumulate!q{a + b};

    return primeSum
           .until!(p => p > top)
           .walkLength
           .iota
           .pairwise
           .map!({i, j} => tuple(j - i, primeSum[j] - primeSum[i]))
           .filter!({d, n} => primes.contains(n))
           .reduce!(max!({d, n} => d))[1];
}

The full code of the two versions:

Subject:A little Morse Code decoder in D
Time:11:19 pm
A little Morse code decoder in the D language, adapted from here:
void main() {
    import std.stdio, std.string, std.array, std.algorithm, std.conv;

    "-- --- .-. ... .    -.-. --- -.. ."
    .split("    ")
    .map!(w => w
               .split(" ")
               .map!(s => " ETIANMSURWDKGOHVF L PJBXCYZQ  "
                          [reduce!q{2 * a + 1 + (b & 1)}(0, s)]))
    .join(" ")
    .writeln;
}
It prints: MORSE CODE
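The same trick ports directly to C: the string " ETIANMSURWDKGOHVF L PJBXCYZQ  " is a binary tree stored breadth-first, and each symbol walks one level down (2a+1 for a dot, 2a+2 for a dash, exploiting the fact that '.' has an even ASCII code and '-' an odd one). A sketch, with helper names of my choosing:

```c
#include <string.h>

/* Decode one Morse letter by walking the implicit breadth-first tree. */
static char decodeLetter(const char *code, size_t len) {
    static const char table[] = " ETIANMSURWDKGOHVF L PJBXCYZQ  ";
    size_t a = 0;
    for (size_t k = 0; k < len; k++)
        a = 2 * a + 1 + (code[k] & 1);  /* '.' is even, '-' is odd */
    return table[a];
}

/* Decode a message: letters separated by one space, words by four. */
static void decodeMessage(const char *msg, char *out) {
    const size_t n = strlen(msg);
    size_t i = 0, start = 0, o = 0;
    while (i <= n) {
        if (i == n || msg[i] == ' ') {
            if (i > start)
                out[o++] = decodeLetter(msg + start, i - start);
            if (i + 3 < n && strncmp(msg + i, "    ", 4) == 0) {
                out[o++] = ' ';  /* four spaces: word boundary */
                i += 4;
            } else {
                i++;
            }
            start = i;
        } else {
            i++;
        }
    }
    out[o] = '\0';
}
```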

Subject:Updates and links
Time:10:45 pm
A lot of time has passed since the last update.


Solutions of the first three problems of the qualifications of the Google Code Jam 2013:

A D solution for the Euler problem n.395, translated from Python:

Updated code in the zip file:

Updated and improved (now "foo-bar" is digested as "foo bar" instead of "foobar"):

Added the "Spectral-norm" benchmark using SIMD:

Updated path tracing demo:

Added a De Bruijn sequence generator in D:

Added two photos of a plush toy of the Pegasus pony Bluespark:


A nice example of not-so-good functional programming, and a better solution:

A talk video, Simon Peyton Jones on Adventures with types, nice and interesting:
The slides:
All the slides of the interesting LASER 2013 conference, including more slides from Simon Peyton Jones:

Many interesting slides from the CPPNOW 2013 conference:

A simple optimization story:

Some things to know about tin:


A simple intro to finite state machines for games:

Less Wrong's / Eliezer Yudkowsky's Harry Potter fanfic-in-progress, "Harry Potter and the Methods of Rationality", it's a very long and very good fanfiction novel that introduces ways to think rationally:

"Flutterwonder", a 3D musical video about the little pony Fluttershy:

Subject:A word counting problem
Time:04:31 pm
A little programming problem from this blog post:

The problem is:
Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
A solution with shell scripting:
Here's the script that does that, with each command given its own line:
1:  tr -cs A-Za-z '\n' |
2:  tr A-Z a-z |
3:  sort |
4:  uniq -c |
5:  sort -rn |
6:  head -${1}
1) Make one-word lines by transliterating the complement (-c) of the alphabet into newlines (note the quoted newline), and squeezing out (-s) multiple newlines.
2) Transliterate upper case to lower case.
3) Sort to bring identical words together.
4) Replace each run of duplicate words with a single representative and include a count (-c).
5) Sort in reverse (-r) numeric (-n) order.
6) Pass through a stream editor; quit (q) after printing the number of lines designated by the script's first parameter (${1}).

Even if data-filtering programs for these exact purposes were not at hand, these operations would well be implemented separately: for separation of concerns, for easier development, for piecewise debugging, and for potential reuse. The simple pipeline given above will suffice to get answers right now, not next week or next month. It could well be enough to finish the job.
Some comments to that blog post:
Andre is quite right that this is an example of the elegance of FP vs. imperative -- but the flip side is that FP often fails when it runs up against real-world edge cases. Knuth's version may not handle contractions -- but it could, easily. Could McIlroy's? What if I want British/US spellings (such as "color" and "colour") to be counted as one word? What about hyphenated words?
20+ years later, we still don't have sed or awk available as a library, afaik. We also don't have a tr-like function which works on streams, or a sort for text files in standard libraries. Which is why you simply can't use it when you write something that isn't completely solvable via shell scripting - it doesn't get compiled in, and your team leader/senior programmer/whatever will definitely not allow you to call shell commands from C, C++ or Pascal, even when it pays off.
This is a modern solution in Haskell (from the blog post http://franklinchen.com/blog/2011/12/08/revisiting-knuth-and-mcilroys-word-count-programs/ with several small changes; it can probably be improved further):
import System.Environment (getArgs)
import Data.List (sortBy)
import Data.Char (toLower, isLetter)
import qualified Data.HashMap.Strict as HM

createReport :: Int -> String -> String
createReport n text =
    unlines $
    map (\(count, w) -> w ++ " " ++ show count) $
    take n $
    sortBy (flip compare) $
    map (\(w, count) -> (count, w)) $
    HM.toList $
    HM.fromListWith (+) $
    map (\w -> (w, 1)) $
    words $
    map toLower $
    map (\c -> if isLetter c then c else '\n') $
    text

main = do
    [fileName, nstr] <- getArgs
    let n = read nstr
    text <- readFile fileName
    putStr $ createReport n text
Usage: runhaskell wfreqh.hs test_data.txt 10

A comment from that blog post:
The shell script operates on raw text and everything is just strings being parsed and reparsed by the respective Unix utility programs.

The Haskell program is statically typed. It is type-checked by the compiler, which generates native code. The program uses standard libraries and data types, such as lists and hash maps.

Also, the Haskell program could be refined, extended, optimized in various ways. The most important optimizations I can think of off the top of my head:
- Using a better representation of strings than the default built-in "string as list of characters". Easily accessible advice can be found on Stack Overflow and browsing through Haskell library documentation, such as for the text package.
- Loop fusion, deforestation can be applied to deal with the apparent allocation of lots of new lists in the pipeline. One of the selling points of using a language like Haskell is the opportunity for the compiler to perform radical optimizations that are impossible for languages that have side effects.

I don't write many bash scripts these days. General-purpose programming languages can do a decent job munging data without difficulty. The situation was different decades ago when there was C, and few standard high-level libraries for the C world.
This is a D language version (it requires no installation of libraries, unlike the unordered-containers package from Cabal used by the Haskell program). It works with Unicode text; if you only have ASCII text the code becomes simpler. That Haskell version seems to have problems with Unicode text (on the other hand, don't expect the D code to perform fully correct Unicode collation, which is a complex problem).
import std.stdio, std.conv, std.file, std.string,
       std.algorithm, std.range;

void main(string[] args) {
    (cast(string)args[1].read)
    .toLower
    .tr("A-Za-z", "\n", "cs")
    .split
    .sort()
    .group
    .array
    .schwartzSort!q{ -a[1] }
    .take(args[2].to!uint)
    .map!q{ text(a[0], " ", a[1]) }
    .join("\n")
    .writeln;
}
D version: 17 nonblank lines, Haskell: 23 lines.
(The cast is needed because of a small D bug that probably will be fixed. The sort function has () to avoid accidentally calling the built-in sort, which will be deprecated.)
File sizes:
9.005.142 test_data.txt
      365 wfreqd.d
      701 wfreqh.hs
1.565.710 wfreqh.exe (stripped)
  510.492 wfreqd.exe
Haskell code compiled with:
ghc -O3 wfreqh.hs
Glasgow Haskell Compiler, Version 7.6.1, stage 2 booted by GHC version 7.4.2

D code compiled with:
dmd -O -release -inline -noboundscheck wfreqd.d
DMD Version 2.064alpha.
Runtime, best of 3 (top 100 words):
  wfreqh: 7.45
  wfreqd: 6.24 
Tests performed on Windows 32 bit, using an about 9 MB text file in English.

In theory the shell program is fully lazy (even the sort can be done with an external sorting algorithm), while the D (and probably the Haskell) versions load the whole file in memory.

The D and Haskell programs are strongly statically typed so they are able to catch several programming errors at compile-time.

The shell program uses text, so it has to convert numbers to text and then from text to numbers, while the D and Haskell programs use lazy sequences of arbitrary types, which can be more powerful and save the time of textual serialization and deserialization. And unless floating point values are serialized as hex floats, such textual conversions can also introduce some error.
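The hex-float point is easy to demonstrate in C: printf's "%a" conversion prints the exact bits of a double, so parsing it back always recovers the original value, while a short decimal format generally does not. A sketch (the helper names are mine):

```c
#include <stdio.h>
#include <stdlib.h>

/* Serialize x as a C99 hex float ("%a") and parse it back: lossless. */
static double roundTripHex(double x) {
    char buf[64];
    snprintf(buf, sizeof buf, "%a", x);
    return strtod(buf, NULL);
}

/* The same round trip through 6 decimal digits: generally lossy. */
static double roundTripShortDecimal(double x) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.6f", x);
    return strtod(buf, NULL);
}
```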

If you have to perform some operation not present among the Unix tools, you can write a small program (in C or another language) that uses text pipes, and call it like the other shell commands. In Haskell or D you just write a function (or a lazy range in D, often a struct) and call it; this is simpler and more integrated, but less modular.

Update Jun 21 2013: wiz offers this Haskell version (gist.github.com/wiz/5826131 ):
{-# LANGUAGE TupleSections #-}

import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TL
import           Data.Char (isLetter)
import           Data.List (sortBy)
import           Data.Function (on)
import qualified Data.HashMap.Strict as HM
import System.Environment (getArgs)

collect = sortBy (flip compare `on` snd)
        . HM.toList
        . HM.fromListWith (+)
        . map (,1)
        . filter (not . TL.null)
        . TL.split (not . isLetter)
        . TL.toLower

display (word, count) = do
    TL.putStr word
    putStr ": "
    print count

main = do
    [fname, count] <- getArgs
    text <- TL.readFile fname
    mapM_ display $ take (read count) (collect text)

This Haskell version is slower than the previous one; it takes about 10.8 seconds to run. The D code compiled with the ldc2 compiler (which optimizes better than dmd) takes 4.5 seconds to run (if you use ldc2 2.063 then use .sort!q{-a[1] < -b[1]} instead of the schwartzSort, which was improved later in Phobos).

There are many ways to improve the performance of the D code, like using a hash counter, which is missing in Phobos (run-time with ldc2 is about 3.17 seconds):
import std.stdio, std.conv, std.file, std.string,
       std.algorithm, std.range, std.traits;

auto hashCounter(R)(R items) if (isForwardRange!R) {
    size_t[ForeachType!R] result;
    foreach (x; items)
        result[x]++;
    return result.byKey.zip(result.byValue);
}

void main(string[] args) {
    (cast(string)args[1].read)
    .toLower
    .tr("A-Za-z", "\n", "cs")
    .split
    .hashCounter
    .array
    .sort!q{ -a[1] < -b[1] }
    .take(args[2].to!uint)
    .map!q{ text(a[0], " ", a[1]) }
    .join("\n")
    .writeln;
}

Subject:Links and updates
Time:07:15 pm
A simple nice idea to organize a functional program in Clojure:

A quite nice sequence of programming puzzles:
And I have added some D code in my site about the Challenge 41:

Intro to atomic operations in C++:

About C++ axioms:

A usage example of Liquid Haskell:

Basic info about iteration (part 3 is coming):

Time:08:11 pm
This time mostly some recent good coding & computer science links.

Catalan numbers pop up in many different situations, so it's useful for a programmer to know them. This explanation is nice and very simple:
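For instance, the first few Catalan numbers (1, 1, 2, 5, 14, 42, ...) are quick to generate from the standard recurrence C(0) = 1, C(n+1) = C(n) * 2(2n+1)/(n+2); a small C sketch:

```c
/* n-th Catalan number. The division at each step is exact, because
   the intermediate product is C(k) * 2 * (2k+1) = C(k+1) * (k+2). */
static unsigned long catalanNumber(unsigned n) {
    unsigned long c = 1;
    for (unsigned k = 0; k < n; k++)
        c = c * 2 * (2 * k + 1) / (k + 2);
    return c;
}
```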

Rustan Leino, Microsoft Research - "Program Verification: Yesterday, Today, Tomorrow":

More on verification:

"The Checker Framework: Custom pluggable types for Java", seems nice, but also annotation-heavy:

"Go at Google: Language Design in the Service of Software Engineering", about some of the design principles of the Go language:

Facebook NYC Tech Talk - Jordan DeLong "using cxx::types", using a stronger typing is useful in C++:

"The algebra of algebraic data types" by Chris Taylor, talk at London HUG, a little mind-bending but good:
Video: https://www.youtube.com/watch?&v=YScIPA8RbVE
Slides: https://github.com/chris-taylor/LondonHUG/archive/master.zip

About the Rust language; but it's too short, and it leaves a lot unexplained:

Algebra is useful again and again:

Simple article about external and internal iteration:

Subject:Static typing, contracts and unit tests
Time:12:54 am
I have moved the article to my site:


Subject:Updates and links
Time:02:15 am

Anatomy and physiology of a new species I've invented:

Many implementations in D language of the Rosettacode site:

Hungarian algorithm efficiently implemented in D (adapted from Java code):

I have updated the Two towers problem page, thanks to a kind reader:



GCC explorer, an online C++ compiler that shows the asm:
And even for D language (GDC compiler):

"The Design of LLVM", by Chris Lattner. Nice and very easy to read:

Lua JIT2 design:

Fun talk about reading code:

Two articles on functional programming in general:

"Damn Cool Algorithms: Cardinality Estimation":

An interesting site, similar to the Project Euler site, but with bioinformatics problems:

Your language is bad because... (currently no D entry):
