leonardo - Permutations exercise in D (and C)
View:Recent Entries.
View:Website (My Website).

Subject:Permutations exercise in D (and C)
Time:10:48 pm
By leonardo maffi, V.1.0, 1 Aug 2014.

A nice programming exercise in Haskell by Harry Garrood, "Permutations: an exercise" (21 Jul 2014):

Harry Garrood writes:
We're going to use Haskell, because this is all about functions (in the mathematical sense), and so Haskell, being a functional programming language, is especially well suited to the job.
Most of the code you below is in D language (with a C version at the end) because it's a good language to solve similar problems.

The problem statement defines a card shuffling technique:
1) Take one card from the top of the deck and discard it into a second pile.
2) Take another card from the top of the deck, and put it at the bottom of the deck.
3) Repeat these two steps, putting all discarded cards from step 1 into the same pile, until the original deck is all gone and the second pile has all the cards in it.

The problem is: how many shuffles does it take until a deck is in the same order as when you started, for a deck with an arbitrary number of cards? Write a function, f :: Int -> Int, such that, for a deck with n cards, f n is the minimum number of shuffles required to return it to its original order.

Below I explain the varios D implementations, that you can find in the archive:

The fantascienza site is down, so you can temporarily find the programs here:

perms1.hs: http://lpaste.net/2412259684489625600
perms2.hs: http://lpaste.net/390308567523000320
perms3.d: http://dpaste.dzfl.pl/2eb987664dfd
perms3b.d: http://dpaste.dzfl.pl/0aa48d395cb2
perms4.d: http://dpaste.dzfl.pl/0b0cb344ccce
perms5.d: http://dpaste.dzfl.pl/9f929c2d064d
perms6.d: http://dpaste.dzfl.pl/dbf1d3f19321
perms7.d: http://dpaste.dzfl.pl/a8eddf54067d
perms7b.d: http://dpaste.dzfl.pl/8a9a1afc10f8
perms7c.d: http://dpaste.dzfl.pl/45ff250cafa7
perms8.d: http://dpaste.dzfl.pl/32614ae6430c
perms9.c: http://dpaste.dzfl.pl/608dd06e44a5

I compile all the code on a 32 bit Windows using:
dmd -O -release -inline -noboundscheck
ghc -O3
gcc -Ofast -flto -std=c99 -Wall -Wextra

dmd 2.066beta4
Glasgow Haskell Compiler, Version 7.6.1, stage 2 booted by GHC version 7.4.2
gcc 4.0.0
- - - - - - - - - - - - - - - - - - -


The perms1.hs and perms2.hs programs are adapted from the two solutions by Harry Garrood, with small changes.

This is the original 'permutations' function:
permutation :: (Deck -> Deck) -> Int -> PermutationGraph
permutation f n = go 1 xs
    xs = unCardAll . f . makeDeck $ n
    go m (y:ys) = (y, m) : go (m+1) ys
    go _ []     = []

This is my version:
permutation :: (Deck -> Deck) -> Int -> PermutationGraph
permutation shuf n = zip perm [1 ..]
    where perm = unCardAll $ shuf $ makeDeck n
perms2.hs runs on the six test cases (n = 4, 5, 52, 53, 100, 200) in about 0.08 seconds (such small timings are not precise).

- - - - - - - - - - - - - - - - - - -


This program is equivalent to the perms1.hs program.
Harry Garrood defines a very nice and clean Haskell function that performs the two first steps of the shuffling:
step :: (Deck, Deck) -> (Deck, Deck)
step (deck, pile) = case deck of
    a:b:cs -> (cs ++ [b], a : pile)
    [a]    -> ([], a : pile)
    []     -> ([], pile)

I have implemented this 'step' function in various ways in D, and at the end this is the version I have used (here I have removed the pre/post-conditions, see the files for the full code), this performs the whole shuffling (including the third step):
Deck deckShuffle(in Deck Ain) pure nothrow @safe @nogc {
    typeof(return) B;
    const(Card)[] A = Ain;

    while (!A.empty) {
        // Step 1:
        B = [Card(A.front)] ~ B;

        // Step 2:
        if (!A.empty) {
            A ~= A.front;

    return B;

This function is (strongly) pure, it accepts a constant array of Cards and performs the shuffling using slices and concatenations, building a new output array B. Here a Deck is a dynamic array, but perhaps for this algorithm a more efficient data structure for a Deck is a (singly linked) list (about as in the Haskell code).

In perms3 I have a commented-out the range-based version of 'makeDeck' because currently the 'array' function is not @safe for that use case. So I've written a more internally imperative version of 'makeDeck':
enum makeDeck = (in uint n) pure nothrow => iota(1, n + 1).map!Card.array;

I have used a Typedef to implement the 'newtype' of the Haskell code (despite currently Typedef has some bugs):
alias Card = Typedef!int;
The function 'f1' uses a do-while because Phobos still lacks an 'iterate' range-based function, as used in the 'shuffle' Haskell function.

I have run both perms1.hs and perms3.d on the few testing values, both print:
4 2
5 5
52 510
53 53
100 120
200 8460

perms1.hs produces this output in about 2.72 seconds:

perms3.d produces this output in (best of 3) in about 1.39 seconds:
4 2
5 5
52 510
53 53
100 120
200 8460
I have then created a perms3b.d program that is very similar to perms3.d, but represents a Deck with a SList (a singly linked list) hoping to improve performance avoiding the O(n) copies of the slices of the array-based version. But perms3b.d runs in about 30.4 seconds. In most cases linked lists don't give any performance advantage.

- - - - - - - - - - - - - - - - - - -


This version is similar to perms3.d, but here I have implemented 'deckShuffle' in a lower level way. This deckShuffle is now @nogc (it doesn't allocate on the heap), and works in-place on two given arrays, using indexes (it could also be implemented using pointer arithmetic, but there's no need for that, this function is fast and @safe). Now deckShuffle is still annotated with 'pure' but now it's weakly pure:
void deckShuffle(Deck A, Deck B) pure nothrow @safe @nogc {
    size_t aTop = 0,
           aLength = A.length,
           bTop = B.length - 1; // B is empty at start.

    while (aLength) {
        // Step 1:
        B[bTop] = A[aTop];
        aTop = (aTop + 1) % A.length;

        // Step 2:
        if (aLength) {
            A[(aTop + aLength) % A.length] = A[aTop];
            aTop = (aTop + 1) % A.length;
This kind of code is more fiddly and requires more care and testing compared to more a functional style, but it's also much faster and produces no heap garbage. The run-time of this function is also more predictable.

The run time of 'perms4.d' to produce the six output pairs is about 0.06 seconds, so this is about 23 times faster than 'perms3.d'. Unlike many other languages, in D it's easy to mix low level code with higher level code, and produce code that still looks natural and free of most accidental complexity.

- - - - - - - - - - - - - - - - - - -


This implements the algorithm from perms2.hs. For simplicity I have used the slower but simpler 'deckShuffle' from perms3.d.

Instead of writing the various functions of the Haskell version, building the various intermediate data stutures (like the list of pairs), I have written a single 'f2' function that computes the lcm of the cycleLen without storing the actual cycles. D has less 'glue' than Haskell, so often in D it's more convenient to write a little larger functions (with the side effect that they are often faster than the Haskell versions, but also less easy to combine in different ways):
ulong f2(in uint n) pure nothrow {
    uint[uint] graph;
    foreach (immutable uint i, immutable x; n.makeDeck.deckShuffle)
        graph[i + 1] = cast(uint)x;

    typeof(return) result = 1;

    while (graph.length) {
        auto x = graph.byKey.front;
        uint cycleLen = 0;
        while (x in graph) {
            immutable y = graph[x];
            x = y;
        result = lcm(result, cycleLen);

    return result;
This code is less general and more specialized than the Haskell functions. It's less easy to see what the code is doing. Here 'graph' of the pairs is an associative array to avoid the linear searches in the lists (but a linear scan is still present despite being hidden, the one performed by byKey.front).

The run-time of perms5.d is very small, so beside calling 'f2' on the six test cases, this function also computes 'f2' on the numbers n in [1, 1_000[. The total run-time is about 1.29 seconds.

If I modify perms2.hs to compute f2 in the same [1, 1_000[ interval, the run-time is very large (about half an hour, and it uses 32 bit integers, so some results for the [1, 1_000[ interval are wrong). So the 'f2' function in D has some advantages (and it's still strongly pure). The D 'f2' function returns a ulong because when n grows large, the output grows a lot.

- - - - - - - - - - - - - - - - - - -


This is similar to perms5.d, but uses the more efficient 'deckShuffle' function from perms4.d.
The run-time of perms6.d is only 0.20 seconds (this includes the computations for n in [1, 1_000[), so it's more than six times faster than perms5.d.

- - - - - - - - - - - - - - - - - - -


This is a lower-level version (that is easy to port to C language), it's similar to perms6.d, but here in 'f2' instead of the associative array 'graph' I have re-used the array D2, filling it with 'emptyItem' when I remove a value. 'f2' is now also @nogc because it accepts fixed-size arrays as arguments and then slices them. So the program allocates the arrays only inside the 'main' (and here are stack-allocated, so even the main is @nogc):
ulong f2(size_t m)(const uint n, ref Card[m] D1, ref Card[m] D2)
pure nothrow @safe @nogc {
    for (uint i = 0; i < n; i++)
        D1[i] = cast(Card)i;
    deck_shuffle(D1, D2, n);

    typeof(return) result = 1;
    uint graph_len = n;
    enum emptyItem = uint.max;

    while (graph_len) {
        uint x = uint.max;
        for (uint i = 0; i < n; i++)
            if (D2[i] != emptyItem)
                x = i;

        uint cycle_len = 0;
        while (D2[x] != emptyItem) {
            const uint y = D2[x];
            D2[x] = emptyItem;
            x = y;
        result = lcm(result, cycle_len);

    return result;

void main() nothrow @nogc {
    import core.stdc.stdio: printf;

    const uint nMax = 1_000;
    Card[nMax] d1 = void, d2 = void;

    for (uint n = 1; n <= nMax; n++)
        printf("%llu\n", f2(n, d1, d2));
perms7.d runs in about 0.05 seconds. So it's about 4 times faster than perms6.d.

To avoid overflow bugs I have created the perms7b.d and perms7c.d versions, that use core.checkedint (despite core.checkedint is not meant for user code). For both perms7b.d and perms7c.d the run-time is similar to the unchecked versions, and perms7b.d spots the overflow bugs:
1389: 4247163320400
1390: 465585120
1391: 1391
1392: 6570
1393: 4294945235192621194 (overflow)
1394: 3655769040
1395: 1194
1396: 930
1397: 1748154975840
1398: 1398
- - - - - - - - - - - - - - - - - - -


This version is very similar to perms7.d, but I've used BigInt instead of ulong to compute all correct results (if there are no other bugs or design mistakes).

perms8.d is still quite fast, it runs in about 0.08 seconds (this includes the computations for n in [1, 1_000[). If I increase nMax to 10_000 it runs in about 6.6 seconds. D BigInts are not very fast, but I think this is not a significant problem for this program.

- - - - - - - - - - - - - - - - - - -


This is C code derived from perms7.d.

perms9.c runs in about 0.03 seconds or less. perms9.c is a little faster than perms7.d, but the speed difference is really visible when nMax is much larger (but then the program gives wrong outputs becuse of overflows). The difference between 0.03 and 0.05 seconds is mostly caused by the time taken to initialize the D-runtime.

- - - - - - - - - - - - - - - - - - -
comments: Leave a comment Previous Entry Share

leonardo - Permutations exercise in D (and C)
View:Recent Entries.
View:Website (My Website).