leonardo - Not dependent types in D
View:Recent Entries.
View:Archive.
View:Friends.
View:Profile.
View:Website (My Website).

Tags:, , ,
Security:
Subject:Not dependent types in D
Time:09:20 am
I've found an interesting post about Dependent Types in ATS language:
http://www.bluishcoder.co.nz/2010/09/01/dependent-types-in-ats.html

More on ATS language, created by the good Hongwei Xi:
http://en.wikipedia.org/wiki/ATS_%28programming_language%29
http://www.ats-lang.org/

So I've tried to do something vaguely similar in D V2 language. Of course both me and D are not up to this task, so the result is a program that shows not dependent types in D. I do not know much about ATS language and Dependent Types, so this post may contain errors and naive statements.

This is probably basic stuff for a functional programmer, but for me it's not obvious code. This code is mostly functional in style and it implements a statically length-typed single-linked list. Its nodes contain a polymorphic cargo and a pointer to the next node, and each node is also typed with its position in the list. At run-time the list length is not stored in a field, the list length is encoded in the type of its first node.

See the full code:
http://www.fantascienza.net/leonardo/js/len_typed_list.zip

The code contains some workarounds for old and new bugs of the D2 compiler.

D2 language has removed the ability to embed D code in HTML, so in this text I just show and explain few snippets of the whole code you may find in the zip:
http://www.digitalmars.com/d/1.0/html.html

This is the list node. LEN is a compile-time constant that keeps N (there is a way to perform static introspection and find the argument values of a template, but this solution is simpler), that's the length of the list. The node is named "Cons" according to its usage in Lisp-like and ML-like languages. The static if(N > 0) is present to avoid infinite recursion in the alias TN:
struct Cons(T, int N) if (N >= 0) {
    enum int LEN = N;

    static if (N > 0) {
        T data;
        alias Cons!(T, N-1) TN;
        TN* next;

        this(T data_, TN* next_) {
            data = data_;
            next = next_;
        }
}

A handy way to define the list terminator (the list terminator is a struct with no data and next fields):
Cons!(T, 0)* nil(T)() pure nothrow {
    return Cons!(T, 0).NIL;
}

The functions like the list representation are recursive function templates because each node has different type:
string listRepr(T, int N)(Cons!(T, N)* ls) {
    static string listReprInner(T2, int N2)(Cons!(T2, N2)* ls) {
        string dataRep = to!string(*ls);
        static if (N2 > 1)
            return dataRep ~ ", " ~ listReprInner(tail(ls));
        else
            return dataRep;
    }

    static if (N == 0)
        return "nil!" ~ T.stringof ~ "()";
    else
        return "consList(" ~ listReprInner(ls) ~ ")";
}

Now you may define a list of integers like this:
cons(1, cons(2, cons(3, nil!int())));

But that's not handy, so I define a helper function template, that allows to define the same list like this:
consList(1, 2, 3)

As usual the function is short and recursive:
Cons!(Ts[0], cast(int)Ts.length)* consList(Ts...)(Ts items) if (Ts.length > 0) {
    static if (Ts.length == 1)
        return cons(items[0], nil!(Ts[0])());
    else
        return cons(items[0], consList(items[1 .. $]));
}

The cast is useful because in D lengths and indexes are unsigned words (size_t) instead of integers. This function is not defined for empty input (Ts.length==0) because in this case it can't infer the type of the resulting list (Ts[0]).

The full version inside the zip also uses AllSameType to be sure all given arguments are of the same type.

One important thing to keep in mind in this code is that while the length of those lists is a compile-time constant (and must be known at compile time) the contents of the list data are run-time variables that are not known at compile-time.

The Lisp-like functions to take the car and cdr of a list are easy:
pure nothrow T head(T, int N)(Cons!(T, N)* ls) if (N > 0) {
    return ls.data;
}

pure nothrow Cons!(T, N-1)* tail(T, int N)(Cons!(T, N)* ls) if (N > 0) {
    return ls.next;
}

The nice thing of both functions (named "head" and "tail" instead of "car" and "cdr" to make the code more readable) is that they have a template constraint that allows them to compile only when the given lists aren't empty. So (unless the pointers are null) there's no way to take the cdr of an empty list.

As you see all functions take a pointer to a Cons list. I have tried to write this program passing the list heads always by ref (also to avoid possible null pointer related bugs), but the resulting code was too much hard to debug for me (later I may add contracts to assert that the input lists are not null pointers).

Where possible all functions are pure and nothrow (the inner functions need the annotations like "pure" on the right of the signature because of a bug of DMD 2.050 that will be fixed soon).

I have also tried to use const/immutable often, but I have not found a simple way to make the resulting code compile, so despite being mostly functional, this code doesn't use const/immutable.

The implementation of opEquals too has given me troubles, but I have created a equals() free function that returns true if two lists have the same run-time values.

As in the page from the bluishcoder.co.nz blog post, implementing the zip is not hard. It returns a list of pairs like (tuple(a1,b1), tuple(a2,b2)...) implemented using Tuple. This function template needs to take in account only two cases, because the two list must be known at compile-time to have the same length:
Cons!(Tuple!(T1, T2), N)* zip(T1, T2, int N)(Cons!(T1, N)* ls1, Cons!(T2, N)* ls2) if (N >= 0) {
    static if (N == 0)
        return nil!(Tuple!(T1, T2))();
    else
        return cons(tuple(head(ls1), head(ls2)), zip(tail(ls1), tail(ls2)));
}

This is the original implementation in ATS:
</pre>fun{a,b:t@ype} list_zip {n:nat} (xs: list (a, n), ys: list (b, n)): list ('(a,b), n) =
case+ (xs, ys) of
| (nil (), nil ()) => nil ()
| (cons (x, xs), cons (y, ys)) => cons('(x,y), list_zip(xs, ys))</pre>
A dropk() function is equally easy to implement in D, see the code in the zip file.

In ATS Hongwei Xi has created a wondrous implementation of (not efficient) QuickSort for lists that is guarantee by the type system to return a list that's always a sorted permutation of the input list:
http://www.ats-lang.org/EXAMPLE/MISC/listquicksort_dats.html

Probably I can't do that in D, but I have done something much simpler, an implementation of a functional style Merge Sort that guarantees (unless the input pointer is null) that the return length is the same as the input one.

pure Cons!(T, N1+N2)* append(T, int N1, int N2)(Cons!(T, N1)* ls1, Cons!(T, N2)* ls2) {
    static if (N1 == 0)
        return ls2;
    else
        return cons(head(ls1), append(tail(ls1), ls2));
}

pure Cons!(T, N)* reverse(T, int N)(Cons!(T, N)* ls) {
    static if (N == 0)
        return ls;
    else
        return append(reverse(tail(ls)), cons(head(ls), nil!T()));
}

Cons!(T, N)* mergeSort(T, int N)(Cons!(T, N)* ls) {
    static Tuple!(Cons!(T, N/2)*, Cons!(T, N-N/2)*) split2(T, int N)(Cons!(T, N)* ls) {
        static splitInner(int k, int N1, int N2)(Cons!(T, N1)* ls1, Cons!(T, N2)* ls2)
          if (k >= 0 && k <= N2) {
            static if (k == 0)
                return tuple(reverse(ls1), ls2);
            else
                return splitInner!(k-1)(cons(head(ls2), ls1), tail(ls2));
        }

        return splitInner!(N/2)(nil!T(), ls);
    }


    static Cons!(T, N1+N2)* merge(T, int N1, int N2)(Cons!(T, N1)* ls1, Cons!(T, N2)* ls2) pure {
        static if (N1 == 0) {
            return ls2;
        } else static if (N2 == 0) {
            return ls1;
        } else if (head(ls1) < head(ls2)) {
            return cons(head(ls1), merge(tail(ls1), ls2));
        } else {
            return cons(head(ls2), merge(ls1, tail(ls2)));
        }
    }

    static if (N <= 1) {
        return ls;
    } else {
        auto splits = split2(ls);
        return merge(mergeSort(splits[0]), mergeSort(splits[1]));
    }
}

Note that in merge() the (N2==0) test is static, but the (head(ls1) < head(ls2)) is done at run-time, because the list contents are not known at compile time.

A Scheme and Python implementation may be useful to understand the D code better, that's just a functional merge sort (original Scheme code: http://en.literateprograms.org/Merge_sort_%28Scheme%29 ):
(define (split ls)
  (letrec ([splitH (lambda (ls ls1 ls2)
                      (cond
                        [(|| (null? ls) (null? (cdr ls)))
                         (cons (reverse ls2) ls1)]
                        [else (splitH (cddr ls)
                                (cdr ls1) (cons (car ls1) ls2))]))])
    (splitH ls ls '())))

(define (merge pred ls1 ls2)
  (cond
    [(null? ls1) ls2]
    [(null? ls2) ls1]
    [(pred (car ls1) (car ls2))
     (cons (car ls1) (merge pred (cdr ls1) ls2))]
    [else (cons (car ls2) (merge pred ls1 (cdr ls2)))]))

(define (mergeSort pred ls)
  (cond
    [(null? ls) ls]
    [(null? (cdr ls)) ls]
    [else (let ([splits (split ls)])
            (merge pred
              (mergeSort pred (car splits))
              (mergeSort pred (cdr splits))))]))

A readable Python version:
def cons(a, b): return [a, b]
def head(ls): return ls[0]
def tail(ls): return ls[1]
def tailTail(ls): return tail(tail(ls))

def append(ls1, ls2):
    if ls1 == []:
        return ls2
    else:
        return cons(head(ls1), append(tail(ls1), ls2))

def reverse(ls):
    if ls == []:
        return ls
    else:
        return append(reverse(tail(ls)), cons(head(ls), []))

def split(ls):
    def splitInner(ls, ls1, ls2):
        if ls == [] or tail(ls) == []:
            return cons(reverse(ls2), ls1)
        else:
            return splitInner(tailTail(ls), tail(ls1), cons(head(ls1), ls2))

    return splitInner(ls, ls, [])

def mergeSort(ls):
    def merge(ls1, ls2):
        if ls1 == []:
            return ls2
        elif ls2 == []:
            return ls1
        elif head(ls1) < head(ls2):
            return cons(head(ls1), merge(tail(ls1), ls2))
        else:
            return cons(head(ls2), merge(ls1, tail(ls2)))

    if ls == [] or tail(ls) == []:
        return ls
    else:
        splits = split(ls)
        return merge(mergeSort(head(splits)), mergeSort(tail(splits)))

ls = [3,[4,[8,[0,[6,[7,[4,[2,[1,[9,[4,[5,[]]]]]]]]]]]]]
result = mergeSort(ls)
assertresult == [0,[1,[2,[3,[4,[4,[4,[5,[6,[7,[8,[9,[]]]]]]]]]]]]]

One difference is that in D the list length is known, so the split2() function may be simplified, calling splitInner with N/2.

In ATS you may implement a filter like this:
fun{a:t@ype} list_filter {m:nat} (
  xs: list (a, m),
  f: a -<> bool
  ): [n:nat | n <= m] list (a, n) =
  case+ xs of
  | nil () => nil ()
  | cons (x, xs) => if f(x) then cons(x, list_filter(xs, f)) else list_filter(xs, f)

But in D with the cons lists I have defined it's not possible to define a filter() because the length of the result is dependent on the run-time values inside the function and the run-time results of the filtering function.

In the D code the list lengths are known at compile-time, so all the type system has to do is to test that some numbers are equal at compile time, like in the signature of split2() or dropk().

In ATS the list_filter is implementable because it doesn't just assert the equality of some compile-time values. In ATS the programmer writes algebraic relations that the type system manages and verifies symbolically (using induction too, probably), so the length of the output list of list_filter is just known to be less or equal to the input length (with [n:nat | n<=m]). In general in the ATS program the type system doesn't know statically the length of the lists, it just makes sure the specified algebraic relations are coherent and correct.

The "Linear constraints" paragraph in the bluishcoder.co.nz blog post shows a limit of the inferencer in the ATS type system.

The ATS language is not a language for everyone to be used in all kinds of code. It allows much a more strict management of types and values, but this may require a large amount of extra work for the programmer. So ATS is for special situations. ATS is also a system language, so it may be used even to write Linux drivers.

This D code generates many templates, because each function is typed on the list length too. In theory a more flexible implementation of D templates may allow to specify that a template argument is known at compile-time but it's passed to the function as run-time value, avoiding the instantiation of many templates for each different value of that argument. A possible syntax idea:
pure Cons!(T, N)* reverse(T, @generic int N)(Cons!(T, N)* ls) {
    if (N == 0)
        return ls;
    else
        return append(reverse(tail(ls)), cons(head(ls), nil!T()));
}


To implement something closer to ATS dependent types in D, a friend has suggested me the idea of performing inferences over types at compile-time using Compile-Time Function Evaluation (CTFE). The constraints are represented with expression trees. This seems not useful in D because the expression trees are types still, so you need to perform casts everywhere, so I am not sure. Maybe there is a way to implement it in D. I think an optional structural typing (http://en.wikipedia.org/wiki/Structural_type_system ) is necessary in D to implement this idea well enough. Some optional structural typing may be later added to D2 or D3 to implement better tuples.

See the full code:
http://www.fantascienza.net/leonardo/js/len_typed_list.zip
comments: Leave a comment Previous Entry Share Next Entry

leonardo - Not dependent types in D
View:Recent Entries.
View:Archive.
View:Friends.
View:Profile.
View:Website (My Website).