| leonardo ( @ 2008-08-07 19:42:00 |
Goto
Python 'for' and 'while' statements allow an 'else' clause, such part is executed if the cycle isn't exited by a break. So this code prints 2) and 4):
While reading an old article by Knuth I think I may have found one of the things that may have suggested Guido Rossum to put such syntax into Python. I think such 'else' removes one (common?) use case for the 'goto' statement, that's missing in Python (http://entrian.com/goto/index.html ).
This is the article: "Structured Programming With Go To Statements", by Knuth, Computing Surveys, Vol 6, No 4, December 1974:
http://pplab.snu.ac.kr/courses/adv_ pl05/papers/p261-knuth.pdf
At page 6 of the PDF the author presents a little problem. We want to search into an array A of distinct values, to find the index of a X item inside it. If not present, we want to add it to the sequence (at the end, you can do it increasing a length value if there's enough space). Plus, we want to keep a second array B, where the B[i] contains the number of times we have looked for A[i].
This means we are trying to create an associative array of the frequencies, so in Python you can do that with a default dict of the frequencies, test is some testing data of many items X:
In C a way to implement that code efficiently is the following one, that uses a goto (note that there's no need to clear a and b, they become overwritten), C code:
If there's a way to allocate one item longer a,b arrays then you can use a sentinel, this trick produces my fastest solution (you don't need to remove the sentinel at the end because the 'last' tells you how many good items are present), C code:
gcc -Wall -fomit-frame-pointer -fprofile-generate -O3 -s tempc.c -o tempc
followed by a run, followed by:
gcc -Wall -fomit-frame-pointer -fprofile-use -O3 -s tempc.c -o tempc
counter2 becomes:
Python 'for' and 'while' statements allow an 'else' clause, such part is executed if the cycle isn't exited by a break. So this code prints 2) and 4):
for x in range(10):
break
else:
print "1)"
for x in range(10):
pass
else:
print "2)"
while True:
break
else:
print "3)"
while False:
break
else:
print "4)"
I don't know other languages with such clause. I don't use it often, I find it not much intuitive, but once in a while I have found it useful. While reading an old article by Knuth I think I may have found one of the things that may have suggested Guido Rossum to put such syntax into Python. I think such 'else' removes one (common?) use case for the 'goto' statement, that's missing in Python (http://entrian.com/goto/index.html
This is the article: "Structured Programming With Go To Statements", by Knuth, Computing Surveys, Vol 6, No 4, December 1974:
http://pplab.snu.ac.kr/courses/adv_
At page 6 of the PDF the author presents a little problem. We want to search into an array A of distinct values, to find the index of a X item inside it. If not present, we want to add it to the sequence (at the end, you can do it increasing a length value if there's enough space). Plus, we want to keep a second array B, where the B[i] contains the number of times we have looked for A[i].
This means we are trying to create an associative array of the frequencies, so in Python you can do that with a default dict of the frequencies, test is some testing data of many items X:
test = [1, 4, 3, 2, 5, 1, 2, 3, 4, 6, 1]
from collections import defaultdict
d = defaultdict(int)
for x in test:
d[x] += 1
print d
If we want to use the array pair as in the original description of the problem, we can do like this:a = []
b = []
for x in test:
for i, el in enumerate(a):
if el == x:
b[i] += 1
break
else:
a.append(x)
b.append(1)
print a, b
In Python dicts are very fast, and the computational complexity of that code is linear. But if you work in a lower level language like D, and you have just few elements (like in that 'test' input array), and you already have two A, B (so you don't need to create them, for example because you're in a loop, etc), then that code with two arrays can be faster (because in such low level languages a scan in a short array can be faster than an hash lookup).In C a way to implement that code efficiently is the following one, that uses a goto (note that there's no need to clear a and b, they become overwritten), C code:
#define TESTLEN 11
void counter1(int* test) {
int a[11], b[11];
int last = 0, i = 0, j;
while (i < TESTLEN) {
int x = test[i];
for (j = 0; j < last; j++)
if (a[j] == x) {
b[j]++;
goto FOUND;
}
a[last] = x;
b[last] = 1;
last++;
FOUND:
i++;
}
// here use a[0..last-1] and b[0..last-1]
}
In D you can solve the same problem in many ways, you can use code much similar to that C one, or you can use an extra boolean variable, that slows down the code a little, D code:int[11] a, b;
int last;
for (int i; i < test.length; i++) {
auto x = test[i];
bool found = false;
for (int j; j < last; j++)
if (a[j] == x) {
b[j]++;
found = true;
break;
}
if (!found) {
a[last] = x;
b[last] = 1;
last++;
}
}
But in the past every CPU cycle was precious and CPUs weren't good at doing many things done by modern CPUs (branch prediction, ahead of time execution, etc), so the usage of that 'found' plus the 'if' was a waste. In D you can also solve the same problem without gotos, shortening the code, using a named 'continue' statement (but the code with goto is faster), D code:OUTER:
for (int i; i < test.length; i++) {
auto x = test[i];
for (int j; j < last; j++)
if (x == set[j]) {
freqs[j]++;
continue OUTER;
}
set[last] = x;
freqs[last++] = 1;
}
In most practical situations of this problem, in D as in Python/Psyco, I use an associative array/dict.If there's a way to allocate one item longer a,b arrays then you can use a sentinel, this trick produces my fastest solution (you don't need to remove the sentinel at the end because the 'last' tells you how many good items are present), C code:
void counter2(int* test) {
#define LENARR 12
int a[LENARR], b[LENARR];
int last = 0, i, j;
for (i = 0; i < TESTLEN; i++) {
int x = test[i];
a[last] = x;
j = 0;
while (a[j] != x)
j++;
if (j < last)
b[j]++;
else {
a[last] = x;
b[last] = 1;
last++;
}
}
#undef LENARR
}
Compiling it with MinGW 4.2.1 with:gcc -Wall -fomit-frame-pointer -fprofile-generate -O3 -s tempc.c -o tempc
followed by a run, followed by:
gcc -Wall -fomit-frame-pointer -fprofile-use -O3 -s tempc.c -o tempc
counter2 becomes:
_counter2:
pushl %edi
xorl %ecx, %ecx
pushl %esi
pushl %ebx
xorl %ebx, %ebx
subl $48, %esp
movl 64(%esp), %edi
movl %esp, %esi
.p2align 4,,7
L37:
movl (%edi,%ecx,4), %eax
xorl %edx, %edx
movl %eax, (%esp,%ebx,4)
cmpl %eax, (%esp)
je L40
cmpl %eax, 4(%esi)
movl $1, %edx
je L40
cmpl %eax, 8(%esi)
movb $2, %dl
je L40
cmpl %eax, 12(%esi)
movb $3, %dl
je L40
L41:
addl $1, %edx
cmpl %eax, (%esi,%edx,4)
jne L41
.p2align 4,,7
L40:
cmpl %ebx, %edx
jl L42
movl %eax, (%esp,%ebx,4)
addl $1, %ebx
L42:
addl $1, %ecx
cmpl $11, %ecx
jne L37
addl $48, %esp
popl %ebx
popl %esi
popl %edi
ret
To scan the array of 11 items it needs about 76 clock ticks of my CPU, that's just a bit more than the minimum. If you write the code in assembly I think it's not easy to go below 50-60 clock ticks (and I don't know if SSSE3 instructions may help here).