I have found this interesting suite of benchmarks:
http://www.bioinformatics.org/bench

It is related to the article "A comparison of common programming languages used in bioinformatics":
http://www.biomedcentral.com/1471-2

See this too:
http://hackmap.blogspot.com/2008/02/fas

I have tried to repeat some of these benchmarks, but I can't reproduce the 9+ GB file used for one of them. So far I have tested only the "alignment.py" program, because it's the only one with data available.

A few comments on that work:

1) For Python, on these 3 benchmarks the Psyco just-in-time compiler helps a lot. Just install it:
http://psyco.sourceforge.net/
and add the following line to the code:
    import psyco; psyco.full()

2) I haven't run the program "parse.py", but it shows various inefficiencies:
- No Psyco.
- Heavy loop-processing code outside functions. Just putting that big "while 1:" loop inside a main() function will help.
- Loading a file in binary mode is faster than in text mode; just open the file with "rb".
- Python files are iterables, so to scan a file you just need:
    for line in file("somefile", "rb"): print line
- To take the next line you just need:
    f = file("somefile", "rb")
    print f.next()
- You don't need the string module functions. Python strings have methods, so you can write:
    "something".replace("some", "any")
- Sometimes in Python you can replace REs with string methods.
- This is a bad way to strip a line:
    line = string.replace(line,'\n','')
  This is better and probably faster:
    line = line.rstrip()
  (rstrip() with no argument also removes trailing spaces and tabs; use line.rstrip('\n') if those must be kept.)

3) I like the D language, so I have tried it:
http://www.digitalmars.com/d/1.0/in

My timings of the "alignment" program:

Version    Time       Memory     Length
D3:        0.96 s     41 MB      66 lines
C:         1.05 s     41 MB      146 lines
C++:       1.22 s     41 MB      87 lines
D2:        1.95 s     54 MB      60 lines
D1:        2.73 s     57 MB      58 lines
Java:      3.72 s     53 MB      79 lines
Psyco:     11.22 s    168 MB     64 lines
Python2:   115.3 s    ~160 MB    63 lines

Notes:

3a) C is the C code compiled in two passes, first to generate a profile and then to use it:
    -pipe -O3 -s -ffast-math -fomit-frame-pointer -funroll-loops -march=pentiumpro -fprofile-generate
    -pipe -O3 -s -ffast-math -fomit-frame-pointer -funroll-loops -march=pentiumpro -fprofile-use

3b) D code (compiler v. 1.026) compiled with:
    -O -release -inline
D1 is a direct translation of the Java version.
D2 inverts the building of the strings and reverses them at the end (inserting at the start of an array is slow in any language).
D3 is like D2, but it doesn't pre-clear the allocated memory.

Finally, my D code can be found here:
http://www.fantascienza.net/leonard

Update Mar 7 2008: I have since found one version of the L gene of the Hantaan virus:
http://beta.uniprot.org/uniprot/P23
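The parse.py suggestions in point 2 can be put together in one small sketch. This is not the original parse.py, which I haven't run; the sample data and the names clean_lines and main are made up for illustration. It is also written for Python 3, where file() has become open() and f.next() has become next(f), so lines read with "rb" are bytes.

```python
import os
import tempfile


def clean_lines(path):
    """Yield each line of the file at `path` without its line ending."""
    # Binary mode ("rb"), as suggested in point 2; lines come back as bytes.
    with open(path, "rb") as handle:
        # File objects are iterable, so no explicit readline() loop is needed.
        for line in handle:
            # Strip only the line ending; other trailing whitespace is kept.
            yield line.rstrip(b"\r\n")


def main():
    # Hypothetical sample data, written to a temporary file for the demo.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b">seq1 some header\nACGT \nTTGA\n")
        path = tmp.name
    try:
        lines = list(clean_lines(path))
    finally:
        os.remove(path)
    # A method call replaces the old string-module function.
    header = lines[0].replace(b"some", b"any")
    return header, lines[1:]


if __name__ == "__main__":
    print(main())
```

Keeping the loop inside functions means the names it touches are local variables, which CPython looks up faster than module-level globals.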
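The D2 change in note 3b (build the strings the other way round, then reverse once at the end) can be sketched in Python as well. This is only an illustration of the idea, not my D code; the function names are hypothetical, and it assumes the symbols arrive in last-to-first order.

```python
def build_by_prepending(symbols):
    """Build the result by inserting each symbol at the front (quadratic time)."""
    out = []
    for ch in symbols:
        out.insert(0, ch)  # shifts every existing element one slot to the right
    return "".join(out)


def build_then_reverse(symbols):
    """Append each symbol at the end, then reverse once (linear time)."""
    out = []
    for ch in symbols:
        out.append(ch)  # amortized constant time per symbol
    out.reverse()  # single in-place pass at the end
    return "".join(out)
```

Both functions return the same string; the second avoids moving the whole array on every step, which is where the difference should show up on long sequences.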
