leonardo
View:Recent Entries.
View:Archive.
View:Friends.
View:User Info.
View:Website (My Website).
You're looking at the latest 2 entries.

Tags:, ,
Subject:n-way templated Huffman
Time:02:36 pm
I have improved the nhuffman code:
- Now the function that generates the Huffman tree takes the Node class as parameter, so it can be used a different Node, this changes this code to "templated", because different ordering functions are possibile. (The node sorting is based on their frequency still, so it's not a fully templated Huffman yet, but it's not difficult to modify it).
- The function that generates the Huffman tree was rather slow if the input contain many different input symbols (like 8000 instead of the usual possibile 256 of an ASCII text), so I have improved its speed some, and then I have used a heap to take the tree roots with less frequency (instead of sorting all tree roots every time), so now that function is about 15-20 times faster. Using the right data structure is important in Python too.

http://www.fantascienza.net/leonardo/so/index.html#nhuffman
comments: Leave a comment Add to Memories Tell a Friend

Tags:, ,
Subject:n-way Huffman done in Python
Time:12:32 pm
To compress the Chakasa Tail Signed Language (and because it's an interesting algorithm) I've written a n-way Huffman encoder/decoder with Python V.2.5. It optionally uses Psyco too. The code is slow compared to a C Huffman implementation, but it's flexible enough, and it can be modified to manage different kinds of symbol ordering, etc (Templated Huffman).

http://www.fantascienza.net/leonardo/so/index.html#nhuffman

At the moment it accepts strings, unicode strings, and lists of hashable objects:

Input data: 'abbcccdddd'
freqs: {'a': 1, 'c': 3, 'b': 2, 'd': 4}
nways: 2
Huffman tree:
  'd'    0
  'c'    11
  'a'    100
  'b'    101
huff_dict: {'a': '100', 'c': '11', 'b': '101', 'd': '0'}
len(encoded) (Huffman tree not included): 19 bit(s) ~= 3 byte(s).
entropy(text) * len(text): 19 bits.
encoded: 1001011011111110000


Input data: 'abbcccdddd'
freqs: {'a': 1, 'c': 3, 'b': 2, 'd': 4}
nways: 3
Huffman tree:
  'd'    0
  'a'    10
  'b'    11
  'c'    12
huff_dict: {'a': '10', 'c': '12', 'b': '11', 'd': '0'}
len(encoded) (Huffman tree not included): 16 symbol(s) (base 3)
entropy(text) * len(text): 19 bits.
encoded: 1011111212120000


Input data: ['hello', 'how', 'are', 'how', 'you', 'hello', 'are', 'how', 'you', 'do', 'hello', 'how']
nways: 2
Huffman tree:
  'are'    00
  'hello'  10
  'how'    11
  'do'     010
  'you'    011
huff_dict: {'are': '00', 'how': '11', 'you': '011', 'hello': '10', 'do': '010'}
len(encoded) (Huffman tree not included): 27 bit(s) ~= 4 byte(s).
entropy(seq) * len(seq): 27 bits.
encoded: 101100110111000110110101011


The code optionally uses a tiny compiled module, my first Pyrex module named "mixed". It contains a charfreq() function that given a string, returns a dict of {char:char_freq}. On long strings it is many times faster than my faster Python version.
comments: Leave a comment Add to Memories Tell a Friend

leonardo
View:Recent Entries.
View:Archive.
View:Friends.
View:User Info.
View:Website (My Website).
You're looking at the latest 2 entries.