In this code-challenge you will write a hash function in 140 bytes¹ or less of source code. The hash function must take an ASCII string as input, and return a 24-bit unsigned integer ([0, 2^24-1]) as output.
Your hash function will be evaluated for every word in this large British English dictionary². Your score is the number of words that share a hash value with at least one other word (a collision).
The lowest score wins; ties are broken by the earliest post.
Test case
Before submitting, please test your scoring script on the following input:
duplicate
duplicate
duplicate
duplicate
If it gives any score other than 4, it is buggy.
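As a rough illustration of the scoring rule, here is a minimal sketch of a scoring script in Python. The names `example_hash` and `score` are made up for this example, and the placeholder hash is only there to make the script runnable; it is not a competitive entry.

```python
from collections import Counter

def example_hash(word: str) -> int:
    # Hypothetical placeholder hash (an assumption for this sketch):
    # a polynomial rolling hash truncated to 24 bits.
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) & 0xFFFFFF
    return h

def score(words, hash_fn=example_hash):
    # Count how many input words land on each hash value.
    counts = Counter(hash_fn(w) for w in words)
    # Every word in a bucket that holds two or more words is a collision.
    return sum(n for n in counts.values() if n > 1)

if __name__ == "__main__":
    # The test case from the challenge: four identical lines.
    print(score(["duplicate"] * 4))  # prints 4
```

Note that every word in a shared bucket is counted, not just the extra ones, which is why four identical lines score 4 rather than 3.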
Clarifying rules:
- Your hash function must run on a single string, not a whole array. Also, your hash function may not perform any I/O other than taking the input string and returning the output integer.
- Built-in hash functions and similar functionality (e.g. encryption to scramble bytes) are disallowed.
- Your hash function must be deterministic.
- Contrary to most other contests, optimizing specifically for the scoring input is allowed.
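The mechanical constraints (source size, output range, determinism) can be checked automatically. The following is a sketch only: `check_entry` and `ENTRY_SOURCE` are hypothetical names, the entry shown is an arbitrary example, and the check assumes the submission is Python source that defines a function named `h`.

```python
MAX_BYTES = 140
MAX_HASH = 2**24 - 1

# Hypothetical entry, used only to exercise the checks below.
ENTRY_SOURCE = "h=lambda s:sum(ord(c)*31**i for i,c in enumerate(s))%2**24"

def check_entry(source, name="h", samples=("a", "duplicate", "zebra")):
    # The limit is on bytes, not characters, so measure the encoded source.
    if len(source.encode("utf-8")) > MAX_BYTES:
        return False
    namespace = {}
    exec(source, namespace)  # assumption: the entry is trusted Python source
    fn = namespace[name]
    for s in samples:
        value = fn(s)
        # The output must be an integer in [0, 2^24-1].
        if not isinstance(value, int) or not 0 <= value <= MAX_HASH:
            return False
        # Calling the function again must give the same result.
        if fn(s) != value:
            return False
    return True

if __name__ == "__main__":
    print(len(ENTRY_SOURCE.encode("utf-8")), check_entry(ENTRY_SOURCE))
```

A check like this cannot detect use of built-in hashing or hidden I/O; those rules still need a human reader.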
¹ I am aware Twitter limits characters instead of bytes, but for simplicity we will use bytes as a limit for this challenge.
² Modified from Debian's wbritish-huge, removing any non-ASCII words.