Quantcast
Viewing all articles
Browse latest Browse all 25

Answer by Paul Chernoch for Tweetable hash function challenge

Ruby, 6473 collisions, 129 bytes

h=->(w){@p=@p||(2..999).select{|i|(2..i**0.5).select{|j|i%j==0}==[]};c=w.chars.reduce(1){|a,s|(a*@p[s.ord%92]+179)%((1<<24)-3)}}

The @p variable is filled with all the primes below 999.

This converts ascii values into primes numbers and takes their product modulo a large prime. The fudge factor of 179 deals with the fact that the original algorithm was for use in finding anagrams, where all words that are rearrangements of the same letters get the same hash. By adding the factor in the loop, it makes anagrams have distinct codes.

I could remove the **0.5 (sqrt test for prime) at the expense of poorer performance to shorten the code. I could even make the prime number finder execute in the loop to remove nine more characters, leaving 115 bytes.

To test, the following tries to find the best value for the fudge factor in the range 1 to 300. It assume that the word file in in the /tmp directory:

h=->(w,y){
  @p=@p||(2..999).
    select{|i|(2..i**0.5). 
    select{|j|i%j==0}==[]};
  c=w.chars.reduce(1){|a,s|(a*@p[s.ord%92]+y)%((1<<24)-3)}
}

american_dictionary = "/usr/share/dict/words"
british_dictionary = "/tmp/british-english-huge.txt"
words = (IO.readlines british_dictionary).map{|word| word.chomp}.uniq
wordcount = words.size

fewest_collisions = 9999
(1..300).each do |y|
  whash = Hash.new(0)
  words.each do |w|
    code=h.call(w,y)
    whash[code] += 1
  end
  hashcount = whash.size
  collisions = whash.values.select{|count| count > 1}.inject(:+)
  if (collisions < fewest_collisions)
    puts "y = #{y}. #{collisions} Collisions. #{wordcount} Unique words. #{hashcount} Unique hash values"
    fewest_collisions = collisions
  end
end

Viewing all articles
Browse latest Browse all 25

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>