Crystal: Discussion: Hashing namespace with Hasher classes

Created on 16 Jun 2017 · 18Comments · Source: crystal-lang/crystal

Currently Crystal ignores existence of HashFlood: hash function for base types is trivial "identity", and for strings it is very simple and easy to forge, and no any seeding performed.

Hash functions should be at least not-trivially forged when random seed used.

Design with separated hasher allows switch between different hash algorithms: default "safer" for public facing applications, and "faster" for non-tainted data.

As proposed by @RX14

class Object
  def hash
    Hashing::StdHasher.build do |hasher|
      hash(hasher)
    end
  end

  def hash(hasher)
    hasher << object_id
  end
end

class MyClass
  property foo : Foo
  property bar : Bar

  def hash(hasher)
    hasher << @foo << @bar
  end
end

Extracted from #4557 discussion because it may be discussed and implemented before.

/cc @RX14, @funny-falcon

Source

akzhan

Most helpful comment

Thanks for taking the time to explain. I agree hash codes as currently implemented are simple but basic and problematic (leading to attack vectors).

allows switch between different hash algorithms, one safer, one faster.

No. There should be a single hash solution, good for all use cases (safety first, then the fastest), implemented just like today: a generic Object#hash implementation, which can be overriden to better fits each struct/class data.

ysbaddaden on 19 Jun 2017

👍2

All 18 comments

Looks like we need

module Hashing
  alias Type = Int32

  module Build
    def build : Type
      builder = new
      yield builder
      builder.calculate
    end

    def seed
      555
    end
  end

  class StdHasher
    extend Build

    property val

    def initialize
      @val = 0
    end

    def <<(value) : self
      self.val += value
      self
    end

    def calculate : Type
      val + self.class.seed
    end
  end

end

hash = Hashing::StdHasher.build do |hasher|
  hasher << 12 << 24
end

p hash

Any suggestions?

akzhan on 16 Jun 2017

Looks a lot like I thought about.

There is also need for random seed. I understood how to generate it with SecureRandom, but I'm in doubdt how to share it between StdHasher and Int32Hasher (special case for Hash(Int32, V)). Can you give me an advice?

funny-falcon on 16 Jun 2017

I suppose that You may implement seed in another module like a Build. Extend classes with it and do self.class.seed.

By the way - hashers are reference classes in example above now.

akzhan on 16 Jun 2017

Here is draft of Hashing::StdHasher https://gist.github.com/funny-falcon/52eb622d689a2eae3e2d7b4e4a77d5ac

I strongly miss #1517 for convenience. Had to do wrapper struct holding pointer to implementation to achieve stack allocation + pass-by-pointer + convenient method calling. Compiler is not smart enough to allocate class-object on a stack.

funny-falcon on 17 Jun 2017

👍1

why not

class String
  def hash(hasher)
    hasher << to_slice
  end
end

also looks like Object.hash() with build already does *.hash with build, so need no duplicated code.
```

akzhan on 17 Jun 2017

why not

Ah, you are right.

also looks like Object.hash() with build already does *.hash with build, so need no duplicated code.

Sorry, I doesn't understand what you mean :-( This were example that compiles with unmodified master.
Of course I will modify original Object#hash() in pull request, if you mean that.
But still Hash may initiate builder by itself (for example, if per-hash-table seed will be accepted).

funny-falcon on 17 Jun 2017

Yes, I mean that.

Please feel free to clone crystal repo, make crystal and use bin/crystal.

It allows to do anything.

btw, I can do a lot with standard Crystal: https://play.crystal-lang.org/#/r/27bo

akzhan on 17 Jun 2017

Yes, I use bin/crystal.

funny-falcon on 17 Jun 2017

Opening issues with code and no further explanations is only asking volonteers to search and seek for what the hell you are trying to solve. Please at least explain the issue, or nobody from the core team will care to take a look.

ysbaddaden on 18 Jun 2017

Explanation:
Currently Crystal ignores existence of HashFlood: hash function for base types is trivial "identity", and for strings it is very simple and easy to forge, and no any seeding performed.

Hash functions should be at least not-trivially forged when random seed used.

Design with def hash(hasher); hasher << @a << @b; end allows switch between different hash algorithms: default "safer" for public facing applications, and "faster" for non-tainted data.

I'm working on an issue, and will do pull request next week, if no one complains against.

funny-falcon on 19 Jun 2017

❤1

Thanks for taking the time to explain. I agree hash codes as currently implemented are simple but basic and problematic (leading to attack vectors).

allows switch between different hash algorithms, one safer, one faster.

ysbaddaden on 19 Jun 2017

👍2

@ysbaddaden, in this case overriding hash would be difficult, as user will have to think about safety and seeding (or just use boilerplate h = Hasher.new; h<<field1...; h.finalize). Design with hash(hasher) looks more convenient even if there will be only one way to hash (and overriding hash directly will still be possible).

konovod on 19 Jun 2017

After all, hash(hasher) is very reusable : one may use even SHA3 without changing a line, just supplying other hasher.

Though, there is a one pity point: Hash#hash currently key/insertion/iteration order agnostic. It will be quite difficult to preserve this property with hash(hasher). Either one have to sort by keys before hashing, or declare that hashes with different iteration order are different.

funny-falcon on 19 Jun 2017

#hash is something nobody should have to care about, even less deal with: the stdlib must do all the job, and do it so well that nobody else than stdlib maintainers have to worry about it. Overiding is for stdlib structs like Float64 or Time may provide specific solutions, if need be.

ysbaddaden on 19 Jun 2017

@funny-falcon while that might be useful, I think it's a bad idea to overly abstract this and implement some kind of class hierarchy so that you can use multiple hashing algorithms. You should just implement a single Hasher class which implements one good generic hash. If people want to use multiple hashing algorithms they can do so through duck typing.

RX14 on 19 Jun 2017

👍1

@ysbaddaden everyone who defines custom equality also has to define a custom hash, and I think that's pretty common. We even have a macro def_equals_and_hash to make it simple for people to override the equality.

RX14 on 19 Jun 2017

@RX14,

I think it's a bad idea to overly abstract this and implement some kind of class hierarchy so that you can use multiple hashing algorithms.

No one talks about class hierarchy.

If people want to use multiple hashing algorithms they can do so through duck typing.

With def hash(h); h << @a << @b ; end it is quite easy to pass non-default hasher with duck-typing.
Non-default hasher just have to implement following contract:

class NonDefaultHasher
   def <<(v : Nil)
   end
   def <<(v : Int::Primitive)
   end
   def <<(v : Bytes)
   end
   def <<(v)
     v.hash(self)
   end
   # initialize and calculate are omitted, cause they are not part of contract.
end

Without such contract use of duck-typing it is quite hard.

funny-falcon on 19 Jun 2017

@funny-falcon yes you're exactly correct. I thought someone had proposed something complex but I was wrong. My bad.

RX14 on 19 Jun 2017

Was this page helpful?

0 / 5 - 0 ratings