I think trying to come up with some generally useful hash function that will generate unique hash values for collections is an exercise in futility. U62’s suggestion of combining the hashes of all the contents will not scale well, as it makes the hash function O(n). Hash functions should really be O(1) to ensure good performance, otherwise the purpose of the hash is defeated. (Consider the common Cocoa construct of plists, which are dictionaries containing arrays and other dictionaries, potentially ad nauseum. Attempting to take the hash of the top-level dictionary of a large plist would be excruciatingly slow if the collections’ hash functions were O(n).)
My suggestion would be not to worry a great deal about a collection’s hash. As you stated,
-isEqual: implies equal
-hash values. On the other hand, equal
-hash values do not imply
-isEqual:. That fact gives you a lot of leeway to create a simple hash.
If you’re really worried about collisions though (and you have proof in concrete measurements of real-world situations that confirm it is something to be worried about), you could still follow U62’s advice to some degree. For example, you could take the hash of, say, the first and/or last element in the collection, and combine that with, say, the
-count of the collection. That be enough to provide a decent hash.
I hope that answers at least one of your questions.
As for No. 1: Implementing
-isEqual: is pretty cut and dry. You enumerate the contents, and check isEqual: on each of the elements.
There is one thing to be careful of that may affect what you decide to do for your collections’
-hash functions. Clients of your collections must also understand the rules governing
-hash. If you use the contents’
-hash in your collection’s
-hash, your collection will break if the contents’
-hash don’t agree. It’s the client’s fault, of course, but that’s another argument against basing your
-hash off of the collection’s contents.
No. 2 is kind of vague. Not sure what you have in mind there.