By bitbonk


2008-11-04 20:53:19 8 Comments

In .NET, the System.Object.GetHashCode method is used in many places throughout the base class libraries, especially for finding items in a collection quickly or for determining equality. Is there a standard algorithm or best practice for implementing the GetHashCode override for my custom classes, so that I don't degrade performance?

19 comments

@Muhammad Rehan Saeed 2019-06-11 08:34:08

.NET Core 2.1 And Above

If you are using .NET Core 2.1 or above, you can use the System.HashCode struct. There are two methods of using it:

HashCode.Combine

The Combine method can be used to create a hash code, given up to eight objects.

public override int GetHashCode() => HashCode.Combine(this.object1, this.object2);

HashCode.Add

The Add method helps you to deal with collections:

public override int GetHashCode()
{
    var hashCode = new HashCode();
    hashCode.Add(this.object1);
    foreach (var item in this.collection)
    {
        hashCode.Add(item);
    }
    return hashCode.ToHashCode();
}

GetHashCode Made Easy

You can read the full blog post 'GetHashCode Made Easy' for more details and comments.

Usage Example

public class SuperHero
{
    public int Age { get; set; }
    public string Name { get; set; }
    public List<string> Powers { get; set; }

    public override int GetHashCode() =>
        HashCode.Of(this.Name).And(this.Age).AndEach(this.Powers);
}

Implementation

public struct HashCode : IEquatable<HashCode>
{
    private const int EmptyCollectionPrimeNumber = 19;
    private readonly int value;

    private HashCode(int value) => this.value = value;

    public static implicit operator int(HashCode hashCode) => hashCode.value;

    public static bool operator ==(HashCode left, HashCode right) => left.Equals(right);

    public static bool operator !=(HashCode left, HashCode right) => !(left == right);

    public static HashCode Of<T>(T item) => new HashCode(GetHashCode(item));

    public static HashCode OfEach<T>(IEnumerable<T> items) =>
        items == null ? new HashCode(0) : new HashCode(GetHashCode(items, 0));

    public HashCode And<T>(T item) => 
        new HashCode(CombineHashCodes(this.value, GetHashCode(item)));

    public HashCode AndEach<T>(IEnumerable<T> items)
    {
        if (items == null)
        {
            return new HashCode(this.value);
        }

        return new HashCode(GetHashCode(items, this.value));
    }

    public bool Equals(HashCode other) => this.value.Equals(other.value);

    public override bool Equals(object obj)
    {
        if (obj is HashCode)
        {
            return this.Equals((HashCode)obj);
        }

        return false;
    }

    public override int GetHashCode() => this.value.GetHashCode();

    private static int CombineHashCodes(int h1, int h2)
    {
        unchecked
        {
            // Combiner copied from System.Tuple; a good way to combine hashes.
            return ((h1 << 5) + h1) ^ h2;
        }
    }

    private static int GetHashCode<T>(T item) => item?.GetHashCode() ?? 0;

    private static int GetHashCode<T>(IEnumerable<T> items, int startHashCode)
    {
        var temp = startHashCode;
        var isEmpty = true;

        // foreach disposes the enumerator, unlike a raw GetEnumerator() call.
        foreach (var item in items)
        {
            isEmpty = false;
            temp = CombineHashCodes(temp, GetHashCode(item));
        }

        if (isEmpty)
        {
            temp = CombineHashCodes(temp, EmptyCollectionPrimeNumber);
        }

        return temp;
    }
}

@Steven Coco 2019-04-28 05:10:01

This is a static helper class that implements Josh Bloch's approach. It provides explicit overloads to avoid boxing, and to hash the long primitives specifically.

You can pass a string comparison that matches your equals implementation.

Because the Hash output is always an int, you can just chain Hash calls.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;


namespace Sc.Util.System
{
    /// <summary>
    /// Static methods that allow easy implementation of hashCode. Example usage:
    /// <code>
    /// public override int GetHashCode()
    ///     => HashCodeHelper.Seed
    ///         .Hash(primitiveField)
    ///         .Hash(objectField)
    ///         .Hash(iEnumerableField);
    /// </code>
    /// </summary>
    public static class HashCodeHelper
    {
        /// <summary>
        /// An initial value for a hashCode, to which is added contributions from fields.
        /// Using a non-zero value decreases collisions of hashCode values.
        /// </summary>
        public const int Seed = 23;

        private const int oddPrimeNumber = 37;


        /// <summary>
        /// Rotates the seed against a prime number.
        /// </summary>
        /// <param name="aSeed">The hash's first term.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static int rotateFirstTerm(int aSeed)
        {
            unchecked {
                return HashCodeHelper.oddPrimeNumber * aSeed;
            }
        }


        /// <summary>
        /// Contributes a boolean to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aBoolean">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, bool aBoolean)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + (aBoolean
                                ? 1
                                : 0);
            }
        }

        /// <summary>
        /// Contributes a char to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aChar">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, char aChar)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + aChar;
            }
        }

        /// <summary>
        /// Contributes an int to the developing HashCode seed.
        /// Note that byte and short are handled by this method, through implicit conversion.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aInt">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, int aInt)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + aInt;
            }
        }

        /// <summary>
        /// Contributes a long to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aLong">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, long aLong)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + (int)(aLong ^ (aLong >> 32));
            }
        }

        /// <summary>
        /// Contributes a float to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aFloat">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, float aFloat)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + Convert.ToInt32(aFloat);
            }
        }

        /// <summary>
        /// Contributes a double to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aDouble">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, double aDouble)
            => aSeed.Hash(Convert.ToInt64(aDouble));

        /// <summary>
        /// Contributes a string to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aString">The value to contribute.</param>
        /// <param name="stringComparison">Optional comparison that creates the hash.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(
                this int aSeed,
                string aString,
                StringComparison stringComparison = StringComparison.Ordinal)
        {
            if (aString == null)
                return aSeed.Hash(0);
            // Fold the comparer's hash into the developing seed.
            switch (stringComparison) {
                case StringComparison.CurrentCulture :
                    return aSeed.Hash(StringComparer.CurrentCulture.GetHashCode(aString));
                case StringComparison.CurrentCultureIgnoreCase :
                    return aSeed.Hash(StringComparer.CurrentCultureIgnoreCase.GetHashCode(aString));
                case StringComparison.InvariantCulture :
                    return aSeed.Hash(StringComparer.InvariantCulture.GetHashCode(aString));
                case StringComparison.InvariantCultureIgnoreCase :
                    return aSeed.Hash(StringComparer.InvariantCultureIgnoreCase.GetHashCode(aString));
                case StringComparison.OrdinalIgnoreCase :
                    return aSeed.Hash(StringComparer.OrdinalIgnoreCase.GetHashCode(aString));
                default :
                    return aSeed.Hash(StringComparer.Ordinal.GetHashCode(aString));
            }
        }

        /// <summary>
        /// Contributes a possibly-null array to the developing HashCode seed.
        /// Each element may be a primitive, a reference, or a possibly-null array.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aArray">CAN be null.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, IEnumerable aArray)
        {
            if (aArray == null)
                return aSeed.Hash(0);
            int countPlusOne = 1; // So it differs from null
            foreach (object item in aArray) {
                ++countPlusOne;
                if (item is IEnumerable arrayItem) {
                    if (!object.ReferenceEquals(aArray, arrayItem))
                        aSeed = aSeed.Hash(arrayItem); // recursive call!
                } else
                    aSeed = aSeed.Hash(item);
            }
            return aSeed.Hash(countPlusOne);
        }

        /// <summary>
        /// Contributes a possibly-null array to the developing HashCode seed.
        /// You must provide the hash function for each element.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aArray">CAN be null.</param>
        /// <param name="hashElement">Required: yields the hash for each element
        /// in <paramref name="aArray"/>.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash<T>(this int aSeed, IEnumerable<T> aArray, Func<T, int> hashElement)
        {
            if (aArray == null)
                return aSeed.Hash(0);
            int countPlusOne = 1; // So it differs from null
            foreach (T item in aArray) {
                ++countPlusOne;
                aSeed = aSeed.Hash(hashElement(item));
            }
            return aSeed.Hash(countPlusOne);
        }

        /// <summary>
        /// Contributes a possibly-null object to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aObject">CAN be null.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, object aObject)
        {
            switch (aObject) {
                case null :
                    return aSeed.Hash(0);
                case bool b :
                    return aSeed.Hash(b);
                case char c :
                    return aSeed.Hash(c);
                case int i :
                    return aSeed.Hash(i);
                case long l :
                    return aSeed.Hash(l);
                case float f :
                    return aSeed.Hash(f);
                case double d :
                    return aSeed.Hash(d);
                case string s :
                    return aSeed.Hash(s);
                case IEnumerable iEnumerable :
                    return aSeed.Hash(iEnumerable);
            }
            return aSeed.Hash(aObject.GetHashCode());
        }


        /// <summary>
        /// This utility method uses reflection to iterate all specified properties that are readable
        /// on the given object, excluding any property names given in the params arguments, and
        /// generates a hashcode.
        /// </summary>
        /// <param name="aSeed">The developing hash code, or the seed: if you have no seed, use
        /// the <see cref="Seed"/>.</param>
        /// <param name="aObject">CAN be null.</param>
        /// <param name="propertySelector"><see cref="BindingFlags"/> to select the properties to hash.</param>
        /// <param name="ignorePropertyNames">Optional.</param>
        /// <returns>A hash from the properties contributed to <c>aSeed</c>.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int HashAllProperties(
                this int aSeed,
                object aObject,
                BindingFlags propertySelector
                        = BindingFlags.Instance
                        | BindingFlags.Public
                        | BindingFlags.GetProperty,
                params string[] ignorePropertyNames)
        {
            if (aObject == null)
                return aSeed.Hash(0);
            if ((ignorePropertyNames != null)
                    && (ignorePropertyNames.Length != 0)) {
                foreach (PropertyInfo propertyInfo in aObject.GetType()
                        .GetProperties(propertySelector)) {
                    if (!propertyInfo.CanRead
                            || (Array.IndexOf(ignorePropertyNames, propertyInfo.Name) >= 0))
                        continue;
                    aSeed = aSeed.Hash(propertyInfo.GetValue(aObject));
                }
            } else {
                foreach (PropertyInfo propertyInfo in aObject.GetType()
                        .GetProperties(propertySelector)) {
                    if (propertyInfo.CanRead)
                        aSeed = aSeed.Hash(propertyInfo.GetValue(aObject));
                }
            }
            return aSeed;
        }


        /// <summary>
        /// NOTICE: this method is provided to contribute a <see cref="KeyValuePair{TKey,TValue}"/> to
        /// the developing HashCode seed; by hashing the key and the value independently. HOWEVER,
        /// this method has a different name since it will not be automatically invoked by
        /// <see cref="Hash(int,object)"/>, <see cref="Hash(int,IEnumerable)"/>,
        /// or <see cref="HashAllProperties"/> --- you MUST NOT mix this method with those unless
        /// you are sure that no KeyValuePair instances will be passed to those methods; or otherwise
        /// the generated hash code will not be consistent. This method itself ALSO will not invoke
        /// this method on the Key or Value here if that itself is a KeyValuePair.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="keyValuePair">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int HashKeyAndValue<TKey, TValue>(this int aSeed, KeyValuePair<TKey, TValue> keyValuePair)
            => aSeed.Hash(keyValuePair.Key)
                    .Hash(keyValuePair.Value);

        /// <summary>
        /// NOTICE: this method is provided to contribute a collection of <see cref="KeyValuePair{TKey,TValue}"/>
        /// to the developing HashCode seed; by hashing the key and the value independently. HOWEVER,
        /// this method has a different name since it will not be automatically invoked by
        /// <see cref="Hash(int,object)"/>, <see cref="Hash(int,IEnumerable)"/>,
        /// or <see cref="HashAllProperties"/> --- you MUST NOT mix this method with those unless
        /// you are sure that no KeyValuePair instances will be passed to those methods; or otherwise
        /// the generated hash code will not be consistent. This method itself ALSO will not invoke
        /// this method on a Key or Value here if that itself is a KeyValuePair or an Enumerable of
        /// KeyValuePair.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="keyValuePairs">The values to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int HashKeysAndValues<TKey, TValue>(
                this int aSeed,
                IEnumerable<KeyValuePair<TKey, TValue>> keyValuePairs)
        {
            if (keyValuePairs == null)
                return aSeed.Hash(null);
            foreach (KeyValuePair<TKey, TValue> keyValuePair in keyValuePairs) {
                aSeed = aSeed.HashKeyAndValue(keyValuePair);
            }
            return aSeed;
        }
    }
}

@Steven Coco 2019-05-09 00:14:50

Yipes: I found a bug! The HashKeysAndValues method has been fixed: it invokes HashKeyAndValue.

@Jon Skeet 2008-11-04 20:56:17

I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later on, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime, though. I don't know quite how important that is.)

This is better than the common practice of XORing hashcodes for two main reasons. Suppose we have a type with two int fields:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
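Both weaknesses can be checked directly. A minimal sketch (XorHash here is a hypothetical helper standing in for an XOR-combined GetHashCode over two int fields):

```csharp
using System;
using System.Diagnostics;

class XorHashDemo
{
    // Hypothetical XOR-combining hash for a type with two int fields.
    public static int XorHash(int a, int b) => a.GetHashCode() ^ b.GetHashCode();

    static void Main()
    {
        // Weakness 1: any pair of equal fields collapses to 0.
        Debug.Assert(XorHash(5, 5) == 0);
        Debug.Assert(XorHash(42, 42) == 0);

        // Weakness 2: swapping the fields gives the same hash.
        Debug.Assert(XorHash(3, 7) == XorHash(7, 3));

        Console.WriteLine("XOR-collision checks passed.");
    }
}
```

The additive-multiplicative scheme above avoids both problems because each field is multiplied by a different power of the constant before being combined.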

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as a combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte, instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.

// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}

Note that one thing to be aware of is that ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding it to a collection that depends on the hash code.

As per the documentation:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • You can compute the hash code from fields that are not mutable; or
  • You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
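The hazard the documentation describes is easy to reproduce. A minimal sketch (the MutableKey type is invented for illustration) showing that mutating a key after insertion makes it unfindable:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Illustrative type whose equality and hash depend on a mutable field.
class MutableKey
{
    public int Value { get; set; }
    public override bool Equals(object obj) => obj is MutableKey k && k.Value == Value;
    public override int GetHashCode() => Value.GetHashCode();
}

class MutationDemo
{
    static void Main()
    {
        var key = new MutableKey { Value = 1 };
        var set = new HashSet<MutableKey> { key };

        Debug.Assert(set.Contains(key));  // found while unchanged

        key.Value = 2;                    // hash code changes...
        Debug.Assert(!set.Contains(key)); // ...so the set can no longer find it

        Console.WriteLine("Mutated key is lost in the set.");
    }
}
```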

@bitbonk 2008-11-04 21:44:52

The algorithm described in the book you mention is in fact a little more detailed; it especially describes what to do for different field data types. E.g. for fields of type long, use (int)(f ^ (f >>> 32)) instead of simply calling GetHashCode. Is long.GetHashCode implemented that way?

@Jon Skeet 2008-11-04 21:51:04

Yup, Int64.GetHashCode does exactly that. In Java that would require boxing, of course. That reminds me - time to add a link to the book...
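That XOR-fold can be sketched and checked against the BCL directly (FoldedHash is an illustrative reimplementation, not the actual framework source):

```csharp
using System;
using System.Diagnostics;

class Int64HashDemo
{
    // Folds the high 32 bits into the low 32 bits, which is what
    // Int64.GetHashCode effectively does.
    public static int FoldedHash(long value) =>
        unchecked((int)value ^ (int)(value >> 32));

    static void Main()
    {
        Debug.Assert(FoldedHash(0L) == 0L.GetHashCode());
        Debug.Assert(FoldedHash(long.MaxValue) == long.MaxValue.GetHashCode());
        Debug.Assert(FoldedHash(-12345678901234L) == (-12345678901234L).GetHashCode());
        Console.WriteLine("Folded hash matches Int64.GetHashCode.");
    }
}
```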

@CodesInChaos 2010-11-21 22:41:24

23 is not a good choice, since (as of .NET 3.5 SP1) Dictionary&lt;TKey,TValue&gt; assumes good distribution modulo certain primes, and 23 is one of them. So if you have a dictionary with capacity 23, only the last contribution to GetHashCode influences the compound hash code. I'd rather use 29 instead of 23.
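The wipe-out effect is straightforward to verify: modulo the multiplier, every contribution except the last one is multiplied by a power of that multiplier and so vanishes. A quick check using the hash shape from the answer above (field values chosen arbitrarily):

```csharp
using System;
using System.Diagnostics;

class ModuloDemo
{
    // The additive-multiplicative hash from the answer, multiplier 23.
    public static int Hash(int a, int b, int c)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + a;
            hash = hash * 23 + b;
            hash = hash * 23 + c;
            return hash;
        }
    }

    static void Main()
    {
        // Modulo 23 (a bucket count a small Dictionary may use),
        // only the last field's contribution survives:
        Debug.Assert(Hash(1, 2, 9) % 23 == Hash(100, 200, 9) % 23);
        Debug.Assert(Hash(0, 0, 9) % 23 == Hash(7, 7, 9) % 23);
        Console.WriteLine("Everything but the last field vanishes mod 23.");
    }
}
```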

@CodesInChaos 2010-11-21 22:43:56

@Ani: Your implementation allocates several new objects on the heap, so performance might be lower than with a manual implementation. Whether that's acceptable depends on your type and usage. Check some of the other answers for helpers using generics which avoid this problem.

@Jon Skeet 2010-11-21 23:14:07

@CodeInChaos: Only the last contribution influences the bucket - so it might, at worst, have to look through all 23 entries in the dictionary. It's still going to check the actual hash code of each entry, which will be cheap. If you've got a dictionary that small, it's unlikely to matter much.

@Kumba 2011-01-18 05:05:14

@Jon: I have to ask, despite already having opened my own question on the topic, but what's a good VB version of this, since VB lacks the checked and unchecked keywords? I tried making tmpHash an Int64 and AND'ing the lower 8 bits (per the accepted answer to my question), but on a sufficiently large set of fields, it somehow caused the calculation to wrap to 0 for the remainder of the loop.

@Jon Skeet 2011-01-18 07:08:54

@Kumba: I don't know how I'd do this in VB, I'm afraid. Is arithmetic always checked in VB? Could you have a separate class library which you could delegate the arithmetic to, either written in C# or with checked arithmetic turned off project-wide?

@Kumba 2011-01-18 07:28:36

@Jon: VB apparently checks a lot of things. It's got a fetish for demanding that unsigned numerics get converted into signed numerics before letting you left or right shift them. Which is driving me up the wall and across the ceiling. I'm trying to implement the Jenkins hash to work around the lack of checked/unchecked (a rotating hash also does the trick, but I'm worried about hash collisions with input). Using a separate, C# library is something I'd like to avoid, because it essentially admits defeat. If I get to that point, I should just re-write the entire project in C#.

@pomeroy 2011-01-18 15:22:32

Isn't the 'unchecked' unnecessary, because the CLR will happily overflow by default?

@Jon Skeet 2011-01-18 15:47:51

@pomeroy: It depends on what the project settings are. Basically you give the assembly a default context of checked or unchecked.

@Kumba 2011-01-18 20:38:32

@pomeroy: VB isn't as granular as C# is. Because it lacks the above two keywords, your only options are to remove integer overflow checks for the entire project or not. I guess if your project is complete and generally well tested, removing the overflow checks is a safe thing to do. However, while building and debugging it, those checks are good because they'll highlight bugs to fix. I opened Connect Ticket # 636564 with Microsoft to recommend including checked/unchecked keyword support in the next .NET release. Uncertain if they'll do so, though.

@Kumba 2011-01-18 20:41:14

I'll add that I'll have to go with the rotating hash algorithm linked to in Jon's answer above. It doesn't overflow, even on an Int32, doesn't (so far) wrap to 0 on a large number of fields in the calculation, and is simple and rather speedy. The Jenkins hash didn't work out... even that overflows at random, depending on the input. Also, the forcing of bit shifts to happen in signed math gets in the way of a lot of things. I might open another bug on that, unless that's intended behavior somehow.

@Rory 2011-02-05 10:16:10

Don't you need override in your method declaration? Would also be good to put in null checks since this is such a well-used example.

@Jon Skeet 2011-02-05 10:23:23

@Rory: I've added the override, thanks - I'm not going to put in the null checks, as I feel that would obscure the important points. The comment suffices IMO.

@fredoverflow 2011-02-06 14:41:14

Why start with a prime instead of just zero? does int hash = 17; have any theoretically supported benefits?

@Jon Skeet 2011-02-06 16:59:00

@FredOverflow: I don't know the exact details of all the reasons behind it, but starting with 0 would mean that the hash would stay as zero if the individual field hashes were zero... and that's probably not uncommon (e.g. an integer of value zero will probably hash to zero). Just a guess, but I suspect that having a constant which gets propagated with each field is useful. This is really just copied from Effective Java though :)

@bitbonk 2011-03-15 14:04:49

@JonSkeet How collision-safe would this algorithm be for a complex object graph consisting of, let's say, 500 objects with 10 properties each? Related question: stackoverflow.com/questions/5308057/…

@Jon Skeet 2011-03-15 14:09:54

@bitbonk: The chances of a collision on any single change would be pretty low... but in the question you're talking about I'd probably use a cryptographic hash instead.

@bitbonk 2011-03-15 16:51:25

Then the question stands: how do I create a cryptographic hash of an object model?

@Jon Skeet 2011-03-15 17:55:41

@bitbonk: I would strongly consider using a "normal" cryptographic hash on the result of binary serialization of some form.

@yoyo 2011-12-16 18:56:29

This algorithm is basically the DJB2 string hashing algorithm, for which the constants 5381 and 33 are recommended (cse.yorku.ca/~oz/hash.html). To be honest I'm not sure the constant makes much difference, but the multiplier is important.
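For comparison, the DJB2 shape with the constants mentioned looks like this when applied to per-field hashes rather than string bytes (a sketch; the Djb2 helper name is invented):

```csharp
using System;
using System.Diagnostics;

class Djb2Demo
{
    // DJB2-style combiner with the cited constants: seed 5381, multiplier 33.
    public static int Djb2(params int[] fieldHashes)
    {
        unchecked
        {
            int hash = 5381;
            foreach (int h in fieldHashes)
                hash = hash * 33 + h;
            return hash;
        }
    }

    static void Main()
    {
        Debug.Assert(Djb2() == 5381);
        Debug.Assert(Djb2(0) == 5381 * 33);
        Debug.Assert(Djb2(1, 2) != Djb2(2, 1)); // order matters, unlike XOR
        Console.WriteLine("DJB2 combiner checks passed.");
    }
}
```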

@KChaloux 2012-10-25 22:06:37

@JonSkeet I realize I'm raising the dead here, but implementing hashes is a new thing for me. In your implementation, what fields am I including in the hash? Only immutable ones, or are any fields good?

@Jon Skeet 2012-10-25 22:08:23

@KChaloux: That entirely depends on what you want equality to mean. Normally it's a bad idea to include mutable data though.

@Vajda 2013-01-22 16:33:10

How would you handle nullity? If we simply ignore a null field, then A = null, B = "ss" and A = "ss", B = null would collide. Isn't it better to multiply each field by a different prime number?

@Jon Skeet 2013-01-22 16:49:04

@Vajda: I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field.
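That convention is a one-liner per field with the null-conditional operator. A minimal sketch (type and field choices invented; boxed ints are used so the values are deterministic) showing that (null, x) and (x, null) do not collide:

```csharp
using System;
using System.Diagnostics;

class NullSafeHashDemo
{
    // Null fields contribute 0 rather than being skipped, so position
    // still matters and (null, x) differs from (x, null).
    public static int Hash(object a, object b)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (a?.GetHashCode() ?? 0);
            hash = hash * 23 + (b?.GetHashCode() ?? 0);
            return hash;
        }
    }

    static void Main()
    {
        Debug.Assert(NullSafeHashDemo.Hash(null, 5) != NullSafeHashDemo.Hash(5, null));
        Console.WriteLine("Null fields contribute 0 without being ignored.");
    }
}
```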

@Jon Skeet 2013-11-20 14:50:45

@jnm2: I don't follow your argument, to be honest. In particular, I've just tried this effectively hashing 10 fields - and changing the value of just the first field still changed the hash, contradicting your claim that "every bit of the first hash codes will be lost".

@Jon Hanna 2014-01-14 10:23:19

You can demonstrate this gives a poor distribution quite simply. Take this variant of FNV and apply it to strings (use unsafe pointer manipulation to get integers at a time to give it a fair chance). Use it to add strings to a power-of-two based hash table. With the one I'm working on now, if I generate "1", "2", ... "999999" and add them in, it takes about 34 seconds. Now take this same hash method and re-hash the result with a well-distributed hash. With a good hash this can only make things worse (more time spent, and we could introduce new collisions but never remove them). With the...

@Jon Hanna 2014-01-14 10:31:02

... same hash table I'm working on, the same code to generate "1"..."999999" and add them takes 1 second. The effect is less pronounced with prime-number based hashes, so in that case the extra time spent re-hashing (and perhaps the reduction in possible outcomes, though that's unlikely) doesn't gain anything, but the poor performance on power-of-two tables demonstrates poor distribution generally.

@Jon Skeet 2014-01-14 11:10:09

@JonHanna: Thanks for that. Not sure what you mean by "to get integers at a time" but I'll try to have a closer look. I still like this as a first approximation for a hash, but if you have another hash which is as simple to remember and get right but with a better distribution, I'd be very happy to change my practice :)

@Jon Hanna 2014-01-14 11:24:47

I meant I used fixed(char* ptr = str){int* iPtr = (int*)ptr;... but I also tried just doing foreach(char c in str) and casting each char to int, and the same applies. The relative weakness became apparent to me when I had reason to use power-of-two tables and was getting poor results (I used to use much the same as above, myself). The solution I finally hit upon is to forget about being easy to remember, and to build a hard-to-remember method once, and then make it easy to use and put it the code at nuget.org/packages/SpookilySharp I'll add a full answer here at lunchtime.

@Jon Hanna 2014-01-14 14:33:45

@JonSkeet and now answered.

@Jon Skeet 2014-01-14 14:51:14

@JonHanna: Thanks for that. Will have to look in more detail when I have a bunch of time :)

@Dzyann 2014-02-14 16:11:19

I think is important to point out that we should be careful with the Hash code changing at run-time. We had a bug in my project because the previous developer had implemented a GetHashCode algorithm based in this answer. But in his implementation he had a list of objects, he used the hash of each item in the collection to generate the object hash code. Therefore, when the collection changed, the hash code changed also. That was causing binding problems in WPF. And if you had the object in a dictionary for example, you would get errors also.

@Jon Skeet 2014-02-14 16:59:18

@Dzyann: Yes, mutating the key in a way which affects equality - and thus the hash code - is always a bad idea. I'll add a note.
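The failure mode discussed here can be reproduced in a few lines. BadKey is a hypothetical type invented for illustration, not code from the thread; its hash code depends on mutable state, which is exactly the bug pattern described above:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on mutable state --
// a sketch of the failure mode described above.
public class BadKey
{
    public int Value { get; set; }

    // Hash code changes whenever Value changes: dangerous for dictionary keys.
    public override int GetHashCode() => Value;

    public override bool Equals(object obj) => obj is BadKey k && k.Value == Value;
}
```

If a BadKey with Value = 1 is stored in a Dictionary<BadKey, string> and then its Value is mutated to 2, ContainsKey on that same instance returns false: the entry still sits in the bucket chosen by the old hash code, while the lookup probes the bucket for the new one.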

@Dzyann 2014-02-14 18:32:54

@JonSkeet you are right, and it can lead to very hard-to-track bugs, like in this case with the WPF bindings. It took ages until one of my coworkers found the cause and solved it. Since it wasn't our code, it was very challenging.

@jnm2 2014-04-23 17:04:42

I would suggest that you change 17 and 23 to the constants here. (Thanks for the link.) It gave a simple dictionary lookup much higher performance, in my case, ~60% better.

@Jon Skeet 2014-04-23 17:07:21

@jnm2: That's not the same algorithm to start with - it's using XOR rather than ADD. I'll stick with these constants for this answer, but perhaps you should add your own answer?

@jnm2 2014-04-23 17:08:11

Actually, I was going to suggest that xoring rather than adding wouldn't diminish the simplicity as a go-to hash algorithm. What do you think?

@jnm2 2014-04-23 17:17:32

XOR ends up making GetHashCode() faster by 12% in my case.

@Jon Skeet 2014-04-23 17:38:18

@jnm2: Well it wouldn't diminish that simplicity - but it's not what I've done for the past several years. I'll add FNV as an alternative though.

@Roman Ganz 2014-04-24 12:52:15

int hash = 2166136261; Is there a cast missing? Compiler says that 2166136261 is a uint... I changed it to int hash = (int)2166136261;

@Eamon Nerbonne 2014-06-01 14:15:34

I tried to implement this approach for ValueUtils, but in my testing this variant of FNV caused considerable collisions (24%) in certain symmetrical datasets. And perhaps that's because this is NOT actually the FNV hash? Traditional FNV hashes per octet (byte), not per 32-bit word. That gives this variant less opportunity to mix the bits around...

@Jon Skeet 2014-06-01 15:53:07

@EamonNerbonne: What do you mean by "this approach"? The answer now contains two different versions...

@Eamon Nerbonne 2014-06-01 19:27:30

I mean this variant of FNV - it's not quite FNV, and I'm pretty sure that makes matters worse. Incidentally, I also tried the h=prime; repeat h=h*prime + ? recipe; that seems to vary; it does quite well for larger primes, especially if your intermediate is 64-bit wide.

@Jon Skeet 2014-06-01 19:50:09

@Eamon: I don't know enough about the theory to comment further, I'm afraid :(

@Eamon Nerbonne 2014-06-01 19:58:07

Yeah, the theory behind it is not at all obvious to me. However - this answer suggests that this implementation is FNV, a well-known good hash. But that's not really true, since this is not FNV. Also, FNV is a string hash algorithm, which needs to satisfy much trickier requirements in that it needs to work for variable length potentially long strings. But again, the algorithm currently in the answer is not FNV - it mixes the bits much less well.

@Jon Skeet 2014-06-01 19:59:17

@EamonNerbonne: Okay. I'll edit to indicate that it's a modification, and that it doesn't work as well in at least some cases.

@jnm2 2014-06-03 12:09:00

@EamonNerbonne: What are the best coefficients that you know of?

@Eamon Nerbonne 2014-06-03 13:38:47

@jnm2 In my experiments, the offset matters little, and the trend is that larger primes perform better, with the caveat that this is all tricky to test because it's slow (really slow) to be thorough, and it depends on the way in which your dataset is "messed up". If your fields have perfectly randomly distributed hashcodes - none of this matters, but of course, in the real world, those hash codes aren't random, and fields are correlated. There's a fairly good reason why large primes would be better too - they mix bits better, especially if your data consists largely of small numbers.

@Eamon Nerbonne 2014-06-03 13:45:49

@jnm2, so I'd pick a largish prime (on the order of 2^16, say) and to tune for .NET's dictionary implementation, one that's NOT used by Dictionary<,>: referencesource.microsoft.com/#mscorlib/system/collections/…

@Eamon Nerbonne 2014-06-04 15:48:52

@jnm2 I came across these two questions further exploring this issue: stackoverflow.com/questions/1835976/… and stackoverflow.com/questions/1145217/… and both conclude: use any old large prime. The accepted answer in the first question mentions two chosen in a principled way - but it's not likely a principle that really relates to the real world, so it still recommends the basic idea: pick a large prime, NOT 23 or 31.

@Eamon Nerbonne 2014-06-04 15:52:30

BTW: note that the offset is (as far as I can tell) completely pointless. Distributive laws also hold under modulo, which means it's simply an identical offset all objects will share - that certainly has no impact on any hashtable I know of.

@Jon Skeet 2014-06-04 16:01:00

@EamonNerbonne: I guess that's true if all the objects are of the same type. If you've got a dictionary where some keys are subclasses of other keys, that could make a difference... although only when the extra field values are 0 anyway. Again, this is mostly just habit for me :(

@Eamon Nerbonne 2014-06-04 16:04:08

@JonSkeet Yeah, if you have objects of differing type and you use differing offsets, you'll have some advantage. No reason to be prime, though, I think... In any case, an addition is so cheap, there's not much reason to avoid it either.

@Max Yankov 2014-11-10 15:35:18

I used this algorithm for a pseudo-random generator, and it behaves a little strangely: stackoverflow.com/questions/26847262/…

@Hans-Peter Störr 2015-04-01 11:35:23

If you got the number 486187739 from stackoverflow.com/a/2816747/21499 - I actually intended to recommend 92821.

@Ahmad Siavosh 2015-08-05 08:06:46

As each instance of class 'object' has a unique hash code, it occurred to me that it would be good to use base.GetHashCode() as a seed or something to produce our hash code for the object.

@Jon Skeet 2015-08-05 08:13:39

@AhmadSiavosh: No, that's a bad idea, because you want distinct but equal objects to have the same hash code. (I don't think object.GetHashCode is guaranteed to be unique, either. It may well be "very unlikely to collide" but that's not the same thing.)

@Jaider 2016-02-12 16:14:02

If a field fieldL is a List<obj>, will it work by just doing hash = ... ^ fieldL.GetHashCode(), or do I have to go through the items like foreach(){hash = ... ^ item.GetHashCode()}?

@Jon Skeet 2016-02-12 16:29:28

@Jaider: It won't do either. List<T> doesn't override Equals or GetHashCode.
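Since List<T> doesn't override GetHashCode, a list field has to be hashed element by element. One way to do that (a sketch; SequenceHash is a hypothetical helper name) is to fold each element's hash into a running hash with the same multiply-and-add scheme discussed in this thread:

```csharp
using System;
using System.Collections.Generic;

public static class SequenceHash
{
    // Hypothetical helper: folds each element's hash into the running hash,
    // so two sequences with the same items in the same order hash the same.
    public static int OfSequence<T>(IEnumerable<T> items)
    {
        unchecked
        {
            int hash = 17;
            foreach (var item in items)
                hash = hash * 31 + (item?.GetHashCode() ?? 0); // 0 stands in for null
            return hash;
        }
    }
}
```

Unlike an XOR fold, this is order-sensitive: { 1, 2, 3 } and { 3, 2, 1 } produce different hashes. And per the caution earlier in this thread, this is only safe if the list does not mutate while the object is used as a key.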

@dmarra 2016-02-16 00:58:19

I tried this code for 3 doubles, and I get an enormous amount of collisions. I need to get hashcodes for 4194304 tuples. Is there a better way? Using some larger primes helped a little bit, but I am still getting collisions.

@Jon Skeet 2016-02-16 06:07:07

@user984444: Well you should expect quite a few collisions with that many entries. Just how many are you getting?

@dmarra 2016-02-16 15:43:15

@JonSkeet It's hard to say. I am using this to cache the output of some Perlin noise, and the indicator of collision is some "interesting" output in my image; it has the appearance of... when you win solitaire. This gets mitigated (and the pattern changes) with larger primes. Very unhelpful, I know. I have changed my struct (tuple of doubles as the key) into a class so that .NET takes care of the hash code for me, and it has no collisions anymore.

@Jon Skeet 2016-02-16 15:44:24

@user984444: Um, that way equal keys won't be equal though, unless you've overridden GetHashCode in your class, in which case you've got the same problem. Maybe it would be worth you asking a new question with all the details...

@dmarra 2016-02-16 20:02:45

@JonSkeet: Not true; the default implementation of GetHashCode is working perfectly well (If it wasn't, it would be incredibly obvious in my end result). It works for a struct as well, but is WOEFULLY SLOW. I wanted to use structs, but using a class seems to work just fine for my use case.

@Jon Skeet 2016-02-16 20:43:23

@user984444: Unless you override GetHashCode and Equals yourself, or inherit from another class which does so, you will get reference equality. That's not what the struct will give you. It really, really sounds like we need a new post here with details.

@dmarra 2016-02-17 02:04:03

@JonSkeet: I consider my particular problem solved because I am getting the desired result, but if I get a chance, I'll post a question with details so you can see what is going on.

@MikeW 2016-03-30 09:32:28

Being really picky: StyleCop's default settings generate a warning for this code (SA1407) because you have not used parentheses to make the precedence of the arithmetic operators explicit, even though it is clear to any dev reading the code, and to the compiler, since we all know the BODMAS rule.

@Jon Skeet 2016-03-30 09:34:28

@MikeW: I don't think BODMAS includes XOR :) I think the final piece of code would be clearer with parentheses - will add them now. I agree that the multiply-and-add version doesn't need them.

@Ben 2017-11-05 17:30:29

I didn't know about "unchecked" - thanks for that!

@RobIII 2017-11-23 20:45:59

For future readers: Consider using HashCode.Combine()

@BaltoStar 2018-03-01 22:22:23

@JonSkeet any idea how to go about this in T-SQL? I need the C# hash of a GUID series to match the T-SQL hash of a uniqueidentifier series, but AFAIK in T-SQL it's not possible to wrap results of integer arithmetic.

@Jon Skeet 2018-03-02 06:57:27

@BaltoStar: I don't know anything about hashing in T-SQL. If it already provides well-defined hashing for GUID values, I'd probably try to emulate that in C# rather than the other way round.

@BaltoStar 2018-03-02 12:41:11

@JonSkeet in C# why not just MD5 hash the ordered concatenation of the GUIDs?

@Jon Skeet 2018-03-16 07:21:15

@JamesKo: I'll add a link to HashCode.Combine when .NET Core 2.1 has actually been released, and I can link to docs. I don't think it'll be useful to many people until that time.

@James Ko 2018-03-16 22:07:59

@JonSkeet Sure.

@crush 2018-05-07 14:08:49

I'm unsure how to handle the nulls here. None of the answers really seem to touch that topic, just assuming we are all experts on the topic. @JonSkeet Mentions in these comments "I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field." How that is actually implemented though, I have questions about. It sounds like you are saying that a null property should zero out the current hash value, but that seems like odd behavior. It might be obvious to some what to do, but I'd appreciate an example showing how to handle the nulls, or a better explanation.

@crush 2018-05-07 14:18:02

After reading a few other Q&A's on the topic, I realized that I didn't comprehend what @JonSkeet was saying very well. I misunderstood him to be saying that I should substitute 0 as the hashing constant when the property is null. After seeing an example here, I realize he was merely stating that I should substitute 0 as the property's hash code, which seems so obvious now...considering that is exactly what he said.

@stt106 2018-05-14 17:03:01

Does it really need to use primes like 17 or 23 if my object's hash only depends on one int32 property? Can I just return MyProperty.GetHashCode()?

@Jon Skeet 2018-05-14 17:04:23

@stt106: For just a single property, I'd just return that property's hash code, yes.

@cactuaroid 2018-10-27 13:59:50

FYI, Visual Studio 2017 can generate GetHashCode() without ReSharper. docs.microsoft.com/en-us/visualstudio/ide/reference/…

@Rick Love 2011-01-07 21:38:29

Anonymous Type

Microsoft already provides a good generic HashCode generator: Just copy your property/field values to an anonymous type and hash it:

new { PropA, PropB, PropC, PropD }.GetHashCode();

This will work for any number of properties. It does not use boxing. It just uses the algorithm already implemented in the framework for anonymous types.
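Putting the anonymous-type technique into a full override looks like this (Person is a hypothetical example class, not from the answer):

```csharp
using System;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }

    public override bool Equals(object obj) =>
        obj is Person p && p.Name == Name && p.Age == Age;

    // Delegate to the compiler-generated hash of an anonymous type
    // built from the same fields that Equals compares.
    public override int GetHashCode() => new { Name, Age }.GetHashCode();
}
```

Two Person instances with the same Name and Age then compare equal and share a hash code, as the GetHashCode contract requires. Note that each call constructs an anonymous-type instance, so there is a small per-call allocation.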

ValueTuple - Update for C# 7

As @cactuaroid mentions in the comments, a value tuple can be used. This saves a few keystrokes and more importantly executes purely on the stack (no Garbage):

(PropA, PropB, PropC, PropD).GetHashCode();

(Note: The original technique using anonymous types seems to create an object on the heap, i.e. garbage, since anonymous types are implemented as classes, though this might be optimized out by the compiler. It would be interesting to benchmark these options, but the tuple option should be superior.)

@digEmAll 2011-01-08 09:50:49

Yes, anonymous GetHashCode implementation is very effective (BTW it's the same as the one in the Jon Skeet's answer), but the only problem with this solution is that you generate a new instance at any GetHashCode call. It can be a bit overhead-ish in particular in case of intensive access to big hashed collections...

@Kumba 2011-01-11 09:37:10

This works in VB w/ .NET 4.0, but looking over the IL, it's using box calls since the type uses generics. No unboxing, but from my reading here, the mere presence of boxing suggests this can be a little inefficient. It seems like the only choice for VB, though, since there is no equivalent of checked/unchecked.

@Rick Love 2011-04-02 17:30:54

@digEmAll Good point, I didn't think about the overhead of creating a new object. Jon Skeet's answer is the most efficient and won't use boxing. (@Kumba To solve the unchecked in VB, just use an Int64 (long) and truncate it after the calculations.)

@sehe 2011-04-22 19:51:59

could just say new { PropA, PropB, PropC, PropD }.GetHashCode() too

@mwolfe02 2013-04-16 15:40:11

In VB.Net: New With {PropA, PropB, PropC, PropD}.GetHashCode()

@David Osborne 2014-08-20 15:58:52

VB.NET must use Key in anonymous type creation: New With {Key PropA}.GetHashCode() Otherwise GetHashCode will not return the same hashcode for different objects with the same 'identifying' properties.

@Keith 2015-10-19 20:16:00

Don't forget to enumerate your IEnumerables or bad things will happen. new { PropA, PropB, C = PropC.ToList() }.GetHashCode()

@Rick Love 2015-10-20 20:40:53

@Keith in that case, I would consider saving the IEnumerable as a list value somewhere instead of enumerating it each time the hash code is calculated. Calling ToList each time inside GetHashCode could hurt performance in many situations.

@shA.t 2017-08-29 08:22:07

And don't forget that private property/fields are not needed in this case ;).

@ToolmakerSteve 2018-03-01 04:15:58

@Keith: A hash code does not have to be affected by all properties of an object. A hash code merely needs to give good enough distribution of your objects. And should be fast to compute. Leave out the enumerable. And if you have a list, don't include the whole list. Use Count, and perhaps first element (use zero if no elements). unless your class doesn't have much variation other than the list; in that case, caching a hash of the list, as Rick suggests, is best. Recall that, by definition, an object's hash must always be the same. If collection changes, DO NOT include it in hash calc.

@cactuaroid 2018-08-16 11:59:24

For those who like this: (PropA, PropB, PropC, PropD).GetHashCode() is now available in C# 7 without the GC pressure @digEmAll was concerned about. Quick and Simple Hash Code Combinations

@Rick Love 2018-08-16 13:55:27

@cactuaroid Nice! So using a tuple (which is a struct) instead of the anonymous type (a class). Does it still use the same calculation behind the scenes for the Tuple GetHashcode()?

@cactuaroid 2018-08-16 15:03:13

@RickLove I'm not sure about the mathematics. Tuple.GetHashCode() and ValueTuple.GetHashCode() look similar. ValueTuple.GetHashCode() calls HashHelper. Tuple.GetHashCode() calls Tuple.CombineHashCodes. For anonymous types, see How are Equals and GetHashCode implemented on anonymous types?

@digEmAll 2018-08-16 16:43:51

@cactuaroid: indeed this is a great solution!

@cactuaroid 2018-08-17 14:08:21

I apologize that @Timo has already posted about ValueTuple.GetHashCode() below.

@Timo 2018-05-15 12:00:46

If we have no more than 8 properties (hopefully), here is another alternative.

ValueTuple is a struct and appears to have a solid GetHashCode implementation.

That means we could simply do this:

// Yay, no allocations and no custom implementations!
public override int GetHashCode() => (this.PropA, this.PropB).GetHashCode();

Let's take a look at .NET Core's current implementation for ValueTuple's GetHashCode.

This is from ValueTuple:

    internal static int CombineHashCodes(int h1, int h2)
    {
        return HashHelpers.Combine(HashHelpers.Combine(HashHelpers.RandomSeed, h1), h2);
    }

    internal static int CombineHashCodes(int h1, int h2, int h3)
    {
        return HashHelpers.Combine(CombineHashCodes(h1, h2), h3);
    }

And this is from HashHelper:

    public static readonly int RandomSeed = Guid.NewGuid().GetHashCode();

    public static int Combine(int h1, int h2)
    {
        unchecked
        {
            // RyuJIT optimizes this to use the ROL instruction
            // Related GitHub pull request: dotnet/coreclr#1830
            uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
            return ((int)rol5 + h1) ^ h2;
        }
    }

In English:

  • Left rotate (circular shift) h1 by 5 positions.
  • Add the result and h1 together.
  • XOR the result with h2.
  • Start by performing the above operation on { static random seed, h1 }.
  • For each further item, perform the operation on the previous result and the next item (e.g. h2).

It would be nice to know more about the properties of this ROL-5 hash code algorithm.

Regrettably, deferring to ValueTuple for our own GetHashCode may not be as fast as we would like and expect. This comment in a related discussion illustrates that directly calling HashHelpers.Combine is more performant. On the flip side, that one is internal, so we'd have to copy the code, sacrificing much of what we had gained here. Also, we'd be responsible for remembering to first Combine with the random seed. I don't know what the consequences are if we skip that step.

@cactuaroid 2018-08-17 14:28:16

Assuming h1 >> 27 is 0 and can be ignored, h1 << 5 equals h1 * 32, so this is the same as h1 * 33 ^ h2. According to this page, it is called "Modified Bernstein".
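That reduction can be checked directly. The sketch below (illustrative helper names, seed step omitted) puts the ROL-5 combine from the answer next to the Modified Bernstein form: they agree whenever the top five bits of h1 are clear, and diverge once the rotation wraps bits that the multiplication overflows away:

```csharp
using System;

public static class CombineDemo
{
    // The ROL-5 combine described in the answer above (random-seed step omitted).
    public static int Rol5Combine(int h1, int h2)
    {
        unchecked
        {
            uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
            return ((int)rol5 + h1) ^ h2;
        }
    }

    // "Modified Bernstein": equivalent when h1 >> 27 == 0,
    // since then (h1 << 5) + h1 == h1 * 32 + h1 == h1 * 33.
    public static int ModifiedBernstein(int h1, int h2) =>
        unchecked(h1 * 33 ^ h2);
}
```

For example, Rol5Combine(7, 42) and ModifiedBernstein(7, 42) are equal, while h1 = 1 << 27 makes the two differ.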

@Şafak Gür 2013-09-04 12:32:48

Here's my helper class using Jon Skeet's implementation.

public static class HashCode
{
    public const int Start = 17;

    public static int Hash<T>(this int hash, T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked((hash * 31) + h);
    }
}

Usage:

public override int GetHashCode()
{
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)
        .Hash(_field3);
}

If you want to avoid writing an extension method for System.Int32:

public struct HashCode
{
    private readonly int _value;

    public HashCode(int value) => _value = value;

    public static HashCode Start { get; } = new HashCode(17);

    public static implicit operator int(HashCode hash) => hash._value;

    public HashCode Hash<T>(T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked(new HashCode((_value * 31) + h));
    }

    public override int GetHashCode() => _value;
}

It's still generic, it still avoids any heap allocation and it's used exactly the same way:

public override int GetHashCode()
{
    // This time `HashCode.Start` is not an `Int32`, it's a `HashCode` instance.
    // And the result is implicitly converted to `Int32`.
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)     
        .Hash(_field3);
}

Update after Martin's comment:

obj != null caused boxing so I switched to the default comparer.

  • See this answer regarding the default comparer's performance.
  • See this question for a discussion about the hash codes of null values.

Edit (May 2018):

EqualityComparer<T>.Default getter is now a JIT intrinsic - the pull request is mentioned by Stephen Toub in this blog post.

@Bill Barry 2014-09-05 17:12:59

I would change the line with the ternary operator to be: var h = Equals(obj, default(T)) ? 0 : obj.GetHashCode();

@Martin Liversage 2014-09-13 23:00:02

I believe that the ternary operator with obj != null will compile to a box instruction which will allocate memory if T is a value type. Instead you can use obj.Equals(null) which will compile to a virtual call of the Equals method.

@Şafak Gür 2015-06-15 08:01:08

Because this.hashCode != h. It wouldn't return the same value.

@Erik Karlsson 2015-06-15 08:28:13

Sorry, manage to remove my comment instead of editing it. Is it more beneficial to create a new struct then change the hashCode to non-readonly and do: "unchecked { this.hashCode ^= h * 397; } return this;" for example?

@Şafak Gür 2015-06-15 10:35:50

Immutability has its benefits (Why are mutable structs evil?). About performance, what I do is pretty cheap since it does not allocate any space in the heap.

@user764754 2016-04-11 18:12:04

There is no boxing if you call it like Hash(1) and not like Hash<MyInterface>(myStruct). stackoverflow.com/questions/8823239

@deadManN 2012-11-30 19:35:52

Microsoft suggests several ways of hashing...

//for classes that contain a single int value
return this.value;

//for classes that contain multiple int values
return x ^ y;

//for classes that contain a single number bigger than int
return ((int)value ^ (int)(value >> 32)); 

//for classes that contain class instance fields which inherit from object
return obj1.GetHashCode();

//for classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode(); 

I can guess that for multiple big ints you can use this:

int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;

And the same goes for multiple types: all are first converted to int using GetHashCode(), then the int values are XOR'ed together, and the result is your hash.

For those who use the hash as an ID (I mean a unique value): a hash is naturally limited to a fixed number of bits (32 for GetHashCode; even a hashing algorithm like MD5 produces only 16 bytes).

You may turn multiple values into a hashed value, and some of them may be the same, so don't use it as an identifier. (Maybe some day I am going to use your component.)

@Jon Hanna 2014-01-14 09:36:05

Xoring integers to make a hashcode is a well-known antipattern that tends to result in a particularly high number of collisions with real-world values.
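The antipattern is easy to demonstrate. Assuming a hypothetical two-field type, plain XOR is symmetric and self-cancelling, so swapped and equal field values collide systematically, while the multiply-and-add scheme used elsewhere in this thread keeps them apart:

```csharp
using System;

public static class XorDemo
{
    // Antipattern: x ^ y == y ^ x, and x ^ x == 0 for every x.
    public static int XorHash(int x, int y) => x ^ y;

    // Order-sensitive alternative using the 17/31 multiply-and-add scheme.
    public static int MulAddHash(int x, int y) =>
        unchecked((17 * 31 + x) * 31 + y);
}
```

So a Point(3, 5) and a Point(5, 3) get the same XOR hash, and every point on the diagonal (x == x) hashes to 0 - exactly the kind of systematic collision real-world data tends to hit.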

@deadManN 2015-09-16 05:59:14

Everyone here uses integers, and there has never been any kind of guarantee for hashes to be unique; a hash just tries to vary enough that few collisions happen.

@Jon Hanna 2015-09-16 08:44:13

Yes, but your second and fifth don't try to avoid collisions.

@deadManN 2015-09-19 12:37:41

Not sure which thread it was... but it did the same, msdn.microsoft.com/en-us/library/…

@Jon Hanna 2015-09-19 14:06:33

Yes, that antipattern is quite common.

@deadManN 2015-09-20 13:34:10

That's why I rely on that, but thanks for clearing things up... The other thing is: does the other pattern have less calculation time? You know, the collision vs. calculation time trade-off matters too.

@Jon Hanna 2015-09-20 16:57:27

There's a balance to reach. Use a really good hash code like SpookyHash and you'll get much, much better collision avoidance, but it will have much more calculation time than any of these (though when it comes to hashing very large amounts of data, SpookyHash is extremely quick). A simple shift on one of the values before xoring costs only marginally more for a good reduction in collisions. Prime-number multiplication increases both time and quality again. Which is better between shift or mult is hence debatable. Plain xor, though, very often has a lot of collisions on real data and is best avoided.

@Charles Burns 2016-09-01 19:19:17

ReSharper users can generate GetHashCode, Equals, and others with ReSharper -> Edit -> Generate Code -> Equality Members.

// ReSharper's GetHashCode looks like this
public override int GetHashCode() {
    unchecked {
        int hashCode = Id;
        hashCode = (hashCode * 397) ^ IntMember;
        hashCode = (hashCode * 397) ^ OtherIntMember;
        hashCode = (hashCode * 397) ^ (RefMember != null ? RefMember.GetHashCode() : 0);
        // ...
        return hashCode;
    }
}

@Jon Hanna 2014-01-14 14:15:33

Up until recently my answer would have been very close to Jon Skeet's here. However, I recently started a project which used power-of-two hash tables, that is hash tables where the size of the internal table is 8, 16, 32, etc. There's a good reason for favouring prime-number sizes, but there are some advantages to power-of-two sizes too.

And it pretty much sucked. So after a bit of experimentation and research I started re-hashing my hashes with the following:

public static int ReHash(int source)
{
  unchecked
  {
    ulong c = 0xDEADBEEFDEADBEEF + (ulong)source;
    ulong d = 0xE2ADBEEFDEADBEEF ^ c;
    ulong a = d += c = c << 15 | c >> -15;
    ulong b = a += d = d << 52 | d >> -52;
    c ^= b += a = a << 26 | a >> -26;
    d ^= c += b = b << 51 | b >> -51;
    a ^= d += c = c << 28 | c >> -28;
    b ^= a += d = d << 9 | d >> -9;
    c ^= b += a = a << 47 | a >> -47;
    d ^= c += b << 54 | b >> -54;
    a ^= d += c << 32 | c >> 32;
    a += d << 25 | d >> -25;
    return (int)(a >> 1);
  }
}

And then my power-of-two hash table didn't suck any more.

This disturbed me though, because the above shouldn't work. Or more precisely, it shouldn't work unless the original GetHashCode() was poor in a very particular way.

Re-mixing a hashcode can't improve a great hashcode, because the only possible effect is that we introduce a few more collisions.

Re-mixing a hash code can't improve a terrible hash code, because the only possible effect is that we change e.g. a large number of collisions on value 53 to a large number on value 183,487,291.

Re-mixing a hash code can only improve a hash code that did at least fairly well in avoiding absolute collisions throughout its range (2³² possible values) but badly at avoiding collisions when modulo'd down for actual use in a hash table. While the simpler modulo of a power-of-two table made this more apparent, it was also having a negative effect with the more common prime-number tables; it just wasn't as obvious (the extra work in rehashing would outweigh the benefit, but the benefit would still be there).

Edit: I was also using open-addressing, which would also have increased the sensitivity to collision, perhaps more so than the fact it was power-of-two.

And well, it was disturbing how much the string.GetHashCode() implementations in .NET (or study here) could be improved this way (on the order of tests running about 20-30 times faster due to fewer collisions) and more disturbing how much my own hash codes could be improved (much more than that).

All the GetHashCode() implementations I'd coded in the past, and indeed used as the basis of answers on this site, were much worse than I'd thought. Much of the time it was "good enough" for many of the uses, but I wanted something better.

So I put that project to one side (it was a pet project anyway) and started looking at how to produce a good, well-distributed hash code in .NET quickly.

In the end I settled on porting SpookyHash to .NET. Indeed the code above is a fast-path version of using SpookyHash to produce a 32-bit output from a 32-bit input.

Now, SpookyHash is not a nice, quick-to-remember piece of code. My port of it is even less so, because I hand-inlined a lot of it for better speed*. But that's what code reuse is for.

Then I put that project to one side, because just as the original project had produced the question of how to produce a better hash code, so that project produced the question of how to produce a better .NET memcpy.

Then I came back, and produced a lot of overloads to easily feed just about all of the native types (except decimal†) into a hash code.

It's fast, for which Bob Jenkins deserves most of the credit because his original code I ported from is faster still, especially on 64-bit machines which the algorithm is optimised for‡.

The full code can be seen at https://bitbucket.org/JonHanna/spookilysharp/src but consider that the code above is a simplified version of it.

However, since it's now already written, one can make use of it more easily:

public override int GetHashCode()
{
  var hash = new SpookyHash();
  hash.Update(field1);
  hash.Update(field2);
  hash.Update(field3);
  return hash.Final().GetHashCode();
}

It also takes seed values, so if you need to deal with untrusted input and want to protect against Hash DoS attacks you can set a seed based on uptime or similar, and make the results unpredictable by attackers:

private static long hashSeed0 = Environment.TickCount;
private static long hashSeed1 = DateTime.Now.Ticks;
public override int GetHashCode()
{
  //produce different hashes every time this application is restarted
  //but remain consistent in each run, so attackers have a harder time
  //DoSing the hash tables.
  var hash = new SpookyHash(hashSeed0, hashSeed1);
  hash.Update(field1);
  hash.Update(field2);
  hash.Update(field3);
  return hash.Final().GetHashCode();
}

*A big surprise in this is that hand-inlining a rotation method that returned (x << n) | (x >> -n) improved things. I would have been sure that the jitter would have inlined that for me, but profiling showed otherwise.

†decimal isn't native from the .NET perspective, though it is from C#'s. The problem with it is that its own GetHashCode() treats precision as significant, while its own Equals() does not. Both are valid choices, but not mixed like that. In implementing your own version, you need to choose one or the other, but I can't know which you'd want.

‡By way of comparison. If used on a string, the SpookyHash on 64 bits is considerably faster than string.GetHashCode() on 32 bits which is slightly faster than string.GetHashCode() on 64 bits, which is considerably faster than SpookyHash on 32 bits, though still fast enough to be a reasonable choice.

@supercat 2014-04-24 21:31:31

When combining multiple hash values into one, I tend to use long values for the intermediate results, and then munge the final result down to an int. Does that seem like a good idea? My concern is that if one uses e.g. hash=(hash*31)+nextField, then pairs of matching values will only affect the upper 27 bits of the hash. Letting the calculation extend to a long and wrapping stuff in would minimize that danger.

@Jon Hanna 2014-04-24 22:48:41

@supercat it depends on the distribution of your final munging. The SpookilySharp library would ensure that the distribution was good, ideally (because it won't need object creation) by passing a pointer to a blittable type, or passing one of the enumerables it handles directly, but if you don't already have blittable data or a suitable enumeration, then calling .Update() with the multiple values as per the answer above will do the trick.

@Eamon Nerbonne 2014-06-01 14:19:24

@JonHanna would you be willing to be more precise with the problematic behavior you encountered? I'm trying to implement a library that makes implementing value objects trivial (ValueUtils) and I'd love a testset demonstrating poor hash miscibility in power-of-two hashtables.

@Jon Hanna 2014-06-02 14:01:09

@EamonNerbonne I don't really have anything more precise than "overall time was slower that way". As I added in an edit, the fact that I was using open-addressing may have been more important than the power-of-two factor. I do plan to do some test cases on a particular project where I'll be comparing a few different approaches, so I may have a better answer for you after that, though that's not a high-priority (a personal project with no pressing need, so I'll get to it when I get to it...)

@Eamon Nerbonne 2014-06-02 15:23:39

@JonHanna: yeah I know how the personal project schedule goes - good luck! In any case, I see I didn't phrase that last comment well: I meant to ask for the problematic input, and not necessarily the details of the problems that resulted. I'd love to use that as a test set (or inspiration for a test set). In any case - good luck with your pet project :-).

@maaartinus 2017-11-11 22:01:12

I'd bet that your ReHash is a big overkill. I guess it works well, but it may be even slower than cryptographic hashes, which are (sort of) proven to work perfectly. Java also uses power-of-two sized tables, and there used to be a rather complicated re-hashing. It got simplified once tree nodes for collisions were introduced.

@Jon Hanna 2017-11-11 22:56:39

@maaartinus in terms of speed and distribution, it's well demonstrated. I'm now of the opinion that it's more trouble than it's worth for small values. I'd still use the fuller implementation of SpookyHash when it comes to hashing very large values like file contents.

@James Ko 2017-11-23 15:06:05

As of https://github.com/dotnet/coreclr/pull/14863, there is a new way to generate hash codes that is super simple! Just write

public override int GetHashCode()
    => HashCode.Combine(field1, field2, field3);

This will generate a quality hash code without you having to worry about the implementation details.

@Dan J 2017-12-14 00:37:50

That looks like a sweet addition... any way to know what version of .NET Core that'll ship in?

@James Ko 2017-12-14 00:41:08

@DanJ What a happy coincidence, the HashCode changes for corefx were merged just a couple of hours before your comment :) The type is slated to ship in .NET Core 2.1.

@Dan J 2017-12-14 00:48:03

That is awesome - and quite the turnaround time. Upvoted. :)

@James Ko 2017-12-15 23:44:28

@DanJ Even better news-- it should be available right now on the nightly builds of CoreFX hosted on the dotnet-core MyGet feed.

@Dan J 2017-12-17 22:18:18

Sweet - that doesn't help me at work, since we're not quite that bleeding-edge, but good to know. Cheers!

@ogggre 2019-03-30 12:49:49

Here is a drop-in polyfill package that you can use for everything .NET 4.0+ (System.HashCode included): nuget.org/packages/Gapotchenko.FX

@Wahid Shalaly 2009-02-23 11:46:55

I have a Hashing class in a helper library that I use for this purpose.

/// <summary> 
/// This is a simple hashing function from Robert Sedgwick's Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}

Then you can simply use it as:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

I didn't assess its performance, so any feedback is welcome.

@nightcoder 2010-04-04 15:39:24

Well, it will cause boxing, if fields are value types.

@Tim Schmelter 2014-02-24 13:06:41

"can be enhanced later by catching the OverflowException" The whole point of the unchecked is to avoid exceptions on overflow which is desired on GetHashCode. So it's not incorrect if the value overflows int and it does not hurt at all.

@Nathan Adams 2015-04-17 12:12:41

One issue with this algorithm is that any array full of nulls will always return 0, regardless of its length.

@James Newton-King 2016-07-20 12:35:47

This helper method also allocates a new object[]

@David Schwartz 2016-10-28 19:04:03

As @NathanAdams mentions, the fact that null is skipped entirely could give you unexpected results. Instead of skipping them, you should just use some constant value instead of input[i].GetHashCode() when input[i] is null.
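Following that suggestion, a hypothetical null-safe variant of RSHash could mix in a constant for null entries instead of skipping them. This is only a sketch; the sentinel value chosen here is arbitrary:

```csharp
using System;

public static class Hashing
{
    /// <summary>
    /// Hypothetical null-safe variant of RSHash: instead of skipping null
    /// entries, mix in an arbitrary non-zero sentinel so that e.g.
    /// { null } and { null, null } produce different hash codes.
    /// </summary>
    public static int RSHashNullSafe(params object[] input)
    {
        const int b = 378551;
        const int nullSentinel = 0x1505; // arbitrary stand-in hash for null
        int a = 63689;
        int hash = 0;

        // If it overflows then just wrap around
        unchecked
        {
            for (int i = 0; i < input.Length; i++)
            {
                hash = hash * a + (input[i] != null ? input[i].GetHashCode() : nullSentinel);
                a = a * b;
            }
        }

        return hash;
    }
}
```

With this change the array's length always influences the result, even when every element is null.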

@Dbl 2014-10-21 17:49:34

Pretty much similar to nightcoder's solution, except it's easier to raise the primes if you want to.

PS: This is one of those times where you puke a little in your mouth, knowing that this could be refactored into one method with 9 defaults, but it would be slower, so you just close your eyes and try to forget about it.

/// <summary>
/// Try not to look at the source code. It works. Just rely on it.
/// </summary>
public static class HashHelper
{
    private const int PrimeOne = 17;
    private const int PrimeTwo = 23;

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();
            hash = hash * PrimeTwo + arg10.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();

            return hash;
        }
    }
}

@JJS 2016-12-27 17:09:28

Doesn't handle nulls.

@HokieMike 2014-09-28 16:44:25

I ran into an issue with floats and decimals using the implementation selected as the answer above.

This test fails (decimal values; the hash is the same even though I switched two of them to be negative):

        var obj1 = new { A = 100m, B = 100m, C = 100m, D = 100m};
        var obj2 = new { A = 100m, B = 100m, C = -100m, D = -100m};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

But this test passes (with ints):

        var obj1 = new { A = 100m, B = 100m, C = 100, D = 100};
        var obj2 = new { A = 100m, B = 100m, C = -100, D = -100};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

I changed my implementation to not use GetHashCode for the primitive types, and it seems to work better:

    private static int InternalComputeHash(params object[] obj)
    {
        unchecked
        {
            var result = (int)SEED_VALUE_PRIME;
            for (var i = 0; i < obj.Length; i++)
            {
                result = (result * MULTIPLIER_VALUE_PRIME) + DetermineNextValue(obj[i]);
            }
            return result;
        }
    }



    private static int DetermineNextValue(object value)
    {
        unchecked
        {
            if (value is short
                || value is int
                || value is byte
                || value is sbyte
                || value is uint
                || value is ushort
                || value is ulong
                || value is long
                || value is float
                || value is double
                || value is decimal)
            {
                return Convert.ToInt32(value);
            }

            return value != null ? value.GetHashCode() : 0;
        }
    }

@Mark Hurd 2014-09-30 04:28:04

In case you intended otherwise unchecked does NOT affect Convert.ToInt32: uint, long, float, double and decimal can all overflow here.

@Scott Wegner 2014-01-20 23:41:33

Here is another fluent implementation of the algorithm posted above by Jon Skeet, this one with no allocations or boxing operations:

public static class Hash
{
    public const int Base = 17;

    public static int HashObject(this int hash, object obj)
    {
        unchecked { return hash * 23 + (obj == null ? 0 : obj.GetHashCode()); }
    }

    public static int HashValue<T>(this int hash, T value)
        where T : struct
    {
        unchecked { return hash * 23 + value.GetHashCode(); }
    }
}

Usage:

public class MyType<T>
{
    public string Name { get; set; }

    public string Description { get; set; }

    public int Value { get; set; }

    public IEnumerable<T> Children { get; set; }

    public override int GetHashCode()
    {
        return Hash.Base
            .HashObject(this.Name)
            .HashObject(this.Description)
            .HashValue(this.Value)
            .HashObject(this.Children);
    }
}

The compiler will ensure HashValue is not called with a class due to the generic type constraint. But there is no compiler support for HashObject since adding a generic argument also adds a boxing operation.

@bitbonk 2011-03-22 12:15:48

Here is my simplistic approach. I am using the classic builder pattern for this. It is typesafe (no boxing/unboxing) and also compatible with .NET 2.0 (no extension methods, etc.).

It is used like this:

public override int GetHashCode()
{
    HashBuilder b = new HashBuilder();
    b.AddItems(this.member1, this.member2, this.member3);
    return b.Result;
} 

And here is the actual builder class:

internal class HashBuilder
{
    private const int Prime1 = 17;
    private const int Prime2 = 23;
    private int result = Prime1;

    public HashBuilder()
    {
    }

    public HashBuilder(int startHash)
    {
        this.result = startHash;
    }

    public int Result
    {
        get
        {
            return this.result;
        }
    }

    public void AddItem<T>(T item)
    {
        unchecked
        {
            this.result = this.result * Prime2 + item.GetHashCode();
        }
    }

    public void AddItems<T1, T2>(T1 item1, T2 item2)
    {
        this.AddItem(item1);
        this.AddItem(item2);
    }

    public void AddItems<T1, T2, T3>(T1 item1, T2 item2, T3 item3)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
    }

    public void AddItems<T1, T2, T3, T4>(T1 item1, T2 item2, T3 item3, 
        T4 item4)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
    }

    public void AddItems<T1, T2, T3, T4, T5>(T1 item1, T2 item2, T3 item3, 
        T4 item4, T5 item5)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
        this.AddItem(item5);
    }        

    public void AddItems<T>(params T[] items)
    {
        foreach (T item in items)
        {
            this.AddItem(item);
        }
    }
}

@nawfal 2013-04-14 12:52:27

You can avoid the object creation inside the GetHashCode function, as in Magnus's answer. Just call the damn static hash functions (who cares about the starter hash). Also, you could use the AddItems<T>(params T[] items) method more often in the helper class (than calling AddItem(T) each time).

@nawfal 2013-04-14 12:54:32

And what benefit do you get from this.result * Prime2 * item.GetHashCode() when the commonly used form is this.result * Prime2 + item.GetHashCode()?

@bitbonk 2013-04-15 06:18:54

That was a typo, thanks!

@bitbonk 2013-04-15 06:25:24

I can't use AddItems<T>(params T[] items) more often because typeof(T1) != typeof(T2) etc.

@nawfal 2013-04-15 06:54:16

oh yes I missed that.

@Magnus 2010-10-07 10:51:06

This is a good one:

/// <summary>
/// Helper class for generating hash codes suitable 
/// for use in hashing algorithms and data structures like a hash table. 
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
           var num = 0x7e53a269;
           num = (-1521134295 * num) + key1;
           num += (num << 10);
           num ^= (num >> 6);

           num = ((-1521134295 * num) + key2);
           num += (num << 10);
           num ^= (num >> 6);

           return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the 
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}

And here is how to use it:

private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}

@David Rutten 2010-10-07 15:43:01

@Magnus, can you explain why it's a good one?

@Michael Stum 2010-10-07 17:28:19

How are the Keys determined? GetHashCode() doesn't take any parameters, so it needs to call this one with two Keys that need to be determined somehow. Sorry, without further explanation this only looks clever, but not that good.

@Magnus 2010-10-08 07:06:30

It's the hash code generation part of my hash code helper class.

@gehho 2010-10-08 09:31:53

And why do you need the generic overloads? The type is not important (and not used in your code) since all objects have a GetHashCode() method, so you can always use the method with the params array parameter. Or am I missing something here?

@Magnus 2010-10-08 09:57:17

It's about performance: avoiding the loop for 4 or fewer fields. But I guess the generics could be skipped and object used instead.

@CodesInChaos 2010-11-21 22:26:26

When you'd use object instead of generics you'd get boxing and memory allocations, which you don't want in GetHashCode. So generics are the way to go.

@sehe 2011-04-22 19:54:08

The trailing shift/xor steps (h += (h << 10); h ^= (h >> 6); h += (h << 3); h ^= (h >> 11); h += (h << 15);) have a code smell: they do not depend on any of the input and look awfully redundant to me.

@Magnus 2012-12-24 11:29:27

@nawfal what speed considerations do you have?

@nawfal 2012-12-25 10:41:15

@Magnus nothing in particular other than the general rule that hashing must be fast. This can't be as fast as I would love it to be. But as I said this gives better distribution of values which may be suitable for some cases.

@Magnus 2012-12-25 10:59:44

@nawfal Running this 100 million times takes about 390 ms. Running the solution suggested by Jon Skeet 100 million times takes about 320 ms so it is not a huge difference.

@nawfal 2012-12-25 11:28:38

@Magnus yes right, I'll delete my original comment. Just a little note that this may not be as fast as some other solutions here, but as you say shouldn't matter. The distribution is great, better than most solutions here, so +1 from me! :)

@ToolmakerSteve 2018-03-01 01:42:00

How does this compare in quality (distribution) and performance to simply using a long intermediate, with multiplication of each input by a large prime? E.g. for two values, something like this one liner: return ((long)v1 * 805306457 + (long)v2 * 189783887).GetHashCode(); [The primes are chosen to avoid numeric overflow of long, in a checked environment, and to tend to set different bits.]
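That one-liner from the comment can be sketched out as a standalone combiner. The two primes are the ones quoted in the comment, not vetted constants; long.GetHashCode() folds the 64-bit sum down to an int by XORing its upper and lower 32 bits:

```csharp
using System;

public static class LongIntermediateHash
{
    // Sketch of the long-intermediate approach from the comment above:
    // promote each field to a long, multiply by a large prime, and let
    // long.GetHashCode() fold the 64-bit result down to 32 bits.
    public static int Combine(int v1, int v2)
    {
        return ((long)v1 * 805306457 + (long)v2 * 189783887).GetHashCode();
    }
}
```

Because the two multipliers differ, swapping the fields changes the result, unlike a plain XOR or sum of the two values.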

@nightcoder 2010-04-04 18:26:08

Here is my hashcode helper.
Its advantage is that it uses generic type arguments and therefore will not cause boxing:

public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
         unchecked
         {
             return 31 * arg1.GetHashCode() + arg2.GetHashCode();
         }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, 
        T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hashcode for a collection for that the order of items 
    /// does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(
        IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent 
    /// interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).
    ///     CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();   
        }
    }
}

It also has an extension method to provide a fluent interface, so you can use it like this:

public override int GetHashCode()
{
    return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}

or like this:

public override int GetHashCode()
{
    return 0.CombineHashCode(Manufacturer)
        .CombineHashCode(PartN)
        .CombineHashCode(Quantity);
}

@nawfal 2013-04-14 12:43:57

No need for T[] separately as it is already IEnumerable<T>

@nawfal 2013-04-14 13:06:32

You could refactor those methods and restrict the core logic to one function

@Chui Tey 2013-08-22 23:14:42

Incidentally, 31 is a shift and subtract on the CPU, which is exceedingly fast.
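That strength reduction is easy to check: multiplying by 31 is the same as shifting left by 5 and subtracting the original value, a transformation JIT compilers can apply automatically. A minimal demonstration:

```csharp
public static class MulByThirtyOne
{
    // 31 * h == (h << 5) - h, so the JIT can emit a shift and a subtract
    // instead of a full integer multiply.
    public static int TimesThirtyOne(int h)
    {
        unchecked { return (h << 5) - h; }
    }
}
```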

@Eamon Nerbonne 2014-06-01 19:32:57

An extension method on int is nasty namespace pollution; @safak-gur's answer below nicely side-steps that problem.

@ANeves 2015-02-09 13:54:47

@nightcoder you could use params.

@Pharap 2015-06-12 03:11:00

@ChuiTey This is something all Mersenne Primes have in common.

@geekley 2018-07-20 18:11:33

shouldn't the hash variable start with a non-zero? stackoverflow.com/a/113600/9638388

@KnorxThieus 2019-03-09 15:54:37

Just because it's cool, you can also do this with a one-liner: source?.Aggregate(0, (current, item) => unchecked(current * 31 + (item?.GetHashCode() ?? 0))) ?? 0;

@Bert Huijben 2009-02-23 11:55:44

In most cases where Equals() compares multiple fields it doesn't really matter if your GetHash() hashes on one field or on many. You just have to make sure that calculating the hash is really cheap (No allocations, please) and fast (No heavy computations and certainly no database connections) and provides a good distribution.

The heavy lifting should be part of the Equals() method; the hash should be a very cheap operation to enable calling Equals() on as few items as possible.

And one final tip: don't rely on GetHashCode() being stable across application runs. Many .NET types don't guarantee their hash codes to stay the same after a restart, so you should only use the value of GetHashCode() for in-memory data structures.
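A hypothetical example of that guideline: hash only the cheap, well-distributed field and let Equals() do the full comparison. The Order type and its fields below are made up for illustration:

```csharp
using System;
using System.Linq;

public sealed class Order
{
    public int Id { get; }           // cheap and (assumed) well distributed
    public string[] Lines { get; }   // expensive to hash item by item

    public Order(int id, string[] lines)
    {
        Id = id;
        Lines = lines;
    }

    // Cheap: no allocations, no loops. Orders that happen to share an Id
    // collide, and the collision is resolved by Equals().
    public override int GetHashCode() => Id;

    // The heavy lifting lives here, and only runs on hash-bucket collisions.
    public override bool Equals(object obj) =>
        obj is Order other &&
        Id == other.Id &&
        Lines.SequenceEqual(other.Lines);
}
```

This satisfies the contract (equal objects always have equal hash codes) while keeping GetHashCode() as cheap as a field read.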

@sleske 2010-04-15 15:44:29

"In most cases where Equals() compares multiple fields it doesn't really matter if your GetHash() hashes on one field or on many." This is dangerous advice, because for objects which only differ in the un-hashed fields, you will get hash collisions. If this happens frequently, the performance of hash-based collections (HashMap, HashSet etc.) will degrade (up to O(n) in the worst case).

@sleske 2010-04-15 15:51:57

This actually happened in Java: in early versions of the JDK, String.hashCode() only considered the beginning of the string; this led to performance problems if you used Strings as keys in HashMaps which only differed at the end (which is common, e.g., for URLs). The algorithm was therefore changed (in JDK 1.2 or 1.3, I believe).

@Bert Huijben 2010-04-16 09:12:51

If that one field 'provides a good distribution' (last part of my answer), then one field is enough.. If it doesn't provide a good distribution, then (and just then) you need another calculation. (E.g. just use another field that does provide a good distribution, or use multiple fields)

@supercat 2013-09-07 21:02:32

I don't think there's a problem with having GetHashCode perform memory allocations, provided that it only does so the first time it's used (with subsequent invocations simply returning a cached result). The important thing is not that one should go to great lengths to avoid collisions, but rather that one should avoid "systemic" collisions. If a type has two int fields oldX and newX which frequently differ by one, a hash value of oldX^newX would assign 90% of such records hash values of 1, 2, 4, or 8. Using oldX+newX [unchecked arithmetic] might generate more collisions...

@supercat 2013-09-07 21:04:17

...than would a more sophisticated function, but a collection of 1,000,000 things that have 500,000 different hash values will perform very well if each hash value has two associated things, and very badly if one hash value has 500,001 things and the others have one each.
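The systemic collision described above is easy to demonstrate with a hypothetical record holding oldX and newX: whenever the two fields differ by one, XOR-combining them collapses almost all records onto a handful of hash values:

```csharp
public static class XorCollisionDemo
{
    // Combining two correlated fields with XOR: if newX == oldX + 1, the
    // result depends only on which low bits flipped during the increment,
    // so most such records land on hash values 1, 2, 4, or 8.
    public static int XorHash(int oldX, int newX) => oldX ^ newX;
}
```

For instance, XorHash(4, 5), XorHash(8, 9), and XorHash(100, 101) all evaluate to 1, even though the inputs are distinct; a multiplicative combiner keeps such records spread out.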

@Mark G 2008-11-05 05:03:24

Most of my work is done with database connectivity which means that my classes all have a unique identifier from the database. I always use the ID from the database to generate the hashcode.

// Unique ID from database
private int _id;

...    
{
  return _id.GetHashCode();
}

@Petar Repac 2010-03-22 15:28:48

That means that if you have objects Person and Account and they both have an ID = 1, they will have the same hash code. And that is not OK.

@piers7 2010-03-29 02:14:20

Actually the comment above is incorrect. There will always be the possibility of hash-code collisions (a hash code only locates the bucket, not the individual object). So such an implementation - for a hashcode containing mixed objects - would lead to a lot of collisions, which is undesirable, but it would be absolutely fine if you only ever had objects of a single type in your hashtables. Also it doesn't distribute evenly, however neither does the base implementation on system.object, so I wouldn't worry about it too much...

@Darrel Lee 2012-11-23 19:18:54

The hash code can just be the id, since the id is an integer. There's no need to call GetHashCode on an integer (it's an identity function)

@nawfal 2013-04-14 12:57:06

@DarrelLee but tomorrow his _id could be a Guid. It's a good coding practice to do _id.GetHashCode, as the intent is clear.

@Trident D'Gao 2013-06-28 20:04:39

@DarrelLee, it is not a good option because sequential IDs from the database don't provide a good distribution.

@Jon Hanna 2014-01-14 18:29:48

@1224 depending on use patterns it can be horrible for the reason you give, but it can also be great; if you've a sequence of such numbers with no holes, then you've a perfect hash, better than any algorithm can produce. If you know that's the case you can even count on it and skip the equality check.
