By xskxzr


2018-03-17 07:02:13 8 Comments

According to cppreference,

Rehashing occurs only if the new number of elements is greater than max_load_factor()*bucket_count().

In addition, [unord.req]/15 has similar rules:

The insert and emplace members shall not affect the validity of iterators if (N+n) <= z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container's bucket count, and z is the container's maximum load factor.

However, consider the following example:

#include <unordered_set>
#include <iostream>

int main()
{
    std::unordered_set<int> s;
    s.emplace(1);
    s.emplace(42);
    std::cout << s.bucket_count() << ' ';
    std::cout << (3 > s.max_load_factor() * s.bucket_count()) << ' ';
    s.emplace(2);
    std::cout << s.bucket_count() << ' ';
}

With GCC 8.0.1, it outputs

3 0 7

This means after emplacing 2, a rehashing occurs though the new number of elements (3) is not greater than max_load_factor()*bucket_count() (note the second output is 0). Why does this happen?

3 comments

@xskxzr 2019-02-23 07:52:57

The rehash condition is changed since Issue 2156. Before the change, a rehash occurs when the new number of elements is no less than max_load_factor()*bucket_count(), and it becomes "greater than" after the change.

GCC 8.0.1 does not implement this change. There is already a bug report, and it has been fixed in GCC 9.

@Walter 2018-03-17 16:16:32

You're confusing the fact that the bucket_count() has changed with the invalidation of iterators. Iterators are only invalidated in case of rehash, which will not be one if new number of elements is less than or equal to max_load_factor()*bucket_count() (btw if size()>max_load_factor()*bucket_count() rehashing can occur, but doesn't have to).

As this was not the case in your example, no rehashing occurred and iterators remain valid. However, the bucket count had to be increased to accommodate the new element.

I experimented a bit (expanding on your code) with Mac OSX's clang, which kept the iterators valid even after rehash(size()) (which did change the element's bucket association, tested directly by iterating over the buckets and printing their contents).

@xskxzr 2018-03-17 16:34:48

If you change 42 to 41 in my example, and output s.bucket(41) before and after s.emplace(2), you'll observe two different results. Doesn't this mean there is indeed a rehashing?

@Walter 2018-03-17 19:27:12

The bucket index (which bucket() returns) can/does change, even if the bucket itself remains the same.

@xskxzr 2018-03-18 02:48:01

Look at this, the index of bucket of 1 and that of 40 are the same before s.emplace(2), but become different after s.emplace(2), how do you explain that?

@Yola 2018-03-17 10:12:33

From 26.2.7 Unordered associative containers

The number of buckets is automatically increased as elements are added to an unordered associative container, so that the average number of elements per bucket is kept below a bound.

b.load_factor()           Returns the average number of elements per bucket.

b.max_load_factor()       Returns a positive number that the container attempts 
                          to keep the load factor less than or equal to. The container
                          automatically increases the number of buckets as necessary
                          to keep the load factor below this number.

I agree, the first part of description of max_load_factor suggests that the load factor could reach that value, but in the second part and in the foregoing quote it's clearly stated that the load factor will be kept below this number. So, you have found a mistake in cppreference.

In your code, without rehashing, after the third insertion your would have s.load_factor equal to s.max_load_factor().

EDIT: To answer changes in the question i checked available to me VS implementation of unordered_set, it is implemented as

// hash table -- list with vector of iterators for quick access

and then you ask for an iterator, using e.g. lower_bound, you get iterator to the list element, which doesn't get invalidated by rehashing. So, it agrees with [unord.req]/15.

@xskxzr 2018-03-17 14:47:58

But [unord.req]/15 agrees with the description of cppreference.

@Yola 2018-03-17 15:40:43

@xskxzr yes, and iterators don't get invalidated after this operation. Interesting discovery:)

Related Questions

Sponsored Content

39 Answered Questions

[SOLVED] Why is "using namespace std;" considered bad practice?

1 Answered Questions

Why does unordered_set::begin() change if no rehashing occurs?

  • 2020-05-24 07:42:59
  • user5965026
  • 38 View
  • 0 Score
  • 1 Answer
  • Tags:   c++ hash iterator

1 Answered Questions

4 Answered Questions

[SOLVED] Using a std::unordered_set of std::unique_ptr

2 Answered Questions

[SOLVED] Why does the 32769th insert fail in std::unordered_set?

1 Answered Questions

[SOLVED] Behaviour of unnecessary or redundant calls to std::unordered_map::reserve

2 Answered Questions

1 Answered Questions

[SOLVED] How many times can rehashing occur in a hashmap

  • 2016-06-21 16:27:50
  • yeppe
  • 571 View
  • 1 Score
  • 1 Answer
  • Tags:   java hash hashmap

4 Answered Questions

[SOLVED] Do the iterator invalidation rules mean thread safety?

Sponsored Content