Difference between revisions of "Intermediate C++ Game Programming Tutorial 24"

Latest revision as of 23:47, 2 February 2020

Associative containers are super useful, both as a convenient, fast way to create a dictionary or mapping for real-world problems like managing game resources, and as a data structure to help solve more abstract algorithmic computer science problems. And hash tables are fast as balls.

Topics Covered

Part 1: ordered associative containers

std::map container interface
Binary tree data structure
std::map key requirements (comparison)
std::map gotchas (std::remove_if and const keys)
std::set
std::multimap and std::multiset

Part 2: unordered associative containers

Hash table performance vs. binary tree performance
Hash table data structure
std::unordered_map key requirements
Hash combining
std::unordered_map bucket interface and hashing policy
When to choose std::map over std::unordered_map

Video Timestamp Index

Tutorial 24.1: The ordered associative containers

The std::map<KeyType,ValueType> class 0:46

- Maps consist of keys that are used to look up (associated) values
- map.insert( {key,value} ) to insert (key,value) pairs
- map[key] returns a reference to the ValueType for a KeyType

A Binary Tree data structure is used to manage the order of map elements 2:46

- std::map performs lookup in O(log(n)) because it uses a Binary tree data structure
- Key properties of a Binary Tree (BT):
- Nodes can have at most 2 children (hence: binary)

- Each left child is smaller and each right child is larger than its parent

- Insertion is done by navigating the tree along a route Left for smaller, Right for larger such that the order property always holds
- The big advantage of the BT properties is that retrieval is very fast
- The beauty of std::map is that we don't have to implement any of this; it's all there in the STL 7:00
- The STL implementation is further optimized, e.g. it typically uses a red-black tree to keep the BT balanced

A look at the std::map cppreference.com documentation: insert, lookup & find 7:35

- map.insert() takes a pair type std::pair<const KeyType,ValueType>, the map's element type
- C++ can deduce the pair type, so map.insert({keyX,valueXYZ}); with curly braces will do the job
- An even better way to insert is the map.emplace() operation; it constructs the pair in place
- For lookup, you can use square brackets: map[x] returns a reference to the corresponding value
- Note: a lookup with map[x] for a key that is not in the map yet creates that element with a default-constructed ValueType
- insert or emplace with a key that already exists will NOT overwrite the existing value: std::map::emplace returns a std::pair<iterator,bool> where the bool indicates whether an insertion took place
- map.find("xyz") returns an iterator to the element if it exists, and map.end() if it doesn't (useful to check whether a key already exists)
- std::map comes with iterators and because it is a sorted map, when you iterate over its elements with for (auto& el : map), it will be in order (of the keys)
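The points above can be sketched in a few lines. This is a minimal illustration, not code from the video; the inventory map, its keys and the function name BuildInventoryKeys are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Builds a small (hypothetical) inventory map and returns its keys in iteration order.
std::vector<std::string> BuildInventoryKeys()
{
    std::map<std::string,int> inventory;

    inventory.insert( { "potion",3 } );           // pair type is deduced from the braces
    auto result = inventory.emplace( "sword",1 ); // constructs the pair in place
    assert( result.second );                      // true: "sword" was newly inserted

    // Emplacing an existing key does NOT overwrite the stored value
    auto again = inventory.emplace( "sword",99 );
    assert( !again.second && again.first->second == 1 );

    inventory["arrow"] += 20; // operator[] default-constructs the int (0) first

    // find returns end() when the key is absent and does not insert anything
    assert( inventory.find( "shield" ) == inventory.end() );

    std::vector<std::string> keys;
    for( const auto& el : inventory )
    {
        keys.push_back( el.first ); // visited in sorted key order
    }
    return keys;
}
```

Calling BuildInventoryKeys() yields the keys sorted alphabetically, regardless of the order in which they were inserted.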

Requirements on KeyType 14:30

- The KeyType has to be comparable. The third template parameter is a functor for KeyType Comparison that defaults to std::less<KeyType>
- So by default the KeyType has to implement the "less than" comparison operator; otherwise you have to provide your own comparison functor when defining the map
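As a small sketch of the third template parameter, the map below swaps the default std::less for std::greater so that iteration runs from the largest key down. The score table and the function name HighestScoreName are assumptions for illustration only.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Returns the name stored at the first element of a map ordered with std::greater.
std::string HighestScoreName()
{
    // Third template parameter: the comparison functor (hypothetical high-score table)
    std::map<int,std::string,std::greater<int>> scores;
    scores.emplace( 120,"Ann" );
    scores.emplace( 300,"Bob" );
    scores.emplace( 75,"Cid" );
    assert( !scores.empty() );
    return scores.begin()->second; // with std::greater the largest key comes first
}
```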

std::map cppreference.com documentation continued: erase 15:28

- std::map::erase offers three basic ways to erase elements:
- With an iterator; returns an iterator following the last removed element

- With an iterator range; this also returns an iterator following the last removed element

- By key through map.erase(const KeyType& key); this operation returns the number of elements erased (in size_type)
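The three erase overloads listed above might be exercised like this. It is a sketch with made-up data; EraseDemo is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Applies all three erase overloads and returns the number of remaining elements.
std::size_t EraseDemo()
{
    std::map<int,char> m = { {1,'a'},{2,'b'},{3,'c'},{4,'d'},{5,'e'} };

    // 1) By iterator: returns an iterator to the element after the erased one
    auto next = m.erase( m.find( 2 ) );
    assert( next->first == 3 );

    // 2) By iterator range [first,last): erases keys 3 and 4, returns an iterator to key 5
    auto after = m.erase( m.find( 3 ),m.find( 5 ) );
    assert( after->first == 5 );

    // 3) By key: returns the number of erased elements (0 or 1 for std::map)
    std::size_t removed = m.erase( 1 );
    removed += m.erase( 42 ); // key not present, adds 0
    assert( removed == 1 );

    return m.size(); // only key 5 is left
}
```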

Two important things to know when working with associative containers 16:04

- std::remove_if does not work with associative containers (C++20 adds std::erase_if for this).
- You have to iterate over the elements with for( auto i = map.begin(); i != map.end();)

- And apply i = map.erase(i); in the body of your if logic, and ++i in the else block.
- Keys are const. You're not allowed to modify the keys 18:38
- Makes sense: the keys define the structure of the binary tree.

- If you modify the key you invalidate this structure (it would require a deletion and insertion to do it properly)
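The erase-while-iterating loop described above could look like the following sketch. The stock map, the "value is zero or less" condition and the function names are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Removes every entry whose value is zero or less.
void RemoveDepleted( std::map<std::string,int>& stock )
{
    for( auto i = stock.begin(); i != stock.end(); ) // note: no ++i in the loop header
    {
        if( i->second <= 0 )
        {
            i = stock.erase( i ); // erase returns the next valid iterator
        }
        else
        {
            ++i; // only advance manually when nothing was erased
        }
    }
}

// Runs the cleanup on sample data and returns how many entries survive.
std::size_t CountAfterCleanup()
{
    std::map<std::string,int> stock = { {"arrow",0},{"bomb",2},{"key",-1},{"rope",5} };
    RemoveDepleted( stock );
    assert( stock.count( "arrow" ) == 0 );
    return stock.size();
}
```

Incrementing the iterator after erasing through it would use an invalidated iterator, which is why the increment lives in the else branch.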

The std::set<KeyType> class 20:00

- With a set, you only have keys, and a unique entry for each unique key
- Use case: ensure that a collection of values contains no duplicates
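A minimal sketch of that use case: feeding a list of values through a std::set drops the duplicates and, as a side effect, sorts them. UniqueSorted is a hypothetical helper name.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns the distinct values of the input, in ascending order.
std::vector<int> UniqueSorted( const std::vector<int>& values )
{
    std::set<int> unique( values.begin(),values.end() ); // duplicates are dropped on insertion
    assert( unique.size() <= values.size() );
    return std::vector<int>( unique.begin(),unique.end() );
}
```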

The std::multimap and std::multiset classes 21:28

- Map has unique keys, with multimap you can insert multiple elements with the same key
- This enables operations like std::multimap::equal_range that returns a pair of iterators (begin and end) of the range where these elements have that same key
- std::multimap::count will return the number of elements with a specific key

Practical example of a multimap use case 22:30

- Implementation example of a custom Comparison functor for the Vei2 class (2D coordinate vector).
- Chili's choice for ordering (used in the body of the functor):

- return (lhs.x == rhs.x) ? lhs.y < rhs.y : lhs.x < rhs.x;
- Example of how to find and print multiple elements in a multimap using equal_range()
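A sketch of how the comparison functor and equal_range() fit together. The Vei2 struct here is a simplified stand-in for the framework class, and LessVei2, ThingsAt and the sample data are hypothetical; only the ordering expression is taken from the notes above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the framework's Vei2 (2D integer vector)
struct Vei2
{
    int x;
    int y;
};

// Orders by x first, then by y (the ordering quoted above)
struct LessVei2
{
    bool operator()( const Vei2& lhs,const Vei2& rhs ) const
    {
        return (lhs.x == rhs.x) ? lhs.y < rhs.y : lhs.x < rhs.x;
    }
};

// Collects the values of all elements stored under the given position.
std::vector<std::string> ThingsAt( const Vei2& pos )
{
    std::multimap<Vei2,std::string,LessVei2> world;
    world.emplace( Vei2{ 1,2 },"tree" );
    world.emplace( Vei2{ 4,0 },"rock" );
    world.emplace( Vei2{ 1,2 },"bird" ); // same key as "tree": allowed in a multimap
    assert( world.count( Vei2{ 1,2 } ) == 2 );

    std::vector<std::string> found;
    auto range = world.equal_range( pos ); // pair of iterators: [first,second)
    for( auto i = range.first; i != range.second; ++i )
    {
        found.push_back( i->second );
    }
    return found;
}
```

For a key that is not present, equal_range() returns an empty range, so the loop body never runs.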

Lookup in multimaps 25:21

- Note: the multimap class does not have an index operator []
- When you do a lookup on a multimap, you should use equal_range()
- The problem with find() on a multimap is that if there are several elements with that key in the container, any of them may be returned

Tutorial 24.2: The unordered associative containers

Main difference between ordered/unordered: performance 0:14

Implication: if you iterate over an unordered container, keys will appear in (seemingly) random order
Releasing the ordering requirement makes it possible to use a hash table with performance advantages: O(1) constant time insertion and lookup (on average)

Using an unordered map 1:38

The interface is pretty much the same as its ordered counterpart
Include <unordered_map>, declare using std::unordered_map<KeyType,ValueType>
You can initialize your map object with an initializer list if you want to, using ({ {..,..},{..,..},... }) inside your declaration
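A minimal sketch of declaring and initializing an unordered map this way. The weapon table and the function name LookupDamage are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Looks up a weapon in a (hypothetical) damage table; returns 0 for unknown weapons.
int LookupDamage( const std::string& weapon )
{
    // Initializer list of {key,value} pairs inside the declaration
    static const std::unordered_map<std::string,int> damage( {
        { "dagger",4 },
        { "sword",8 },
        { "axe",10 }
    } );
    assert( damage.size() == 3 );

    auto i = damage.find( weapon ); // same interface as std::map
    return i != damage.end() ? i->second : 0;
}
```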

The Hash Table data structure 3:20

A hash table gives you quick access to values, comparable to array access using an index, but with efficient memory usage
Buckets are used to group keys; this is done by mapping keys to buckets using a hash function (a.k.a. hashing)
Multiple keys can map to the same bucket in a hash table ("collision"). We use a linked list to store multiple {key,value} pairs in a bucket
Two ways to minimize hash collisions: i) more buckets, ii) smart hash function that distributes key values uniformly across your bucket space
Hashing is a two-step process 9:26:

- A hash function takes in the KeyType input (typically a string or int) and outputs a size_t

- the size_t output is reduced/distributed to the size of the hash table (number of buckets)

The Standard Library provides general hashing functions for all the standard types
For general use of unordered maps, we don't have to worry about the technical details of how the hash table works; the STL provides this
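The two hashing steps can be sketched by hand as follows. Using a plain modulo for the reduction step is an assumption made for illustration; a real standard library implementation may map hashes to buckets differently. BucketIndex is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Maps a string key to a bucket index in a table with bucketCount buckets.
std::size_t BucketIndex( const std::string& key,std::size_t bucketCount )
{
    assert( bucketCount > 0 );
    const std::size_t hash = std::hash<std::string>{}( key ); // step 1: key -> size_t
    return hash % bucketCount; // step 2 (assumed reduction): size_t -> bucket index
}
```

The same key always lands in the same bucket, and two different keys may land in the same bucket, which is the collision case described above.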

Requirements for the KeyType of an unordered_map / a hash table 11:56

There needs to be a working hash function defined for the KeyType
There needs to be an equality comparison functor defined for the KeyType (defaults to std::equal_to<KeyType>, which uses operator==)

Example: map from Vec2 class (2D coordinates) to a string 12:46

In order to make this work, you need to define a hash function and an equality comparator for Vei2
You can implement a comparison/equality functor as a struct that defines an operator() member function, templated on T

struct EqVec2
{
    template <typename T>
    bool operator()( const T& lhs,const T& rhs ) const
    {
        return (lhs.x == rhs.x) && (lhs.y == rhs.y);
    }
};

Defining a custom hashing function is an art; it requires knowledge of cryptography, abstract algebra, discrete math, etc.
Luckily, we don't need this; you can fall back on the standard hashing functions for the basic types that make up any custom type

Hash combining 14:25

Combining hashes from basic types to create a hash over your custom object
A simple google search will give you good examples of how to combine hash values in C++
You can implement a hashing functor as a struct that defines a member function, templated on T, the basic type of the Vec2 coordinates:

struct HashVec2
{
    template <typename T>
    size_t operator()( const _Vec2<T>& vec ) const
    {
        std::hash<T> hasher;
        auto hashx = hasher ( vec.x );
        auto hashy = hasher ( vec.y );
        hashx ^= hashy + 0x9e3779b9 + (hashx << 6) + (hashx >> 2);
        return hashx;
    }
};

You pass this functor when defining the map: std::unordered_map<Vei2,std::string,HashVec2> map; 17:15.
Note that the equality functor is not needed: we can fall back on the equality operator already defined in the Vec2 class definition
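Putting the pieces together might look like the sketch below. The _Vec2 template here is a minimal stand-in that only mirrors the name used in the tutorial (the real framework class has more members); HashVec2 is repeated from above so the block compiles on its own, and TileAt plus the tile data are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Minimal stand-in for the framework's vector template (assumption, simplified)
template <typename T>
struct _Vec2
{
    T x;
    T y;
    // operator== is what the default std::equal_to uses for key equality
    bool operator==( const _Vec2& rhs ) const
    {
        return x == rhs.x && y == rhs.y;
    }
};
using Vei2 = _Vec2<int>;

// Hash functor as listed above: combines the hashes of x and y
struct HashVec2
{
    template <typename T>
    std::size_t operator()( const _Vec2<T>& vec ) const
    {
        std::hash<T> hasher;
        auto hashx = hasher( vec.x );
        auto hashy = hasher( vec.y );
        hashx ^= hashy + 0x9e3779b9 + (hashx << 6) + (hashx >> 2);
        return hashx;
    }
};

// Looks up a position in a (hypothetical) tile map; returns "void" if nothing is there.
std::string TileAt( const Vei2& pos )
{
    std::unordered_map<Vei2,std::string,HashVec2> tiles; // hash functor as 3rd parameter
    tiles.emplace( Vei2{ 0,0 },"grass" );
    tiles.emplace( Vei2{ 3,7 },"water" );
    assert( tiles.size() == 2 );

    auto i = tiles.find( pos );
    return i != tiles.end() ? i->second : std::string( "void" );
}
```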

Template Specialization 18:43

Unordered map uses std::hash by default. You can inject a template specialization of std::hash into the std namespace, but only for your own custom types

namespace std
{
    template <> struct hash<Vei2>
    {
        size_t operator()( const Vei2& vec ) const
        {...}
    };
}

Now you don't need to pass HashVec2 in the map definition
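A complete version of this idea could look like the sketch below. The body of operator() is an assumption that reuses the hash-combining step from HashVec2; the Vei2 struct is a simplified stand-in for the framework class, and HasLabel is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the framework's Vei2
struct Vei2
{
    int x;
    int y;
    bool operator==( const Vei2& rhs ) const
    {
        return x == rhs.x && y == rhs.y;
    }
};

namespace std
{
    // Specialization of std::hash for our own type
    template <> struct hash<Vei2>
    {
        size_t operator()( const Vei2& vec ) const
        {
            // Assumed body: same combining step as in HashVec2
            std::hash<int> hasher;
            auto hashx = hasher( vec.x );
            const auto hashy = hasher( vec.y );
            hashx ^= hashy + 0x9e3779b9 + (hashx << 6) + (hashx >> 2);
            return hashx;
        }
    };
}

// Returns true if the given position has a label in a small sample map.
bool HasLabel( const Vei2& pos )
{
    std::unordered_map<Vei2,std::string> labels; // no hash functor argument needed
    labels[Vei2{ 2,5 }] = "spawn";
    assert( labels.size() == 1 );
    return labels.count( pos ) == 1;
}
```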

The std::unordered_map<> Bucket interface 20:00

Allows you to get information about the buckets in the hash table and access nodes
The bucket iterator takes an index of the bucket and allows you to iterate over all the elements in that specific bucket
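A short sketch of the bucket interface: it walks every bucket with the per-bucket iterators and counts the elements it sees. The function names and sample data are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Counts elements by visiting every bucket with the local (per-bucket) iterators.
std::size_t CountViaBuckets( const std::unordered_map<std::string,int>& m )
{
    std::size_t total = 0;
    for( std::size_t b = 0; b < m.bucket_count(); ++b )
    {
        for( auto i = m.begin( b ); i != m.end( b ); ++i ) // elements of bucket b only
        {
            assert( m.bucket( i->first ) == b ); // bucket(key) reports the same index
            ++total;
        }
    }
    return total;
}

// Runs the bucket walk on a small sample map.
std::size_t BucketDemo()
{
    const std::unordered_map<std::string,int> m = { {"a",1},{"b",2},{"c",3},{"d",4} };
    return CountViaBuckets( m );
}
```

Every element lives in exactly one bucket, so the total matches m.size().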

The std::unordered_map<> Hash policy interface 21:47

Allows you to tune your hash table (and thus the growth behavior & performance of the map)
Load Factor = average number of elements per bucket. For performance, you typically want to keep this below 1
You can set the maximum load factor above which the table gets rehashed
When the load factor becomes too high, it will automatically rehash the table and increase the number of buckets
You can manually rehash to a number of buckets you define
You can reserve space for a maximum number of elements; it then derives (and manages) the required number of buckets
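The policy functions mentioned above might be combined like this. The numbers (0.5, 100, 1024) and the function name PolicyDemo are arbitrary choices for illustration, not recommendations.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Exercises max_load_factor, reserve and rehash; returns true if the table ends up as requested.
bool PolicyDemo()
{
    std::unordered_map<int,int> m;
    m.max_load_factor( 0.5f ); // aim for at most 0.5 elements per bucket on average
    m.reserve( 100 );          // enough buckets for 100 elements at that load factor
    const std::size_t bucketsBefore = m.bucket_count();

    for( int i = 0; i < 100; ++i )
    {
        m.emplace( i,i * i );
    }
    // reserve() sized the table up front, so inserting 100 elements needed no rehash
    assert( m.bucket_count() == bucketsBefore );

    m.rehash( 1024 ); // manually request at least 1024 buckets
    return m.bucket_count() >= 1024 && m.load_factor() <= m.max_load_factor();
}
```

Reserving before a bulk insert avoids repeated rehashing while the table grows.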

When to choose std::map over std::unordered_map? 25:15

For simplicity and when performance is not a critical issue, no need to define a hash function;
If you want to iterate in order;
When you want to be able to find keys that are close to a certain key (with lower_bound and upper_bound)
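The "keys close to a certain key" case is something only the ordered containers offer. A minimal sketch, assuming a made-up table of rank names keyed by the level at which each rank starts; FirstRankAtOrAbove is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns the rank stored at the first key that is >= level, or "" if there is none.
std::string FirstRankAtOrAbove( int level )
{
    static const std::map<int,std::string> ranks = {
        { 1,"novice" },{ 10,"adept" },{ 25,"expert" },{ 50,"master" }
    };
    assert( ranks.upper_bound( 10 )->first == 25 ); // first key strictly greater than 10

    auto i = ranks.lower_bound( level ); // first key not less than level
    return i != ranks.end() ? i->second : std::string();
}
```

Both calls run in O(log(n)) on a std::map; an unordered_map has no equivalent because its keys are not kept in order.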

Homework assignment 26:04

Homework Assignment

The homework for this video is to enable use of a custom datatype in unordered_map hashing over multiple (4) members of that datatype. The solution video is here.

Supplementary Link

Data Structure / Algorithm Visualizations

@@ Line 2: / Line 2: @@
 == Topics Covered ==
+=== Part 1: ordered associative containers ===
 * <code>std::map</code> container interface
 * Binary tree data structure
@@ Line 8: / Line 9: @@
 * <code>std::set</code>
 * <code>std::multimap</code> and <code>std::multiset</code>
+=== Part 2: unordered associative containers ===
+* Hash table performance vs. binary tree performance
+* Hash table data structure
+* <code>std::unordered_map</code> key requirements
+* Hash combining
+* <code>std::unordered_map</code> bucket interface and hashing policy
+* When to choose <code>std::map</code> over <code>std::unordered_map</code>
 == Video Timestamp Index ==
-* [https://youtu.be/JlPsCoCO99o Tutorial 24.1]
+=== [https://youtu.be/JlPsCoCO99o Tutorial 24.1]: The ordered associative containers ===
-* [https://youtu.be/LsjFAx-dG5I Tutorial 24.2]
+<div class="mw-collapsible mw-collapsed"><br />
+* The <code>std::map<KeyType,ValueType></code> class [https://youtu.be/JlPsCoCO99o?t=0m46s 0:46]
+<div class="mw-collapsible-content">
+** Maps consist of keys to lookup (associated) values
+** <code>map.insert( {key,value} )</code> to insert (key,value) pairs
+** <code>map[key]</code> returns a reference to the ValueType for a KeyType
+</div>
+* A Binary Tree data structure is used to manage the order of map elements [https://youtu.be/JlPsCoCO99o?t=2m46s 2:46]
+<div class="mw-collapsible-content">
+** <code>std::map</code> performs lookup in O(log(n)), it uses a Binary tree data structure
+** Key properties of a Binary Tree (BT):
+*:- Nodes can have at most 2 children (hence: binary)
+*:- Each left child is smaller and each right child is larger than its parent
+*:- Insertion is done by navigating the tree along a route Left for smaller, Right for larger such that the order property always holds
+** The big advantage of the BT properties is that retrieval is very fast
+** The beauty of <code>std::map</code> is that we don't have to implement any of this; it's all there in the STL [https://youtu.be/JlPsCoCO99o?t=7m00s 7:00]
+** The STL implementation is further optimized, e.g. it uses a red-black tree for BT rebalancing
+</div>
+* A look at the <code>std::map</code> cppreference.com documentation: insert, lookup & find [https://youtu.be/JlPsCoCO99o?t=7m35s 7:35]
+<div class="mw-collapsible-content">
+** <code>map.insert()</code> takes a pair type <code>std::pair<KeyType,ValueType></code>, the Map's elements
+** C++ can deduce the pair Type, so <code>map.insert({keyX,valueXYZ});</code> with curly braces will do the job
+** An even better way to insert is through <code>map.emplace()</code> operation; it will construct the pair in-place.
+** For lookup, you can use square braces, <code>map[x]</code> will return a reference to the corresponding value
+** Note: a lookup with a new key value will create that element in the map with the default constructed ValueType value
+** <code>insert</code> or <code>emplace</code> with a key that already exists will NOT override the existing value: <code>std::map::emplace</code> returns a <code>std::pair<iterator,bool></code> where the bool indicates whether an insertion took place
+** <code>map.find("xyz")</code> returns an iterator to the element if it exists, and an iterator to <code>map.end()</code> if it doesn't exist (useful to check if a key already exists)
+** <code>std::map</code> comes with iterators and because it is a sorted map, when you iterate over its elements with <code>for (auto& el : map)</code>, it will be in order (of the keys)
+</div>
+* Requirements on KeyType [https://youtu.be/JlPsCoCO99o?t=14m30s 14:30]
+<div class="mw-collapsible-content">
+** The KeyType has to be comparable. The third template parameter is a functor for KeyType Comparison that defaults to <code>std::less<KeyType></code>
+** So by default keys have to implement the "less than" comparison operator or provide your own comparison functor when defining the map
+</div>
+* <code>std::map</code> cppreference.com documentation continued: erase [https://youtu.be/JlPsCoCO99o?t=15m28s 15:28]
+<div class="mw-collapsible-content">
+** <code>std::map::erase</code> offers three basic ways to erase elements:
+*:- With an iterator; returns an iterator following the last removed element
+*:- With an iterator range, idem
+*:- By key through <code>map.erase(const KeyType& key)</code>; this operation returns the number of elements erased (in <code>size_type</code>)
+</div>
+* Two important things to know when working with associative containers [https://youtu.be/JlPsCoCO99o?t=16m04s 16:04]
+<div class="mw-collapsible-content">
+** <code>std::remove_if</code> does not work with associative containers (will come with C++20).
+*:- You have to iterate over the elements with <code>for( auto i = map.begin(); i != map.end();)</code>
+*:- And apply <code>i = map.erase(i);</code> in the body of your <code>if</code> logic, and <code>++i</code> in the <code>else</code> block.
+** Keys are <code>const</code>. You're not allowed to modify the keys [https://youtu.be/JlPsCoCO99o?t=18m38s 18:38]
+*:- Makes sense: the keys define the structure of the binary tree.
+*:- If you modify the key you invalidate this structure (it would require a deletion and insertion to do it properly)
+</div>
+* The <code>std::set<KeyType></code> class [https://youtu.be/JlPsCoCO99o?t=20m00s 20:00]
+<div class="mw-collapsible-content">
+** With a set, you only have keys, and a unique entry for each unique key
+** Use case: ensure that there are no duplicates in a set
+</div>
+* The <code>std::multimap</code> and <code>std::multiset</code> classes [https://youtu.be/JlPsCoCO99o?t=21m28s 21:28]
+<div class="mw-collapsible-content">
+** Map has unique keys, with multimap you can insert multiple elements with the same key
+** This enables operations like <code>std::multimap::equal_range</code> that returns a pair of iterators (begin and end) of the range where these elements have that same key
+** <code>std::multimap::count</code> will return the number of elements with a specific key
+</div>
+* Practical example of a multimap use case [https://youtu.be/JlPsCoCO99o?t=22m30s 22:30]
+<div class="mw-collapsible-content">
+** Implementation example of a custom Comparison functor for the <code>Vei2</code> class (2D coordinate vector).
+*:- Chili's choice for ordering (used in the body of the functor):
+*:- <code>return (lhs.x == rhs.x) ? lhs.y < rhs.y : lhs.x < rhs.x;</code>
+** Example of how to find and print multiple elements in a multimap using <code>equal_range()</code>
+</div>
+* Lookup in multimaps [https://youtu.be/JlPsCoCO99o?t=25m21s 25:21]
+<div class="mw-collapsible-content">
+** Note: the multimap class does not have an index operator <code>[]</code>
+** When you do a lookup on a multimap, you should use <code>equal_range()</code>
+** The problem with <code>find()</code> on a multimap, is that if there are several elements with that key in the container, any of them may be returned
+</div>
+</div>
+=== [https://youtu.be/LsjFAx-dG5I Tutorial 24.2]: The unordered associative containers ===
+<div class="mw-collapsible mw-collapsed"><br />
+* Main difference between ordered/unordered: performance [https://youtu.be/LsjFAx-dG5I?t=0m14s 0:14]
+<div class="mw-collapsible-content">
+:* Implication: if you iterate over an unordered container, keys will appear in (seemingly) random order
+:* Releasing the ordering requirement makes it possible to use a hash table with performance advantages: O(1) constant time insertion and lookup
+</div>
+* Using an unordered map [https://youtu.be/LsjFAx-dG5I?t=1m38s 1:38]
+<div class="mw-collapsible-content">
+:* The interface is pretty much the same as its ordered counterpart
+:* Include <code><unordered_map></code>, declare using <code>std::unordered_map<KeyType,ValueType></code>
+:* You can initialize your map object with an initializer list if you wanted to using <code>({ {..,..},{..,..},... })</code> inside your declaration
+</div>
+* The Hash Table data structure [https://youtu.be/LsjFAx-dG5I?t=3m20s 3:20]
+<div class="mw-collapsible-content">
+:* A hash table allows you to get the quick access to values, comparable to array access using the index, but with efficient memory usage
+:* Buckets are used to group keys; this is done by mapping keys to buckets using a hash function (a.k.a. hashing)
+:* Multiple keys can map to the same bucket in a hash table ("collision"). We use a linked list to store multiple {key,value} pairs in a bucket
+:* Two ways to minimize hash collisions: i) more buckets, ii) smart hash function that distributes key values uniformly across your bucket space
+:* Hashing is a two-step process [https://youtu.be/LsjFAx-dG5I?t=9m26s 9:26]:
+::- A hash function takes in the KeyType input (typically a string or int) and outputs a size_t
+::- the size_t output is reduced/distributed to the size of the hash table (number of buckets)
+:* The Standard Library provides general hashing functions for all the standard types
+:* For general use of unordered maps, we don't have to worry about the technical details of how the hash table works, the STL provides this
+</div>
+* Requirements for the KeyType of an <code>unordered_map</code> / a hash table [https://youtu.be/LsjFAx-dG5I?t=11m56s 11:56]
+<div class="mw-collapsible-content">
+:* There needs to be a working hash function defined for the KeyType
+:* There need to be comparison and equality functor definitions for the KeyType
+</div>
+* Example: map from <code>Vec2</code> class (2D coordinates) to a string [https://youtu.be/LsjFAx-dG5I?t=12m46s 12:46]
+<div class="mw-collapsible-content">
+:* In order to make this work, you need to define a hash function and the comparators for <code>Vei2</code>
+:* You can implement a comparison/equality functor as a <code>struct</code> that defines a <code>operator()</code> member function, templated on <code>T</code>
+::<syntaxhighlight lang="cpp" line>
+struct EqVec2
+{
+    template <typename T>
+    bool operator()( const T& lhs,const T& rhs ) const
+    {
+        return (lhs.x == rhs.x) && (lhs.y == rhs.y);
+    }
+};
+</syntaxhighlight>
+:* Defining a custom hashing function is an art, it requires knowledge of cryptography, abstract algebra, discrete math, etc.
+:* Luckily, we don't need this; you can revert to the standard hashing functions for the basic types that make up any custom type
+</div>
+* Hash combining [https://youtu.be/LsjFAx-dG5I?t=14m25s 14:25]
+<div class="mw-collapsible-content">
+:* Combining hashes from basic types to create a hash over your custom object
+:* A simple google search will give you good examples of how to combine hash values in C++
+:* You can implement a hashing functor as a <code>struct</code> that defines a member function, templated on <code>T</code>, the basic type of the <code>Vec2</code> coordinates:
+::<syntaxhighlight lang="cpp" line>
+struct HashVec2
+{
+    template <typename T>
+    size_t operator()( const _Vec2<T>& vec ) const
+    {
+        std::hash<T> hasher;
+        auto hashx = hasher ( vec.x );
+        auto hashy = hasher ( vec.y );
+        hashx ^= hashy + 0x9e3779b9 + (hashx << 6) + (hashx >> 2);
+        return hashx;
+    }
+};
+</syntaxhighlight>
+:* You pass this functor when defining the map: <code>std::unordered_map<Vei2,std::string,HashVec2> map;</code> [https://youtu.be/LsjFAx-dG5I?t=17m15s 17:15].
+:* Note that the comparison functor is not needed: we can revert back to the equality operator already defined in the <code>Vec2</code> class definition
+</div>
+* Template Specialization [https://youtu.be/LsjFAx-dG5I?t=18m43s 18:43]
+<div class="mw-collapsible-content">
+:* Unordered map uses <code>std::hash</code> by default. You can inject Template Specialization for <code>std::hash</code> into the <code>std</code> Namespace for your own custom types only
+::<syntaxhighlight lang="cpp" line>
+namespace std
+{
+    template <> struct hash<Vei2>
+    {
+        size_t operator()( const Vei2& vec ) const
+        {...}
+    };
+}
+</syntaxhighlight>
+:* Now you don't need to pass <code>HashVec2</code> in the map definition
+</div>
+* The <code>std::unordered_map<></code> Bucket interface [https://youtu.be/LsjFAx-dG5I?t=20m00s 20:00]
+<div class="mw-collapsible-content">
+:* Allows you to get information about the buckets in the hash table and access nodes
+:* The bucket iterator takes an index of the bucket and allows you to iterate over all the elements in that specific bucket
+</div>
+* The <code>std::unordered_map<></code> Hash policy interface [https://youtu.be/LsjFAx-dG5I?t=21m47s 21:47]
+<div class="mw-collapsible-content">
+:* Allows you to tune your hash table (and thus the growth behavior & performance of the map)
+:* Load Factor = average number of elements per bucket. For performance, you typically want to keep this below 1
+:* You can set the maximum load factor above which the table gets rehashed
+:* When the load factor becomes too high, it will automatically rehash the table and increase the number of buckets
+:* You can manually rehash to a number of buckets you define
+:* You can reserve space for a maximum number of elements; it then derives (and manages) the required number of buckets
+</div>
+* When to choose <code>std::map</code> over <code>std::unordered_map</code>? [https://youtu.be/LsjFAx-dG5I?t=24m15s 25:15]
+<div class="mw-collapsible-content">
+:* For simplicity and when performance is not a critical issue, no need to define a hash function;
+:* If you want to iterate in order;
+:* When you want to be able to find keys that are close to a certain key (with <code>lower_bound</code> and <code>upper_bound</code>)
+</div>
+* Homework assignment [https://youtu.be/LsjFAx-dG5I?t=26m04s 26:04]
+</div>
+== Homework Assignment ==
+The homework for this video is to enable use of a custom datatype in <code>unordered_map</code> hashing over multiple (4) members of that datatype. The solution video is [https://youtu.be/9qiJytSz9iM here].
 == Supplementary Link ==