Difference between revisions of "Intermediate C++ Game Programming Tutorial 24"

From Chilipedia
Jump to: navigation, search
(Video Timestamp Index)
(Tutorial 24.2: The unordered associative containers)
 
(133 intermediate revisions by 2 users not shown)
Line 2: Line 2:
  
 
== Topics Covered ==
 
== Topics Covered ==
 +
=== Part 1: ordered associative containers ===
 
* <code>std::map</code> container interface
 
* <code>std::map</code> container interface
 
* Binary tree data structure
 
* Binary tree data structure
Line 8: Line 9:
 
* <code>std::set</code>
 
* <code>std::set</code>
 
* <code>std::multimap</code> and <code>std::multiset</code>
 
* <code>std::multimap</code> and <code>std::multiset</code>
 +
 +
=== Part 2: unordered associative containers ===
 +
* Hash table performance vs. binary tree performance
 +
* Hash table data structure
 +
* <code>std::unordered_map</code> key requirements
 +
* Hash combining
 +
* <code>std::unordered_map</code> bucket interface and hashing policy
 +
* When to choose <code>std::map</code> over <code>std::unordered_map</code>
  
 
== Video Timestamp Index ==
 
== Video Timestamp Index ==
* [https://youtu.be/JlPsCoCO99o Tutorial 24.1]
+
=== [https://youtu.be/JlPsCoCO99o Tutorial 24.1]: The ordered associative containers ===
* [https://youtu.be/LsjFAx-dG5I Tutorial 24.2]
+
<div class="mw-collapsible mw-collapsed"><br />
 +
* The <code>std::map<KeyType,ValueType></code> class [https://youtu.be/JlPsCoCO99o?t=0m46s 0:46]
 +
<div class="mw-collapsible-content">
 +
** Maps consist of keys to lookup (associated) values
 +
** <code>map.insert( {key,value} )</code> to insert (key,value) pairs
 +
** <code>map[key]</code> returns a reference to the ValueType for a KeyType
 +
</div>
 +
* A Binary Tree data structure is used to manage the order of map elements [https://youtu.be/JlPsCoCO99o?t=2m46s 2:46]
 +
<div class="mw-collapsible-content">
 +
** <code>std::map</code> performs lookup in O(log(n)), it uses a Binary tree data structure
 +
** Key properties of a Binary Tree (BT):
 +
*:- Nodes can have at most 2 children (hence: binary)
 +
*:- Each left child is smaller and each right child is larger than its parent
 +
*:- Insertion is done by navigating the tree along a route Left for smaller, Right for larger such that the order property always holds
 +
** The big advantage of the BT properties is that retrieval is very fast
 +
** The beauty of <code>std::map</code> is that we don't have to implement any of this; it's all there in the STL [https://youtu.be/JlPsCoCO99o?t=7m00s 7:00]
 +
** The STL implementation is further optimized, e.g. it uses a red-black tree for BT rebalancing
 +
</div>
 +
* A look at the <code>std::map</code> cppreference.com documentation: insert, lookup & find [https://youtu.be/JlPsCoCO99o?t=7m35s 7:35]
 +
<div class="mw-collapsible-content">
 +
** <code>map.insert()</code> takes a pair type <code>std::pair<KeyType,ValueType></code>, the Map's elements
 +
** C++ can deduce the pair Type, so <code>map.insert({keyX,valueXYZ});</code> with curly braces will do the job
 +
** An even better way to insert is through <code>map.emplace()</code> operation; it will construct the pair in-place.
 +
** For lookup, you can use square braces, <code>map[x]</code> will return a reference to the corresponding value
 +
** Note: a lookup with a new key value will create that element in the map with the default constructed ValueType value
 +
** <code>insert</code> or <code>emplace</code> with a key that already exists will NOT override the existing value: <code>std::map::emplace</code> returns a <code>std::pair<iterator,bool></code> where the bool inidicates whether an insertion took place
 +
** <code>map.find("xyz")</code> returns an iterator to the element if it exitst, and an iterator to <code>map.end()</code> if it doesn't exist (useful to check if a key already exists)
 +
** <code>std::map</code> comes with iterators and because it is a sorted map, when you iterate over its elements with <code>for (auto& el : map)</code>, it will be in order (of the keys)
 +
</div>
 +
* Requirements on KeyType [https://youtu.be/JlPsCoCO99o?t=14m30s 14:30]
 +
<div class="mw-collapsible-content">
 +
** The KeyType has to be comparable. The third template parameter is a functor for KeyType Comparison that defaults to <code>std::less<KeyType></code>
 +
** So by default keys have to implement the "less than" comparison operator or provide your own comparison functor when defining the map
 +
</div>
 +
* <code>std::map</code> cppreference.com documentation continued: erase [https://youtu.be/JlPsCoCO99o?t=15m28s 15:28]
 +
<div class="mw-collapsible-content">
 +
** <code>std::map::erase</code> offers three basic ways to erase elements:
 +
*:- With an iterator; returns an iterator following the last removed element
 +
*:- With an iterator range, idem
 +
*:- By key through <code>map.erase(const KeyType& key)</code>; this operation returns the number of elements erased (in <code>size_type</code>)
 +
</div>
 +
* Two important things to know when working with associative containers [https://youtu.be/JlPsCoCO99o?t=16m04s 16:04]
 +
<div class="mw-collapsible-content">
 +
** <code>std::remove_if</code> does not work with associative containers (will come with C++20).
 +
*:- You have to iterate over the elements with <code>for( auto i = map.begin(); i != map.end();)</code>
 +
*:- And apply <code>i = map.erase(i);</code> in the body of your <code>if</code> logic, and <code>++i</code> in the <code>else</code> block.
 +
** Keys are <code>const</code>. You're not allowed to modify the keys [https://youtu.be/JlPsCoCO99o?t=18m38s 18:38]
 +
*:- Makes sense: the keys define the structure of the binary tree.
 +
*:- If you modify the key you invalidate this structure (it would require a deletion and insertion to do it properly)
 +
</div>
 +
* The <code>std::set<KeyType></code> class [https://youtu.be/JlPsCoCO99o?t=20m00s 20:00]
 +
<div class="mw-collapsible-content">
 +
** With a set, you only have keys, and a unique entry for each unique key
 +
** Use case: ensure that there are no duplicates in a set
 +
</div>
 +
* The <code>std::multimap</code> and <code>std::multiset</code> classes [https://youtu.be/JlPsCoCO99o?t=21m28s 21:28]
 +
<div class="mw-collapsible-content">
 +
** Map has unique keys, with multimap you can insert multiple elements with the same key
 +
** This enables operations like <code>std::multimap::equal_range</code> that returns a pair of iterators (begin and end) of the range where these elements have that same key
 +
** <code>std::multimap::count</code> will return the number of elements with  specific key
 +
</div>
 +
* Practical example of a multimap use case [https://youtu.be/JlPsCoCO99o?t=22m30s 22:30]
 +
<div class="mw-collapsible-content">
 +
** Implementation example of a custom Comparison functor for the <code>Vei2</code> class (2D coordinate vector).
 +
*:- Chili's choice for ordering (used in the body of the functor):
 +
*:- <code>return (lhs.x == rhs.x) ? lhs.y < rhs.y : lhs.x < rhs.x;</code>
 +
** Example of how to find and print multiple elements in a multimap using <code>equal_range()</code>
 +
</div>
 +
* Lookup in multimaps [https://youtu.be/JlPsCoCO99o?t=25m21s 25:21]
 +
<div class="mw-collapsible-content">
 +
** Note: the multimap class does not have an index operator <code>[]</code>
 +
** When you do a lookup on a multimap, you should use <code>equal_range()</code>
 +
** The problem with <code>find()</code> on a multimap, is that if there are several elements with key in the ccontainer, any of them may be returned
 +
</div>
 +
</div>
 +
=== [https://youtu.be/LsjFAx-dG5I Tutorial 24.2]: The unordered associative containers ===
 +
<div class="mw-collapsible mw-collapsed"><br />
 +
* Main difference between ordered/unordered: performance [https://youtu.be/LsjFAx-dG5I?t=0m14s 0:14]
 +
<div class="mw-collapsible-content">
 +
:* Implication: if you iterative over an unordered container, keys will appear in (seemingly) random order
 +
:* Releasing the ordering requirement makes it possible to use a hash table with performance advantages: O(1) contant time insertion and lookup
 +
</div>
 +
* Using an unordered map [https://youtu.be/LsjFAx-dG5I?t=1m38s 1:38]
 +
<div class="mw-collapsible-content">
 +
:* The interface is pretty much the same as its ordered counterpart
 +
:* Include <code><unordered_map></code>, declare using <code>std::unordered_map<KeyType,ValueType></code>
 +
:* You can initialize your map object with an initializer list if you wanted to using <code>({ {..,..},{..,..},... })</code> inside your declaration
 +
</div>
 +
* The Hash Table data structure [https://youtu.be/LsjFAx-dG5I?t=3m20s 3:20]
 +
<div class="mw-collapsible-content">
 +
:* A hash table allows you to get the quick access to values, comparable to array access using the index, but with efficient memory usage
 +
:* Buckets are used to group keys; this is done by mapping keys to buckets using a hash function (a.k.a. hashing)
 +
:* Multiple keys can map to the same bucket in a hash table ("collision"). We use a linked list to store multiple {key,value} pairs in a bucket
 +
:* Two ways to minimize hash collisions: i) more buckets, ii) smart hash function that distributes key values uniformly across your bucket space
 +
:* Hashing a a two step process [https://youtu.be/LsjFAx-dG5I?t=9m26s 9:26]:
 +
::- A hash function takes in the KeyType input (typically a string or int) and outputs a size_t
 +
::- the size_t output is reduced/ditributed to the size of the hash table (number of buckets)
 +
:* The Standard Library provides general hashing functions for all the standard types
 +
:* For general use of unordered maps, we don't have to worry about the technical details of how the hash table works, the STL provides this
 +
</div>
 +
* Requirements for the KeyType of an <code>unordered_map</code> / a hash table [https://youtu.be/LsjFAx-dG5I?t=11m56s 11:56]
 +
<div class="mw-collapsible-content">
 +
:* There needs to be a working hash function defined for the KeyType
 +
:* There need to be comparison and equality functor definitions for the KeyType
 +
</div>
 +
* Example: map from <code>Vec2</code> class (2D coordinates) to a string [https://youtu.be/LsjFAx-dG5I?t=12m46s 12:46]
 +
<div class="mw-collapsible-content">
 +
:* In order to make this work, you need to define a hash function and the comparators for <code>Vei2</code>
 +
:* You can implement a comparison/equality functor as a <code>struct</code> that defines a <code>operator()</code> member function, templated on <code>T</code>
 +
::<syntaxhighlight lang="cpp" line>
 +
struct EqVec2
 +
{
 +
    template <typename T>
 +
    bool operator()( const T& lhs,const T& rhs ) const
 +
    {
 +
        return (lhs.x == rhs.x) && (lhs.y == rhs.y);
 +
    }
 +
};
 +
</syntaxhighlight>
 +
:* Defining a custom hashing function is an art, it requires knowledge of cryptography, abstract algebra, discrete math, etc.
 +
:* Luckily, we don't need this; you can revert to the standard hashing functions for the basic types that make up any custom type
 +
</div>
 +
* Hash combining [https://youtu.be/LsjFAx-dG5I?t=14m25s 14:25]
 +
<div class="mw-collapsible-content">
 +
:* Combining hashes from basic types to create a hash over your custom object
 +
:* A simple google search will give you good examples of how to combine hash values in C++
 +
:* You can implement a hashing functor as a <code>struct</code> that defines a member function, templated on <code>T</code>, the basic type of the <code>Vec2</code> coordinates:
 +
::<syntaxhighlight lang="cpp" line>
 +
struct HashVec2
 +
{
 +
    template <typename T>
 +
    size_t operator()( const _Vec2<T>& vec ) const
 +
    {
 +
        std::hash<T> hasher;
 +
        auto hashx = hasher ( vec.x );
 +
        auto hashy = hasher ( vec.y );
 +
        hashx ^= hashy + 0x9e3779b9 + (hashx << 6) + (hashx >> 2);
 +
        return hashx;
 +
    }
 +
};
 +
</syntaxhighlight>
 +
:* You pass this functors when defining the map: <code>std::unordered_map<Vei2,std::string,HashVec2> map;</code> [https://youtu.be/LsjFAx-dG5I?t=17m15s 17:15].
 +
:* Note that the comparison functor is not needed: we can revert back to the equality operator already defined in the <code>Vec2</code> class definition
 +
</div>
 +
* Template Specialization [https://youtu.be/LsjFAx-dG5I?t=18m43s 18:43]
 +
<div class="mw-collapsible-content">
 +
:* Unordered map uses <code>std::hash</code> by default. You can inject Template Specialization for <code>std::hash</code> into the <code>std</code> Namespace for your own custom types only
 +
::<syntaxhighlight lang="cpp" line>
 +
namespace std
 +
{
 +
    template <> struct hash<Vei2>
 +
    {
 +
        size_t operator()( cont Vei2& vec ) const
 +
        {...}
 +
    };
 +
}
 +
</syntaxhighlight>
 +
:* Now you don't need to pass <code>HashVec2</code> in the map definition
 +
</div>
 +
* The <code>std::unordered_map<></code> Bucket interface [https://youtu.be/LsjFAx-dG5I?t=20m00s 20:00]
 +
<div class="mw-collapsible-content">
 +
:* Allows you to get information about the buckets in the hash table and access nodes
 +
:* The bucket iterator takes an index of the bucket and allows you to iterate over all the elements in that specific bucket
 +
</div>
 +
* The <code>std::unordered_map<></code> Hash policy interface [https://youtu.be/LsjFAx-dG5I?t=21m47s 21:47]
 +
<div class="mw-collapsible-content">
 +
:* Allows you to tune your hash table (and thus the growth behavior & performance of the map)
 +
:* Load Factor = average number of elements per bucket. For performance, you typically want to keep this below 1
 +
:* You can set the maximum load factor above which the table gets rehashed
 +
:* When the load factor becomes too high, it will automaticall rehash the table and increase the number of buckets
 +
:* You can manually rehash to a number of buckets you define
 +
:* You can reserve space for max number of elements, is then derives (and manages) the required number of buckets
 +
</div>
 +
* When to choose <code>std::map</code> over <code>std::unordered_map</code>? [https://youtu.be/LsjFAx-dG5I?t=24m15s 25:15]
 +
<div class="mw-collapsible-content">
 +
:* For simplicity and when performance is not a critical issue, no need to define a hash function;
 +
:* If you want to iterate in order;
 +
:* When you want to be able to find keys that are close to a certain key (with <code>lower_bound</code> and <code>upper_bount</code>
 +
</div>
 +
* Homework assignment [https://youtu.be/LsjFAx-dG5I?t=26m04s 26:04]
 +
</div>
 +
 
 +
== Homework Assignment ==
 +
 
 +
The homework for this video is to enable use of a custom datatype in <code>unordered_map</code> hashing over multiple (4) members of that datatype. The solution video is [https://youtu.be/9qiJytSz9iM here].
  
 
== Supplementary Link ==
 
== Supplementary Link ==

Latest revision as of 23:47, 2 February 2020

Associative containers are super useful, both as a convenient fast way to create dictionary or mapping for real-world problems like managing game resources, and as a data structure to help solve more abstract algorithmic computer science problems. And hash tables are fast as balls.

Topics Covered

Part 1: ordered associative containers

  • std::map container interface
  • Binary tree data structure
  • std::map key requirements (comparison)
  • std::map gotchas (std::remove_if and const keys)
  • std::set
  • std::multimap and std::multiset

Part 2: unordered associative containers

  • Hash table performance vs. binary tree performance
  • Hash table data structure
  • std::unordered_map key requirements
  • Hash combining
  • std::unordered_map bucket interface and hashing policy
  • When to choose std::map over std::unordered_map

Video Timestamp Index

Tutorial 24.1: The ordered associative containers


  • The std::map<KeyType,ValueType> class 0:46
    • Maps consist of keys to lookup (associated) values
    • map.insert( {key,value} ) to insert (key,value) pairs
    • map[key] returns a reference to the ValueType for a KeyType
  • A Binary Tree data structure is used to manage the order of map elements 2:46
    • std::map performs lookup in O(log(n)), it uses a Binary tree data structure
    • Key properties of a Binary Tree (BT):
    - Nodes can have at most 2 children (hence: binary)
    - Each left child is smaller and each right child is larger than its parent
    - Insertion is done by navigating the tree along a route Left for smaller, Right for larger such that the order property always holds
    • The big advantage of the BT properties is that retrieval is very fast
    • The beauty of std::map is that we don't have to implement any of this; it's all there in the STL 7:00
    • The STL implementation is further optimized, e.g. it uses a red-black tree for BT rebalancing
  • A look at the std::map cppreference.com documentation: insert, lookup & find 7:35
    • map.insert() takes a pair type std::pair<KeyType,ValueType>, the Map's elements
    • C++ can deduce the pair Type, so map.insert({keyX,valueXYZ}); with curly braces will do the job
    • An even better way to insert is through map.emplace() operation; it will construct the pair in-place.
    • For lookup, you can use square braces, map[x] will return a reference to the corresponding value
    • Note: a lookup with a new key value will create that element in the map with the default constructed ValueType value
    • insert or emplace with a key that already exists will NOT override the existing value: std::map::emplace returns a std::pair<iterator,bool> where the bool inidicates whether an insertion took place
    • map.find("xyz") returns an iterator to the element if it exitst, and an iterator to map.end() if it doesn't exist (useful to check if a key already exists)
    • std::map comes with iterators and because it is a sorted map, when you iterate over its elements with for (auto& el : map), it will be in order (of the keys)
  • Requirements on KeyType 14:30
    • The KeyType has to be comparable. The third template parameter is a functor for KeyType Comparison that defaults to std::less<KeyType>
    • So by default keys have to implement the "less than" comparison operator or provide your own comparison functor when defining the map
  • std::map cppreference.com documentation continued: erase 15:28
    • std::map::erase offers three basic ways to erase elements:
    - With an iterator; returns an iterator following the last removed element
    - With an iterator range, idem
    - By key through map.erase(const KeyType& key); this operation returns the number of elements erased (in size_type)
  • Two important things to know when working with associative containers 16:04
    • std::remove_if does not work with associative containers (will come with C++20).
    - You have to iterate over the elements with for( auto i = map.begin(); i != map.end();)
    - And apply i = map.erase(i); in the body of your if logic, and ++i in the else block.
    • Keys are const. You're not allowed to modify the keys 18:38
    - Makes sense: the keys define the structure of the binary tree.
    - If you modify the key you invalidate this structure (it would require a deletion and insertion to do it properly)
  • The std::set<KeyType> class 20:00
    • With a set, you only have keys, and a unique entry for each unique key
    • Use case: ensure that there are no duplicates in a set
  • The std::multimap and std::multiset classes 21:28
    • Map has unique keys, with multimap you can insert multiple elements with the same key
    • This enables operations like std::multimap::equal_range that returns a pair of iterators (begin and end) of the range where these elements have that same key
    • std::multimap::count will return the number of elements with specific key
  • Practical example of a multimap use case 22:30
    • Implementation example of a custom Comparison functor for the Vei2 class (2D coordinate vector).
    - Chili's choice for ordering (used in the body of the functor):
    - return (lhs.x == rhs.x) ? lhs.y < rhs.y : lhs.x < rhs.x;
    • Example of how to find and print multiple elements in a multimap using equal_range()
  • Lookup in multimaps 25:21
    • Note: the multimap class does not have an index operator []
    • When you do a lookup on a multimap, you should use equal_range()
    • The problem with find() on a multimap, is that if there are several elements with key in the ccontainer, any of them may be returned

Tutorial 24.2: The unordered associative containers


  • Main difference between ordered/unordered: performance 0:14
  • Implication: if you iterative over an unordered container, keys will appear in (seemingly) random order
  • Releasing the ordering requirement makes it possible to use a hash table with performance advantages: O(1) contant time insertion and lookup
  • Using an unordered map 1:38
  • The interface is pretty much the same as its ordered counterpart
  • Include <unordered_map>, declare using std::unordered_map<KeyType,ValueType>
  • You can initialize your map object with an initializer list if you wanted to using ({ {..,..},{..,..},... }) inside your declaration
  • The Hash Table data structure 3:20
  • A hash table allows you to get the quick access to values, comparable to array access using the index, but with efficient memory usage
  • Buckets are used to group keys; this is done by mapping keys to buckets using a hash function (a.k.a. hashing)
  • Multiple keys can map to the same bucket in a hash table ("collision"). We use a linked list to store multiple {key,value} pairs in a bucket
  • Two ways to minimize hash collisions: i) more buckets, ii) smart hash function that distributes key values uniformly across your bucket space
  • Hashing a a two step process 9:26:
- A hash function takes in the KeyType input (typically a string or int) and outputs a size_t
- the size_t output is reduced/ditributed to the size of the hash table (number of buckets)
  • The Standard Library provides general hashing functions for all the standard types
  • For general use of unordered maps, we don't have to worry about the technical details of how the hash table works, the STL provides this
  • Requirements for the KeyType of an unordered_map / a hash table 11:56
  • There needs to be a working hash function defined for the KeyType
  • There need to be comparison and equality functor definitions for the KeyType
  • Example: map from Vec2 class (2D coordinates) to a string 12:46
  • In order to make this work, you need to define a hash function and the comparators for Vei2
  • You can implement a comparison/equality functor as a struct that defines a operator() member function, templated on T
struct EqVec2
{
    template <typename T>
    bool operator()( const T& lhs,const T& rhs ) const
    {
        return (lhs.x == rhs.x) && (lhs.y == rhs.y);
    }
};
  • Defining a custom hashing function is an art, it requires knowledge of cryptography, abstract algebra, discrete math, etc.
  • Luckily, we don't need this; you can revert to the standard hashing functions for the basic types that make up any custom type
  • Combining hashes from basic types to create a hash over your custom object
  • A simple google search will give you good examples of how to combine hash values in C++
  • You can implement a hashing functor as a struct that defines a member function, templated on T, the basic type of the Vec2 coordinates:
struct HashVec2
{
    template <typename T>
    size_t operator()( const _Vec2<T>& vec ) const
    {
        std::hash<T> hasher;
        auto hashx = hasher ( vec.x );
        auto hashy = hasher ( vec.y );
        hashx ^= hashy + 0x9e3779b9 + (hashx << 6) + (hashx >> 2);
        return hashx;
    }
};
  • You pass this functors when defining the map: std::unordered_map<Vei2,std::string,HashVec2> map; 17:15.
  • Note that the comparison functor is not needed: we can revert back to the equality operator already defined in the Vec2 class definition
  • Template Specialization 18:43
  • Unordered map uses std::hash by default. You can inject Template Specialization for std::hash into the std Namespace for your own custom types only
namespace std
{
    template <> struct hash<Vei2>
    {
        size_t operator()( cont Vei2& vec ) const
        {...}
    };
}
  • Now you don't need to pass HashVec2 in the map definition
  • The std::unordered_map<> Bucket interface 20:00
  • Allows you to get information about the buckets in the hash table and access nodes
  • The bucket iterator takes an index of the bucket and allows you to iterate over all the elements in that specific bucket
  • The std::unordered_map<> Hash policy interface 21:47
  • Allows you to tune your hash table (and thus the growth behavior & performance of the map)
  • Load Factor = average number of elements per bucket. For performance, you typically want to keep this below 1
  • You can set the maximum load factor above which the table gets rehashed
  • When the load factor becomes too high, it will automaticall rehash the table and increase the number of buckets
  • You can manually rehash to a number of buckets you define
  • You can reserve space for max number of elements, is then derives (and manages) the required number of buckets
  • When to choose std::map over std::unordered_map? 25:15
  • For simplicity and when performance is not a critical issue, no need to define a hash function;
  • If you want to iterate in order;
  • When you want to be able to find keys that are close to a certain key (with lower_bound and upper_bount
  • Homework assignment 26:04

Homework Assignment

The homework for this video is to enable use of a custom datatype in unordered_map hashing over multiple (4) members of that datatype. The solution video is here.

Supplementary Link

See also