CMPT 225 Lab 7: String hashing and unordered_map

CMPT 225 Lab 7: String hashing and unordered_map


Introduction

In this lab we will see implementations of the hash codes we talked about in class and we will see how those hash codes could be used in a C++ unordered_map.

C++ unordered_map is based on hash tables. Therefore, it assumes there is a hash code function for the key type you're using it with. If such a hash code doesn't exist, one should define one.

To test this, try compiling the following code:

#include <unordered_map>

using namespace std;

int main(){

        unordered_map<pair<int,int>,int> count;

}

It shouldn't compile. Now, verify that the following code compiles and runs (nothing happens when you run it though. ):

#include <unordered_map>

using namespace std;


class pair_hash{
public:
        std::size_t operator() (pair<int,int> p) const { return p.first + p.second; }
};


int main(){

        unordered_map<pair<int,int>, int, pair_hash> count;

}

The reason this code runs is that we have provided a class that overloads operator() to compute the hash code for the key type and introduced it in the unordered_map.

Lab activity

Please download this folder. The sum of components, polynomial, and cyclic shift hash codes that we mentioned in class (see the slides for lecture 25) are declared in HashUtility.h and partially implemented in HashUtility.cpp. test_unordered_map.cpp and test_hash_codes.cpp are provided to test the functionality of hash codes.

Looking at the class declarations and implementations, we can see that sum_of_components_hash is pretty simple. It doesn't overload the default constructor, doesn't have any other constructor, doesn't have any member data, and overloads one and only one operator: operator(). You should implement operator() of polynomial_hash in a similar fashion. If you're in doubt what polynomial_hash must do, find its definition in the slides of lecture 25.

The other two classes are a bit more complicated though. polynomial_hash has aprivate member data a_ and two public constructors: one that takes a value for a_ and one default construcor that randomly choses a non-zero value for a_. cyclic_hash too has a private member data sh_ and two public constructors: one that takes a value for sh_ and one default construcor that is not yet implemented. You should implement the default constructor for cyclic_hash so that it randomly choses a value from [0,31] for sh_.

How to test?

You should compile and run both test_unoredered_map.cpp and test_hash_codes.cpp to see if your code behaves sensibly. Here is what I get when running those codes:
g++ -c HashUtility.cpp 
g++ -std=c++0x -o test_unordered_map.o test_unordered_map.cpp HashUtility.o
./test_unordered_map.o 
chosen a: 1104777304
chosen sh: 24
{0,0,0,0,0,0,0,0,1} : 1
{0,0,0,0,0,0,0,1,0} : 1
{0,0,0,0,0,0,1,0,0} : 1
{0,0,0,0,0,1,0,0,0} : 1
{0,0,0,0,1,0,0,0,0} : 1
{0,0,0,1,0,0,0,0,0} : 1
{0,0,1,0,0,0,0,0,0} : 1
{0,1,0,0,0,0,0,0,0} : 1
{1,0,0,0,0,0,0,0,0} : 1
{0,0,0,-1,0,1,0,1,0} : 2
{0,0,0,1,0,1,0,0,0} : 1
{0,0,0,0,0,0,0,0,0} : 1
{0,0,0,0,0,0,0,0,1} : 1
{0,0,0,0,0,0,0,1,0} : 1
{0,0,0,0,0,0,1,0,0} : 1
{0,0,0,0,1,0,0,0,0} : 1
{0,0,0,-1,0,1,0,1,0} : 2
{0,0,0,0,0,1,0,0,0} : 1
{0,0,0,1,0,1,0,0,0} : 1
{0,0,0,1,0,0,0,0,0} : 1
{0,0,1,0,0,0,0,0,0} : 1
{0,1,0,0,0,0,0,0,0} : 1
{1,0,0,0,0,0,0,0,0} : 1
{0,0,0,0,0,0,0,0,0} : 1
{0,0,0,0,0,0,0,1,0} : 1
{0,0,0,0,0,1,0,0,0} : 1
{0,0,0,1,0,0,0,0,0} : 1
{0,1,0,0,0,0,0,0,0} : 1
{0,0,0,0,0,0,0,0,1} : 1
{0,0,0,0,0,0,1,0,0} : 1
{0,0,0,0,1,0,0,0,0} : 1
{0,0,1,0,0,0,0,0,0} : 1
{1,0,0,0,0,0,0,0,0} : 1
{0,0,0,0,0,0,0,0,0} : 1
{0,0,0,-1,0,1,0,1,0} : 2
{0,0,0,1,0,1,0,0,0} : 1

g++ --std=c++0x -o test_hash_codes.o test_hash_codes.cpp HashUtility.o
./test_hash_codes.o 
chosen a: 1110592526
chosen sh: 14
sum_of_components_hash({0,0,0,0,0,0,0,0,0}):	1032
polynomial_hash(a_=5)({0,0,0,0,0,0,0,0,0}):	244482064
polynomial_hash(a_=10)({0,0,0,0,0,0,0,0,0}):	18446744071618386637
polynomial_hash(random a_)({0,0,0,0,0,0,0,0,0}):	18446744073342876429
cyclic_hash(randome sh_)({0,0,0,0,0,0,0,0,0}):	1808251968
cyclic_hash(sh_=10)({0,0,0,0,0,0,0,0,0}):	3278666808
sum_of_components_hash({0,0,0,1,0,1,0,0,0}):	1034
polynomial_hash(a_=5)({0,0,0,1,0,1,0,0,0}):	293388314
polynomial_hash(a_=10)({0,0,0,1,0,1,0,0,0}):	18446744072844138829
polynomial_hash(random a_)({0,0,0,1,0,1,0,0,0}):	423046797
cyclic_hash(randome sh_)({0,0,0,1,0,1,0,0,0}):	1875360836
cyclic_hash(sh_=10)({0,0,0,1,0,1,0,0,0}):	3278683256
sum_of_components_hash({0,0,0,-1,0,1,0,1,0}):	1080
polynomial_hash(a_=5)({0,0,0,-1,0,1,0,1,0}):	701654960
polynomial_hash(a_=10)({0,0,0,-1,0,1,0,1,0}):	18446744071949806901
polynomial_hash(random a_)({0,0,0,-1,0,1,0,1,0}):	18446744072360515909
cyclic_hash(randome sh_)({0,0,0,-1,0,1,0,1,0}):	3199774083
cyclic_hash(sh_=10)({0,0,0,-1,0,1,0,1,0}):	3200052363
sum_of_components_hash({0,0,0,-1,0,1,0,1,0}):	1080
polynomial_hash(a_=5)({0,0,0,-1,0,1,0,1,0}):	701654960
polynomial_hash(a_=10)({0,0,0,-1,0,1,0,1,0}):	18446744071949806901
polynomial_hash(random a_)({0,0,0,-1,0,1,0,1,0}):	18446744072360515909
cyclic_hash(randome sh_)({0,0,0,-1,0,1,0,1,0}):	3199774083
cyclic_hash(sh_=10)({0,0,0,-1,0,1,0,1,0}):	3200052363
sum_of_components_hash({1,0,0,0,0,0,0,0,0}):	1033
polynomial_hash(a_=5)({1,0,0,0,0,0,0,0,0}):	18446744072389308117
polynomial_hash(a_=10)({1,0,0,0,0,0,0,0,0}):	18446744073187711693
polynomial_hash(random a_)({1,0,0,0,0,0,0,0,0}):	18446744072957917965
cyclic_hash(randome sh_)({1,0,0,0,0,0,0,0,0}):	1808268352
cyclic_hash(sh_=10)({1,0,0,0,0,0,0,0,0}):	3278667832
sum_of_components_hash({0,1,0,0,0,0,0,0,0}):	1033
polynomial_hash(a_=5)({0,1,0,0,0,0,0,0,0}):	697289117
polynomial_hash(a_=10)({0,1,0,0,0,0,0,0,0}):	673307341
polynomial_hash(random a_)({0,1,0,0,0,0,0,0,0}):	1765833485
cyclic_hash(randome sh_)({0,1,0,0,0,0,0,0,0}):	1808514112
cyclic_hash(sh_=10)({0,1,0,0,0,0,0,0,0}):	3282861112
sum_of_components_hash({0,0,1,0,0,0,0,0,0}):	1033
polynomial_hash(a_=5)({0,0,1,0,0,0,0,0,0}):	1465185189
polynomial_hash(a_=10)({0,0,1,0,0,0,0,0,0}):	18446744072934521549
polynomial_hash(random a_)({0,0,1,0,0,0,0,0,0}):	18446744072082283277
cyclic_hash(randome sh_)({0,0,1,0,0,0,0,0,0}):	1812446272
cyclic_hash(sh_=10)({0,0,1,0,0,0,0,0,0}):	3278666812
sum_of_components_hash({0,0,0,1,0,0,0,0,0}):	1033
polynomial_hash(a_=5)({0,0,0,1,0,0,0,0,0}):	293310189
polynomial_hash(a_=10)({0,0,0,1,0,0,0,0,0}):	18446744072834138829
polynomial_hash(random a_)({0,0,0,1,0,0,0,0,0}):	18446744073358914317
cyclic_hash(randome sh_)({0,0,0,1,0,0,0,0,0}):	1875360832
cyclic_hash(sh_=10)({0,0,0,1,0,0,0,0,0}):	3278683192
sum_of_components_hash({0,0,0,0,1,0,0,0,0}):	1033
polynomial_hash(a_=5)({0,0,0,0,1,0,0,0,0}):	246435189
polynomial_hash(a_=10)({0,0,0,0,1,0,0,0,0}):	18446744072618386637
polynomial_hash(random a_)({0,0,0,0,1,0,0,0,0}):	659523853
cyclic_hash(randome sh_)({0,0,0,0,1,0,0,0,0}):	2881993792
cyclic_hash(sh_=10)({0,0,0,0,1,0,0,0,0}):	3345775672
sum_of_components_hash({0,0,0,0,0,1,0,0,0}):	1033
polynomial_hash(a_=5)({0,0,0,0,0,1,0,0,0}):	244560189
polynomial_hash(a_=10)({0,0,0,0,0,1,0,0,0}):	18446744071628386637
polynomial_hash(random a_)({0,0,0,0,0,1,0,0,0}):	407008909
cyclic_hash(randome sh_)({0,0,0,0,0,1,0,0,0}):	1808251972
cyclic_hash(sh_=10)({0,0,0,0,0,1,0,0,0}):	3278666872
sum_of_components_hash({0,0,0,0,0,0,1,0,0}):	1033
polynomial_hash(a_=5)({0,0,0,0,0,0,1,0,0}):	244485189
polynomial_hash(a_=10)({0,0,0,0,0,0,1,0,0}):	18446744071618486637
polynomial_hash(random a_)({0,0,0,0,0,0,1,0,0}):	1603071981
cyclic_hash(randome sh_)({0,0,0,0,0,0,1,0,0}):	1808252032
cyclic_hash(sh_=10)({0,0,0,0,0,0,1,0,0}):	3278928952
sum_of_components_hash({0,0,0,0,0,0,0,1,0}):	1033
polynomial_hash(a_=5)({0,0,0,0,0,0,0,1,0}):	244482189
polynomial_hash(a_=10)({0,0,0,0,0,0,0,1,0}):	18446744071618387637
polynomial_hash(random a_)({0,0,0,0,0,0,0,1,0}):	1229690309
cyclic_hash(randome sh_)({0,0,0,0,0,0,0,1,0}):	1808252992
cyclic_hash(sh_=10)({0,0,0,0,0,0,0,1,0}):	57441337
sum_of_components_hash({0,0,0,0,0,0,0,0,1}):	1033
polynomial_hash(a_=5)({0,0,0,0,0,0,0,0,1}):	244482069
polynomial_hash(a_=10)({0,0,0,0,0,0,0,0,1}):	18446744071618386647
polynomial_hash(random a_)({0,0,0,0,0,0,0,0,1}):	743917339
cyclic_hash(randome sh_)({0,0,0,0,0,0,0,0,1}):	1808268352
cyclic_hash(sh_=10)({0,0,0,0,0,0,0,0,1}):	3278667832

What to submit?

  1. The HashUtility.cpp that includes the implementation of operator() for polynomial_hash and the implementation of default constructor for cyclic_hash.

Where to submit?

In CourSys under CMPT 225 Lab7 activity.