CMPT 225 Assignment 5: Solving problems using unordered_map

CMPT 225 Assignment 5: Solving problems using unordered_map

Due Sunday Nov 26th, 2017, 23:59 pm


Introduction

STL unordered_map is based on hash tables and therefore can insert, find, and erase key-value pairs in expected and amortized time complexity of O(1). This very desirable time complexity makes this container very useful. In this assignment, you will be using unordered_map to solve two problems and also will change a class in a way that it can be used as key type in an unordered_map. You may want to do your lab7 before starting to work on this assignment.

Utility.h, Unility.cpp

Throughout the assignment, whenever you needed a class or function that you felt like it could be used in more than one problem/file, you can declare them in Utility.h and implement them in Utility.cpp. We have already put class n_gram and function Tokenize in those files and used them in ngram.cpp for step1.

Steps

There are three steps in this assignments and they are independent of each other:
  1. 40% Implement operator== for n_gram class and define and implement a hash code class n_gram_hash for it so that ngram.cpp compiles and runs. If it runs correctly, it will produce unigrams.txt and bigrams.txt, two files that will list unigrams (words) and bigrams (pairs of consequent words) in corpus.txt along with their frequencies. You may find lab7 useful while completing this step. You must declare n_gram_hash class in Utility.h and implement it in Utility.cpp. You can compile and run ngram.cpp as follows:
    g++ -c Utility.cpp
    g++ --std=c++0x -o ngram.o ngram.cpp Utility.o
    ./ngram.o
    
  2. 30% Write a C++ program that takes a file containing n pairs of strings as input, and tells us which pair appeared together most often in O(n). If there are more than one pair that co-occur most often, output any of them. The order in the pair doesn't matter (for example (str1,str2) is the same as (str2, str1)). Each line in the input file contains a pair of strings separated by a tab character ('\t'). The program MUST compile and run as follows:
    g++ -c Utility.cpp 
    g++ --std=c++0x -o MostOftenPair.o MostOftenPair.cpp Utility.o
    ./MostOftenPair.o MostOftenPair_input.txt
    Str2 and Str3 occurred together most often: #occurrence=3
    
  3. 30% Write a C++ program that takes a vector of n integers and an integer S as input and outputs the indices of a pair of items in V that sum to S. If such a pair doesn't exist, the program announces (-1,-1) as the answer. The program must be of O(n) and MUST compile and run as follows:
    g++ -c Utility.cpp 
    g++ --std=c++0x -o PairSum.o PairSum.cpp Utility.o
    ./PairSum.o "{3,5,0,10,-5}" 13
    answer: (0,3)
    ./PairSum.o "{3,5,0,10,-5}" 23
    answer: (-1,-1)
    

What to submit?

  1. Utility.h
  2. Utility.cpp
  3. MostOftenPair.cpp
  4. PairSum.cpp

Where to submit?

In CourSys under CMPT 225 Assignment5 activity.

Final Note

While it is OK for you to discuss the high-level idea of how to tackle each of these steps among yourselves, I encourage you to try coming up with the idea yourself. Remember, one of the most important goals of the course is to build your problem-solving skills.