Assignment 5: Spell Checking

Your task in this assignment is to implement a spell checker based on an open address hash table using quadratic hashing to resolve collisions.

Implement this as a command-line program where the user can type in various commands and get immediate results. Here’s a sample run of how it might work when it’s done:

$ ./a5
Welcome to SpellCheck Master!

> check apple
ok

> check bapple
don't know "bapple"

> suggest bapple
Suggested words:
  apple
  dapple

> add bapple
"bapple" added to the dictionary

> check bapple
ok

> remove bapple
"bapple" removed

> checkfile story.txt
// a list of all mis-spelled words, and the line they appear on
// are printed

Please call your spell checker some cool name other than “SpellCheck Master” — that name’s taken.

Implementation Constraints

Please follow these rules for your implementation:

  • The only files you can #include are the ones lists in a5.cpp below.

  • Your program should assume that all words are stored the two text files are in the same folder as the program:

    • maindict.txt is the main dictionary of words.

      You should assume that this file contains tens of thousands of words, one word per line. The words are not in any particular order.

    • addedwords.txt are words added by the user with the add command. Since they are stored in a file, user-added words will persist between runs of your program.

  • Store your words in a (lovingly) hand-crafted hash table with the following properties:

    • it’s implemented as a class that hides its implementation details
    • the underlying table is an array
    • it uses open addressing
    • collisions are resolved by quadratic hashing
    • adding a string, finding a string, and removing a string are all fast O(1) operations on average; the performance of all the commands should feel instantaneous to the user
    • automatically re-sizes itself (e.g. doubles the capacity of the unerlying array) when the load factor is above 0.5; this requires re-hashing all the values, and so it’s okay if operations for when this happens take much more time than when re-hashing is not done

Commands for Your Program

For all commands, you can, if you wish, make a few simplifying assumptions:

  • strings entered by the user won’t have spaces, or be empty
  • case matters, i.e. apple and Apple can be treated as different strings
  • you don’t need to do any special handling for punctuation or other non-letter characters
  • the case of the command itself matters, i.e. check and Check are not the same

Here are the commands your program should implement:

  • check s: Prints “ok” if string s is a word in the dictionary, and don't know "s" otherwise.

    This checking should be instantaneous from the user’s point of view.

  • add s: Adds string s to addedwords.txt. If s is already maindict.txt or addedwords.txt, then this does nothing (so there should not be multiple copies of words in either file).

    Note that you don’t necessarily need to add s to addedwords.txt immediately. You could, for instance, wait until the program ends to add all user-added words in one big batch.

    Since user-added words are saved in a file, the next time the program runs they will be treated as correct spelled words.

  • remove s: Removes s from addedwords.txt, and print a message like “s removed”. If s is in maindict.txt, then print a message like “can’t remove s: it’s in the main dictionary”. If s is in neither maindict.txt nor addedwords.txt, then print a message like “can’t remove s: unknown word”.

  • checkfile s: Opens a text file named s, and finds all misspelled words (i.e words in neither maindict.txt nor addedwords.txt) and lists them in the order they are found. Each misspelled word should be put on its own line and include the line number in the file where it was found.

    If s is not the name of a valid file, then print a message like “can’t find file s”.

  • suggest s: Gives a list of correctly spelled words similar to s. Usually s is misspelled, but it’s certainly possible that s is a known word. For known words, you should still give a list of suggestions.

    The details of exactly what counts as a similar word, and how you can find similar words, are up to you. Do a bit of research on the web to find out how simple spelling suggestions can be made.

    For some words, there might only a few (or even 0) suggestions, and it other cases there may be a lot of suggestions. You probably don’t want to show the user too many possible suggests.

  • quit or stop or end or done: Ends the program.

If the user types an unknown command, then print a helpful error message like “unknown command”.

You are welcome to add other commands. Just make sure the commands above are implemented as described, since they will be the ones tested by the marker.

What to Submit

Submit just a file called a5.cpp that contains all your code for this assignment. It must compilable with the course makefile, and when ./a5 is run at the command line it should read maindict.txt and addedwords.txt and then give the user a prompt to enter commands.

See Canvas for the exact due date and marking scheme.

  • Use valgrind to help ensure your program has no memory leaks or other memory errors, e.g.:

    $ make -B a5
    
    $ valgrind ./a5
    
    // ... lots of output ...
    

    A program is considered to have no memory leaks or problems if:

    • In the LEAK SUMMARY, definitely lost, indirectly lost, and possibly lost must all be 0.
    • The ERROR SUMMARY must report 0 errors.
    • It is usually okay if still reachable reports a non-zero number of bytes.
  • If you are using dynamic memory, use new and delete to allocate and de-allocate memory. Do not use malloc and free.

  • Do not use C++ smart pointers, such as unique_ptr or shared_ptr. Stick to raw pointers, and use new and delete (or delete[]) and constructors/destructors (plus valgrind) to manage memory.

  • You must include this statement of originality comment at the start of a5.cpp (see below). If your a5.cpp does not have this, then we will assume your work is not original and it will not be marked.

If your program meets all these basic requirements, then it will graded using the marking scheme on Canvas.

a5.cpp

// a5.cpp

/////////////////////////////////////////////////////////////////////////
//
// Student Info
// ------------
//
// Name : <put your full name here!>
// St.# : <put your full SFU student number here>
// Email: <put your SFU email address here>
//
//
// Statement of Originality
// ------------------------
//
// All the code and comments below are my own original work. For any non-
// original work, I have provided citations in the comments with enough
// detail so that someone can see the exact source and extent of the
// borrowed work.
//
// In addition, I have not shared this work with anyone else, and I have
// not seen solutions from other students, tutors, websites, books,
// etc.
//
/////////////////////////////////////////////////////////////////////////

//
// You are only allowed to include these files --- don't include anything else!
// If you think it's necessary to include another file, please speak to the
// instructor.
//

 #include "cmpt_error.h"
 #include <iostream>
 #include <string>
 #include <fstream>   // for reading/writing text files
 #include <sstream>   // for using string streams

 // ...