CMPT
130 Assignment 5 Anagrams
For
this assignment you are to write a program that writes the permutations and
anagrams of words to output files. As usual, your program will be marked by
compiling it using g++ on Linux;
therefore you should test your program with this compiler and in the Linux OS
before submitting it. If you fail to do this, and if your program does not compile it will not be marked.
Please
make sure that you read the entire assignment carefully and that your solution
exactly follows the requirements.
This
assignment is worth 5% of your final grade.
Requirements
Your program should prompt the user to enter
the name of a file. The program should then read this file which should consist
of English words. The words in the file should then be printed (to the screen)
one word per line in the order in which they appear in the input file. See
below for further details on the input file.
For each word in the input file the program
should produce an output file with a .txt
extension whose name is the same as the word. For example, if the word was cinema the output file should be called cinema.txt.
An output file should contain a list of all of the permutations of the input word (including the
word itself) and a list of all of the anagrams of the input word (including the
word itself). Both lists should be sorted in lexicographic order (the order in
which words appear in a dictionary) independently of each other. See the
section below for further details on the output file. You must write your own
function to generate permutations you
cannot use the STL next_permutation
function.
Input Files
The input file should consist of a number of
words with one word appearing on each line. For example:
bat
cinema
dragon
abacus
If the file name entered by the user does not
exist or is not found (because it is not in the current directory) the program
should prompt the user to enter another file name. This process should be
repeated either until a valid file name is given or the user indicates that
they wish to quit. If the user chooses to quit, the program should terminate
without performing further actions. You are not otherwise responsible for
handling file errors.
If the file contains strings that are not
words (such as flibble
or er1c) these should be processed in
the same way as other strings that is, you should generate their permutations
and anagrams. We will not test your program with files where the text includes
spaces, so you are not responsible for handling this type of input. We will
also not test your program with files that contain upper case letters. This
would not make any difference to the computation of permutations but might
result in errors in finding anagrams.
Output Files
As stated previously, a separate output file
should be generated for each word (or string) in the input file. If the input file
consisted of the four words shown above then the program should create four
files named bat.txt, cinema.txt, dragon.txt and abacus.txt.
These files should be created in the same directory as the program. If the
input file contains the same word more than once only one output file should be
created for it. Note that this does not actually entail any additional work if
you simply let your program overwrite any previously created files with the
same name. This isn't a particularly efficient solution to this problem, but
you won't lose marks if you take this approach.
Output files should be formatted as follows,
where the length of the input word is n.
The first n! lines (less than n! if the input word contains repeated
letters) of the output file should consist of the permutations of the input
word, with one permutation per line, with the words appearing in lexicographic
order. The line immediately following the permutations should consist of a
single asterisk character (*). The remaining lines of the file should consist
of the anagrams of the input word, again one anagram per line, with the words
appearing in lexicographic order. For example, if the input word was bat the output file (named bat.txt) should contain the following:
abt
atb
bat
bta
tab
tba
*
bat
tab
Duplicate
Letters
If the input word includes repeated letters
duplicate permutations should not be
included in the output file. For example, if the input word was eel the output file (named eel.txt) should contain the following:
eel
ele
lee
*
eel
lee
I would
suggest implementing this part of the assignment last. Also, please look at the
assessment of the assignment.
Summary
of Requirements
1. Request name of input file from user until user gives valid file name
or quits
2. Print contents of input file, one word per line
3. For each input word
4. - write sorted list of permutations to an output file with .txt
extension and same name as input word
5. - write * to output file
6. - write sorted list of anagrams to output file
7. Repeat for next word
Notes
Permutations
A permutation is a re-arrangement of the
elements of a list that retains all of those elements. This assignment requires
you to generate all the permutations of a given word (or words). The number of
possible permutations of a word is the factorial of the number of letters in
the word (including the original word). There are therefore 3! or 6 (from
3*2*1) permutations of the word bat.
bat bta abt atb
tab tba
You can see that most of these permutations
are not English words. Which brings us to
Anagrams
An anagram is a word or phrase formed by
re-arranging the letters in another word or phrase. For our purposes we are
going to ignore the or phrase part of the definition and limit our anagrams
to single words. The difference between an anagram and a permutation is that an
anagram must be a word in our case, an English word. The anagrams of the word
bat are (again including the original
word):
bat tab
An obvious question is: how do you check whether or not a string is an English word?
Determining if a String is a Word
To determine whether a string is a word you
should check to see if it is contained in this list
of English words. Save the list as a file which you must name wordlist.txt.
Then read the file into a vector (or array) and search for the target word.
Vectors
You may use vectors in this assignment and I
would recommend doing so as they will make your life much easier, particularly
when it comes to reading the input file.
Sorting
This assignment requires you to sort data at
various times (hint: finding duplicates is a lot easier if you sort the list
first). You may use the STL sort
algorithm, and I would recommend that you do so. You are of course welcome to
implement your own sorting algorithm but doing so will not result in you
earning additional marks (though it is good practice).
Testing
You will need to test your program with input
files. It is up to you to make your own input files containing lists of words.
Assessment
The assignment is out of 44. Marks are
assigned as follows:
§ Reading and
printing the input file 4
§ Handling file
input errors 4
§ Creating the
correctly named output files (regardless of contents) 4
§ Generating the
correct permutations 8
§ Generating the correct
anagrams 4
§ Omitting
duplicated permutations and anagrams of words with repeated letters 4
§ Function design1
8
§ Variable and
function naming 4
§ Comments 2
§ Indentation 2
Note
1 the assignment description does not tell you exactly which functions to
write however you are responsible for decomposing your program into
functions, which should have input parameters and return values as appropriate.
A Note on Academic Dishonesty
You
may be able to find solutions to some (or even all) of the problems you have to solve for this assignment on the web. If you uses
such sources for research you should cite them prominently at the top of your
file. If your implementation of some function or functions is significantly the
same as one of these sources (i.e. you copied it) you should note this at the
top of your file and above the function definition. We may (and probably will)
deduct marks for functions that we deem you have not substantially implemented
yourself, but if you have cited the source as noted here we will not accuse you
of academic dishonesty.
Using other peoples work to implement the
assignment without any citation is, of course, academic dishonesty and will be
treated as such.
Submission
You should submit your assignment online to
the CoursSys submission server.
You must submit a single .cpp file, please read the documentation on site for further
information.
The assignment is due by 11:59 pm on Monday
the 3rd of December.
John Edgar (johnwill@sfu.ca)