CMPT 130 Assignment 5 – Anagrams

 

For this assignment you are to write a program that writes the permutations and anagrams of words to output files. As usual, your program will be marked by compiling it using g++ on Linux; therefore you should test your program with this compiler and in the Linux OS before submitting it. If you fail to do this, and if your program does not compile it will not be marked.

 

Please make sure that you read the entire assignment carefully and that your solution exactly follows the requirements.

 

This assignment is worth 5% of your final grade.

 

Requirements

 

Your program should prompt the user to enter the name of a file. The program should then read this file which should consist of English words. The words in the file should then be printed (to the screen) one word per line in the order in which they appear in the input file. See below for further details on the input file.

 

For each word in the input file the program should produce an output file with a .txt extension whose name is the same as the word. For example, if the word was cinema the output file should be called cinema.txt.

 

An output file should contain a list of all of the permutations of the input word (including the word itself) and a list of all of the anagrams of the input word (including the word itself). Both lists should be sorted in lexicographic order (the order in which words appear in a dictionary) independently of each other. See the section below for further details on the output file. You must write your own function to generate permutations – you cannot use the STL next_permutation function.

 

Input Files

The input file should consist of a number of words with one word appearing on each line. For example:

bat

cinema

dragon

abacus

 

If the file name entered by the user does not exist or is not found (because it is not in the current directory) the program should prompt the user to enter another file name. This process should be repeated either until a valid file name is given or the user indicates that they wish to quit. If the user chooses to quit, the program should terminate without performing further actions. You are not otherwise responsible for handling file errors.

 

If the file contains strings that are not words (such as flibble or er1c) these should be processed in the same way as other strings – that is, you should generate their permutations and anagrams. We will not test your program with files where the text includes spaces, so you are not responsible for handling this type of input. We will also not test your program with files that contain upper case letters. This would not make any difference to the computation of permutations but might result in errors in finding anagrams.

 

Output Files

As stated previously, a separate output file should be generated for each word (or string) in the input file. If the input file consisted of the four words shown above then the program should create four files named bat.txt, cinema.txt, dragon.txt and abacus.txt. These files should be created in the same directory as the program. If the input file contains the same word more than once only one output file should be created for it. Note that this does not actually entail any additional work if you simply let your program overwrite any previously created files with the same name. This isn't a particularly efficient solution to this problem, but you won't lose marks if you take this approach.

 

Output files should be formatted as follows, where the length of the input word is n. The first n! lines (less than n! if the input word contains repeated letters) of the output file should consist of the permutations of the input word, with one permutation per line, with the words appearing in lexicographic order. The line immediately following the permutations should consist of a single asterisk character (*). The remaining lines of the file should consist of the anagrams of the input word, again one anagram per line, with the words appearing in lexicographic order. For example, if the input word was bat the output file (named bat.txt) should contain the following:

abt

atb

bat

bta

tab

tba

*

bat

tab

 

Duplicate Letters

If the input word includes repeated letters duplicate permutations should not be included in the output file. For example, if the input word was eel the output file (named eel.txt) should contain the following:

eel

ele

lee

*

eel

lee

 

I would suggest implementing this part of the assignment last. Also, please look at the assessment of the assignment.

 

Summary of Requirements

1.      Request name of input file from user until user gives valid file name or quits

2.      Print contents of input file, one word per line

3.      For each input word

4.      - write sorted list of permutations to an output file with .txt extension and same name as input word

5.      - write * to output file

6.      - write sorted list of anagrams to output file

7.      Repeat for next word

 

Notes

 

Permutations

A permutation is a re-arrangement of the elements of a list that retains all of those elements. This assignment requires you to generate all the permutations of a given word (or words). The number of possible permutations of a word is the factorial of the number of letters in the word (including the original word). There are therefore 3! or 6 (from 3*2*1) permutations of the word bat.

bat bta abt atb tab tba

 

You can see that most of these permutations are not English words. Which brings us to …

 

Anagrams

An anagram is a word or phrase formed by re-arranging the letters in another word or phrase. For our purposes we are going to ignore the “or phrase” part of the definition and limit our anagrams to single words. The difference between an anagram and a permutation is that an anagram must be a word – in our case, an English word. The anagrams of the word bat are (again including the original word):

bat tab

 

An obvious question is: how do you check whether or not a string is an English word?

 

Determining if a String is a Word

To determine whether a string is a word you should check to see if it is contained in this list of English words. Save the list as a file which you must name wordlist.txt. Then read the file into a vector (or array) and search for the target word.

 

Vectors

You may use vectors in this assignment and I would recommend doing so as they will make your life much easier, particularly when it comes to reading the input file.

 

Sorting

This assignment requires you to sort data at various times (hint: finding duplicates is a lot easier if you sort the list first). You may use the STL sort algorithm, and I would recommend that you do so. You are of course welcome to implement your own sorting algorithm but doing so will not result in you earning additional marks (though it is good practice).

 

Testing

You will need to test your program with input files. It is up to you to make your own input files containing lists of words.

 

Assessment

The assignment is out of 44.  Marks are assigned as follows:

§  Reading and printing the input file – 4

§  Handling file input errors – 4

§  Creating the correctly named output files (regardless of contents) – 4

§  Generating the correct permutations – 8

§  Generating the correct anagrams – 4

§  Omitting duplicated permutations and anagrams of words with repeated letters – 4

§  Function design1 – 8

§  Variable and function naming – 4

§  Comments – 2

§  Indentation – 2

 

Note 1 –the assignment description does not tell you exactly which functions to write – however you are responsible for decomposing your program into functions, which should have input parameters and return values as appropriate.

 

A Note on Academic Dishonesty

You may be able to find solutions to some (or even all) of the problems you have to solve for this assignment on the web. If you uses such sources for research you should cite them prominently at the top of your file. If your implementation of some function or functions is significantly the same as one of these sources (i.e. you copied it) you should note this at the top of your file and above the function definition. We may (and probably will) deduct marks for functions that we deem you have not substantially implemented yourself, but if you have cited the source as noted here we will not accuse you of academic dishonesty.

 

Using other people’s work to implement the assignment without any citation is, of course, academic dishonesty and will be treated as such.

 

Submission

You should submit your assignment online to the CoursSys submission server.  You must submit a single .cpp file, please read the documentation on site for further information. 

 

The assignment is due by 11:59 pm on Monday the 3rd of December.

 

 

CMPT 130 Home

 

John Edgar (johnwill@sfu.ca)