Chapter 8 Notes

Please read chapter 8 of the textbook.

C-Style Strings

C++ is built on top of the C language, and almost all features of C are part of C++

C was designed to be a low-level language for implementing things like operating systems

C provides simple, low-level features that give the programmer great control over a program at the cost of being harder to use correctly

one of the goals of C++ is to add some easier-to-use features on top of C

strings are one such feature added by C++

C++ strings are a high-level, easy-to-use feature that for almost all programs are easier to use than C-style strings

C++ still supports C-style strings, because C++ supports pretty much all of C

and in some cases, C++ cannot avoid C-style strings, and so you will have to know a bit about them

but when we talk about strings in C++, we mean the C++ string data type, not low-level C-style strings

C-style strings are little more than sequences of bytes that end with a \0, and so we will see them when we discuss arrays

C++ strings are much more sophisticated (but still extremely efficient), and make use of many high-level C++ features
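to make the contrast concrete, here is a minimal sketch of how the length of a C-style string has to be found by scanning for the final \0, while a C++ string simply reports its size; the function names c_string_length and cpp_string_length are made up for this example (the standard C function for the first job is strlen)

```cpp
#include <cstddef>
#include <string>

using namespace std;

// Counts the characters that come before the terminating '\0'.
// Assumes p points at a properly terminated C-style string.
size_t c_string_length(const char* p) {
    size_t n = 0;
    while (p[n] != '\0') {
        ++n;
    }
    return n;
}

// A C++ string can be built from a C-style string, and the
// resulting object keeps track of its own size.
size_t cpp_string_length(const char* p) {
    string s = p;
    return s.size();
}
```

for example, c_string_length("cat") and cpp_string_length("cat") should both give 3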

The Standard string Class

we’ve been using the C++ standard string class throughout this course

it’s pretty easy to use!

string is a data type, and a value of type string is called an object

we’ll discuss details of how a string object can be implemented in the next chapter

for now, let's understand how to use strings; they are one of the most useful and powerful data types in C++

#include <iostream>
#include <string>

using namespace std;

int main() {
    string s = "The Jackal got caught.";
    cout << s << "\n";
}

Initializing a string

#include "cmpt_error.h"
#include <iostream>
#include <string>

using namespace std;

int main() {
    string a;           // a is the empty string, ""
    string b = "house";
    string c("chair");
    string d{"mouse"};  // C++11-style initialization

    cout << "a = \"" << a << "\"\n"
         << "b = \"" << b << "\"\n"
         << "c = \"" << c << "\"\n"
         << "d = \"" << d << "\"\n";
}

"house" is an example of a string literal

string literals in C++ are not of type string, unfortunately

instead, a string literal is an array of characters (a C-style string)

this is to be backwards-compatible with C (C has no string type)

much of the time this is not an issue as C++ tries hard to correctly handle mixing C-style strings and string objects
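one place where the difference does show up is concatenation: + does not work on two C-style strings, so at least one operand has to be a string object; a small sketch, where full_name is a made-up function for illustration

```cpp
#include <string>

using namespace std;

// Builds "<first> <last>" from two C-style strings.
string full_name(const char* first, const char* last) {
    // first + " " would not compile: neither operand is a string object.
    // Converting the first operand to a string lets + concatenate.
    return string(first) + " " + last;
}
```

full_name("Ada", "Lovelace") should return the string "Ada Lovelace"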

notice how many ways you can initialize a string

you can use =

or () notation

or {} notation

{} notation is the newest notation, and it was added because it turns out the other two notations don’t work in all possible cases (!)

another way to initialize a string is by making a copy of an existing string

string a = "apple";
string b = a;
string c(a);
string d{a};  // new in C++11

a, b, c, and d all have the value "apple"

each one has its own separate copy of the characters

you can also create a string that is a sequence of 0 or more copies of the same character

string bar(10, '-');
cout << bar << "\n"; // ----------
                     // 10 '-' characters

Strings Know their Size

the size() member function tells you the number of characters in a string

string s = "Once upon a time ...\n";
cout << "s has " << s.size() << " characters\n";
// s has 21 characters

be careful counting escape characters, such as \n (newline) or \t (tab)

they are single characters even though they must be typed using two symbols

string s = "\n\n\t\\n";
cout << "s has " << s.size() << " characters\n";
// s has 5 characters

Concatenating Strings

use + or += to combine — concatenate — two or more strings

int main() {
    cout << "What is your first name? ";
    string name;
    cin >> name;

    string msg = "* Hi " + name + "! *\n";
    string bar(msg.size() - 1, '*');  // note the -1
    bar += '\n';

    cout << bar
         << msg
         << bar;
}
What is your first name? Vetruvious
******************
* Hi Vetruvious! *
******************

the code that draws this box is worth looking at in detail
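as a sketch of the same idea packaged as a function, the hypothetical boxed below builds the three lines of the box and returns them as one string; the key detail is that bar must be one character shorter than msg, because msg ends with a '\n' that should not get a '*' above it

```cpp
#include <string>

using namespace std;

// Returns text surrounded by a box of '*' characters,
// as three lines that each end with '\n'.
string boxed(const string& text) {
    string msg = "* " + text + " *\n";
    string bar(msg.size() - 1, '*');  // -1 so the '\n' in msg is not counted
    bar += '\n';
    return bar + msg + bar;
}
```

for instance, boxed("Hi") should give a 6-character-wide box, since "* Hi *" has 6 characters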

Reading an Entire Line

we’ve seen throughout this course how to read a single word from cin, e.g.

string s;
cin >> s;  // s gets assigned the first word the user types

cin >> s skips any leading whitespace, and then reads characters up to the next whitespace

so if the user types “apple tree”

then s is assigned the string “apple”

but sometimes you want to read in the entire line, spaces and all

you can do that like this

cout << "Please enter a string: ";
string s;
getline(cin, s); // read everything the user types up to \n
cout << "s = \"" << s << "\"\n";

by default, getline uses '\n' as the end-of-line marker (the delimiter); the delimiter is read from the input but is not stored in s

but you can change that to be other characters if you like, e.g.

cout << "Please enter a string: ";
string s;
getline(cin, s, '?'); // read everything the user types up to ?
cout << "s = \"" << s << "\"\n";
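the difference between >> and getline can be tried without typing anything by reading from a string instead of cin; this sketch uses the standard istringstream class for that, and the helper names read_word and read_line are made up for this example

```cpp
#include <sstream>
#include <string>

using namespace std;

// Returns what `in >> s` reads when the input is the given text.
string read_word(const string& input) {
    istringstream in(input);
    string s;
    in >> s;  // skips leading whitespace, stops at the next whitespace
    return s;
}

// Returns what getline(in, s, delim) reads when the input is the given text.
string read_line(const string& input, char delim = '\n') {
    istringstream in(input);
    string s;
    getline(in, s, delim);  // keeps spaces, stops at delim (delim is not stored)
    return s;
}
```

with the input "  apple tree\n", read_word should return "apple" while read_line should return "  apple tree", leading spaces included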

Accessing the Characters of a string

[]-notation is used to access string characters

if s is a string containing n characters, then

s[0] is the first character of s

s[1] is the second character of s

s[2] is the third character of s

...

s[n - 2] is the second to last character of s

s[n - 1] is the last character of s

cout << "Please enter a string: ";
string s;
cin >> s;
for(int i = 0; i < s.size(); ++i) {
    cout << "s[" << i << "] = '" << s[i] << "'\n";
}

here’s a sample run

Please enter a string: butter
s[0] = 'b'
s[1] = 'u'
s[2] = 't'
s[3] = 't'
s[4] = 'e'
s[5] = 'r'

the following for-loop shows a basic loop structure for processing every character of a string

for(int i = 0; i < s.size(); ++i) {
    cout << s[i];
}

i starts at 0 because s[0] is the first character of s

the last character of s is s[s.size() - 1]

thus i < s.size() is the correct condition

it’s important to know that s[s.size()] is not the last character of s

since the numbering of the characters in s starts at 0 (instead of 1), s[s.size() - 1] is the last character of s

this is a subtle point that often confuses new programmers

you can modify characters in a C++ string

for example, this code replaces all spaces in a string with '_' (underscore)

cout << "Please enter a string: ";
string s;
getline(cin, s);
for(int i = 0; i < s.size(); ++i) {
    if (s[i] == ' ') {
        s[i] = '_';
    }
}
cout << "s = \"" << s << "\"\n";

Square-bracket Notation

the []-notation is used with strings, vectors, and arrays (and thus C-style strings)

we’ll see it again in different situations

in general, if an object x is a sequence of n values, then x[i] is the item at index location i of x

in C++ (and C), x[0] is the first item of the sequence, and x[n - 1] is the last item

it’s also quite possible in C++ to create your own sequence-like objects that define their own []-bracket notation (we won’t get to that in this course)

Range Errors

a common mistake is to access characters in a string outside the legal range of index values, e.g.

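string s = "cat";
cout << s[-1]   // -1 isn't a valid index; program may keep running (!)
     << s[4];   // 4 isn't a valid index; program may keep running (!)

unfortunately, C and C++ don’t catch these errors either at compile-time or run-time

the behaviour is undefined: the program might print garbage values, crash, or appear to work correctly

(one special case: since C++11, s[s.size()], which is s[3] here, is allowed for a C++ string and refers to a '\0' character, but it is still not one of the characters of s)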

it’s also a common source of security problems in programs

many C programs, for instance, are infamous for suffering from buffer overflows that boil down to the fact that strings (and arrays) don’t care what index value you pass to them

it would not be hard for C or C++ to check that every string (or array) access is in-range

but that has a run-time cost that the designers of C and C++ felt was too high (keep in mind that C++ was created in a time when computers were generally much slower than they are today, and with much less memory)

if you do want range-checking, C++ strings (and vectors) let you use the at(i) member function instead of []-bracket notation

the at function does range-checking, and catches indexing errors at run-time by throwing an out_of_range exception

string s = "cat";
cout << s.at(-1)   // -1 isn't a valid index; at throws a run-time error
     << s.at(3);   // 3 isn't a valid index; at throws a run-time error
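the exception thrown by at is the standard out_of_range type, which a program can catch; here is a small sketch, assuming we want a fallback character instead of a crash (char_or is a made-up name, not a standard function)

```cpp
#include <stdexcept>
#include <string>

using namespace std;

// Returns s.at(i) if i is a valid index of s, and fallback otherwise.
char char_or(const string& s, int i, char fallback) {
    try {
        // a negative i converts to a huge unsigned index,
        // so at rejects it as out of range too
        return s.at(i);
    } catch (const out_of_range&) {
        return fallback;
    }
}
```

for the string "cat", char_or(s, 1, '?') should return 'a', while indexes -1 and 3 should both return the fallback '?'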

Some Useful string Functions

suppose s and t are strings

s == t tests if s and t are the same length and have the same characters in the same order

s != t tests if s and t are different

s < t tests if s comes before t lexicographically

s <= t tests if s comes before t lexicographically, or is equal to t

s > t tests if s comes after t lexicographically

s >= t tests if s comes after t lexicographically, or is equal to t

the term lexicographically is a more general version of the term alphabetical

lexicographical order is the same as alphabetic order when you are dealing with alphabetic letters only

but the characters of a string might include digits, punctuation, or hundreds of other non-letter symbols
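for example, with the usual ASCII character codes every uppercase letter comes before every lowercase letter, and digits come before letters, so "Zebra" comes before "apple" and "10" comes before "9"; the sketch below is one possible way to write s < t by hand (lex_less is a made-up name, and the ASCII ordering is an assumption about the system)

```cpp
#include <cstddef>
#include <string>

using namespace std;

// Returns true if s comes before t lexicographically.
bool lex_less(const string& s, const string& t) {
    // compare character by character until the first difference
    for (size_t i = 0; i < s.size() && i < t.size(); ++i) {
        unsigned char a = s[i];
        unsigned char b = t[i];
        if (a != b) {
            return a < b;  // the first differing character decides
        }
    }
    // one string is a prefix of the other: the shorter one comes first
    return s.size() < t.size();
}
```

so lex_less("cat", "cats") should be true (a prefix comes first), and lex_less("cat", "cat") should be false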

Getting A Substring

the substr member function lets you extract a substring from a string

string s = "character";
string t = s.substr(3, 3);
cout << t; // "rac"

substr has many useful applications, and so you should know it, and how to use it

see the table on page 483 for a few more functions that come with string
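two details of substr that are handy in practice: with one argument it returns everything from that position to the end, and it is often combined with a search function such as rfind, which returns string::npos when nothing is found; a hedged sketch, where file_extension is a made-up helper and not a standard function

```cpp
#include <cstddef>
#include <string>

using namespace std;

// Returns the text after the last '.' in filename,
// or the empty string if filename contains no '.'.
string file_extension(const string& filename) {
    size_t dot = filename.rfind('.');  // index of the last '.', or npos
    if (dot == string::npos) {
        return "";
    }
    return filename.substr(dot + 1);   // from just after the '.' to the end
}
```

file_extension("notes.txt") should return "txt", and file_extension("README") should return ""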

Writing Our Own string Functions

it’s instructive to write our own versions of string functions

this helps us understand how they can be implemented

and it gives us practice writing C++ code

here are some functions for string equality and inequality

bool equal(const string& s, const string& t) {
    if (s.size() != t.size()) return false;

    // at this point we know s and t are the same size
    for(int i = 0; i < s.size(); ++i) {
        if (s[i] != t[i]) {
            return false;
        }
    }
    return true;
}

bool not_equal(const string& s, const string& t) {
    return !equal(s, t);
}

notice that s and t are passed by constant reference

since they are passed by reference, no copy of the string is made, which makes these functions efficient

they are const references to tell the compiler that these functions only read s and t and don’t modify them; this is very useful in larger programs, and it means equal and not_equal can be called with const strings (for example, from other functions that take const string& parameters)

Testing if a String Contains a Character

you can use a loop to test if a string contains a given character

bool contains(const string& s, char c) {
    for(int i = 0; i < s.size(); ++i) {
        if (s[i] == c) {
            return true;
        }
    }
    return false;
}

the loop iterates over each character of s

notice that the statement return false is outside of the loop — that’s important!

contains returns either true or false

thus it is a boolean function

here’s a related function that returns the index of a character in a string

// Pre-condition:
//    none
// Post-condition:
//    returns an int i such that s[i] == c;
//    if c is not anywhere in s, then it returns -1
int index_of(const string& s, char c) {
    for(int i = 0; i < s.size(); ++i) {
        if (s[i] == c) {
            return i;
        }
    }
    return -1;
}

we use -1 as the “char not found” return value

this is a common way to indicate a “not found” value for an index

this function can be used to write the contains function

bool contains(const string& s, char c) {
    return !(index_of(s, c) == -1);
}

Testing if a Character is a Vowel

the index_of and contains functions have many uses

for example

bool is_vowel(char c) {
    return contains("aeiouAEIOU", c);
}

int vowel_count(const string& s) {
    int count = 0;
    for(int i = 0; i < s.size(); ++i) {
        if (is_vowel(s[i])) {
            count++;
        }
    }
    return count;
}

note that we could have written is_vowel using a big or-expression like this

if (c == 'a' || c == 'e' || c == 'i' ... )

but using contains makes it shorter and easier to read

Reversing a String

void swap(char& x, char& y) {
    char temp = x;
    x = y;
    y = temp;
}

void reverse(string& s) {  // note that s is passed by reference
    int a = 0;
    int b = s.size() - 1;
    while (a < b) {
        swap(s[a], s[b]);
        ++a;
        --b;
    }
}

// a palindrome is a string that is the same backwards and forwards
// e.g. "pop" and "racecar"
bool is_palindrome(const string& s) {
    string s_rev = s;
    reverse(s_rev);
    return s == s_rev;
}
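is_palindrome above makes a reversed copy of s; as a sketch of an alternative, the same two-index idea used in reverse can compare the characters directly without copying (is_palindrome_no_copy is a made-up name for this variant)

```cpp
#include <cstddef>
#include <string>

using namespace std;

// Returns true if s reads the same backwards and forwards.
bool is_palindrome_no_copy(const string& s) {
    if (s.empty()) {
        return true;  // also avoids computing size() - 1 on an empty string
    }
    size_t a = 0;             // moves right from the first character
    size_t b = s.size() - 1;  // moves left from the last character
    while (a < b) {
        if (s[a] != s[b]) {
            return false;
        }
        ++a;
        --b;
    }
    return true;
}
```

it should agree with is_palindrome, e.g. true for "racecar" and "pop", and false for "ab"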

Converting a string to an int

suppose you want to convert the string “5832” to the int 5832

one way to do that is to use C++’s standard stoi function (which is in the C++ string library), e.g.:

#include <string>

// ...

string s = "5832";
int i = stoi(s); // is 5832

let’s write our own version of this function

// Pre-condition:
//    s is a string of digits that, when interpreted as a base 10
//    integer, can be represented as a non-negative int.
// Post-condition:
//    Returns the int value of the integer represented by s.
// Example:
//    str_to_int("6789") returns the int 6789
int str_to_int(const string& s) {
    int pow = 1;
    int result = 0;
    for(int i = s.size() - 1; i >= 0; --i) {
        result += (s[i] - '0') * pow;
        pow *= 10;
    }
    return result;
}

notice that the pre-condition puts a number of important restrictions on s

the body of str_to_int does not bother to check for errors because it assumes the pre-condition is true

also note that the expression s[i] - '0' converts the digit s[i] into its corresponding int value

e.g. '8' - '0' evaluates to the int 8
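one thing to watch in str_to_int is that pow is multiplied by 10 once more after the leading digit has been handled, so for a 10-digit input such as "2147483647" pow can exceed the largest int on a system with 32-bit ints; here is a sketch of an alternative that reads the digits left to right and needs no pow variable (str_to_int_ltr is a made-up name, and it assumes the same pre-condition as str_to_int)

```cpp
#include <cstddef>
#include <string>

using namespace std;

// Pre-condition:
//    s is a string of digits that, when interpreted as a base 10
//    integer, can be represented as a non-negative int.
// Post-condition:
//    Returns the int value of the integer represented by s.
int str_to_int_ltr(const string& s) {
    int result = 0;
    for (size_t i = 0; i < s.size(); ++i) {
        // shift the digits read so far one place left, then add the new digit
        result = result * 10 + (s[i] - '0');
    }
    return result;
}
```

for "6789" the value of result goes 6, 67, 678, 6789, so no intermediate value is ever bigger than the final answer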

Vectors

C++ vectors are sequential collections of 0 or more objects

#include "cmpt_error.h"
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {
    // an empty vector of ints
    vector<int> ages;
    cout << ages.size() << "\n";  // 0

    // a vector of 4 strings
    vector<string> names = {"Bob", "Mary", "Zia", "Talia"};
    cout << names.size() << "\n";  // 4

    // a vector of 2 doubles
    vector<double> temps = {5.4, -2.2};
    cout << temps[1] << "\n";  // -2.2

    // a vector of 4 int vectors
    vector<vector<int>> table = {
        {1},
        {2, 3, 1},
        {6, 8, 9, 1},
        {7, 3}
    };
    cout << table.size() << "\n";  // 4
    cout << table[2][3] << "\n";   // 1
} // main

you can read and write individual elements of a vector using []-bracket notation

vector<int> v = {6, 2, 5};
cout << v[0] << "\n"   // 6
     << v[1] << "\n"   // 2
     << v[2] << "\n";  // 5

v[0] = v[1] + v[2];
cout << v[0] << "\n"   // 7
     << v[1] << "\n"   // 2
     << v[2] << "\n";  // 5

v[2] = 8;
v[1] = 0;
cout << v[0] << "\n"   // 7
     << v[1] << "\n"   // 0
     << v[2] << "\n";  // 8

for an n-element vector v, v[0] is the first element and v[n - 1] is the last element

out-of-range index values, such as v[-1] or v[n], are errors, but C++ does not detect them; the behaviour of the program is undefined

it is up to the programmer to make sure they never access out-of-range values in a vector (or string, or array)

you can add elements to a vector as they appear

the vector will automatically increase its size to hold the new element

#include "cmpt_error.h"
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {
    cout << "Please enter some words: ";
    string w;
    vector<string> words;
    while (cin >> w) {
        words.push_back(w); // appends w to the right end of words
    }

    cout << "You entered:\n";
    for(int i = 0; i < words.size(); ++i) {
        cout << "  " << words[i] << "\n";
    }
} // main

the statement words.push_back(w) adds the string w to the right end of words

words automatically increases its size to hold w
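push_back works the same way for any element type and does not need input from cin; as a small sketch, the made-up function squares below starts with an empty vector and grows it one element at a time

```cpp
#include <vector>

using namespace std;

// Returns the vector {0*0, 1*1, ..., (n-1)*(n-1)};
// the result is empty if n <= 0.
vector<int> squares(int n) {
    vector<int> result;           // starts out empty
    for (int i = 0; i < n; ++i) {
        result.push_back(i * i);  // size grows by 1 on every call
    }
    return result;
}
```

squares(4) should return a vector of size 4 holding 0, 1, 4, 9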

For-Each Loops

C++11 provides a new kind of for-loop specifically designed for iterating over vectors (and strings, and arrays)

in many cases, it is the easiest way to loop through a vector

#include "cmpt_error.h"
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>

using namespace std;

int main() {
    cout << "Please enter some words: ";
    vector<string> words;
    string w;
    while (cin >> w) {
        words.push_back(w);
    }

    sort(words.begin(), words.end());

    for(string s : words) {
        cout << s << "\n";
    }
} // main

this loop is an example of a for-each loop:

for(string s : words) {
    cout << s << "\n";
}

it avoids the need for an index variable

which simplifies the code

this variation is more efficient

for(const string& s : words) {
    cout << s << "\n";
}

this makes s refer to the corresponding string in words

in contrast to the original (using string s) that made a copy of the string
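the copy-versus-reference difference matters even more when the loop changes the elements: with a plain (non-const) reference the changes reach the vector, while with a copy they are lost; a sketch with two made-up functions that differ only in the loop variable

```cpp
#include <string>
#include <vector>

using namespace std;

// Appends suffix to every element; string& makes s refer to the
// element itself, so the change is stored in words.
vector<string> with_suffix(vector<string> words, const string& suffix) {
    for (string& s : words) {
        s += suffix;
    }
    return words;
}

// Same loop, but s is a copy of each element, so the change is
// thrown away and words comes back unmodified.
vector<string> suffix_lost(vector<string> words, const string& suffix) {
    for (string s : words) {
        s += suffix;
    }
    return words;
}
```

with_suffix({"a", "b"}, "!") should return {"a!", "b!"}, while suffix_lost({"a", "b"}, "!") should return {"a", "b"}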

here’s an example that uses a for-each loop to iterate over the characters in a string

bool is_digit(char c) {
    return c >= '0' && c <= '9';
}

// Pre-condition:
//    none
// Post-condition:
//    returns true if s consists entirely of digits
bool is_int(const string& s) {
    for(char c : s) {
        if (!is_digit(c)) {
            return false;
        }
    }
    return true;
}

this function prints a vector of strings

void print(const vector<string>& v) {
    for(const string& s : v) {
        cout << s << " ";
    }
}

A Problem from the Textbook

problem 14 of the textbook asks us to write a function similar to this:

vector<string> split(const string& s, char delim)

for instance, split("1,2,3,4", ',') returns the vector {"1", "2", "3", "4"}

there are many ways to solve this problem

here is one solution that aims to be clear and simple, at the expense of some efficiency (i.e. the statement data += c is probably not the fastest way to construct a string)

vector<string> split(const string& s, char delim) {
    vector<string> result;
    string data;
    for(char c : s) {
        if (c == delim) {
            result.push_back(data);
            data = "";
        } else {
            data += c;
        }
    }
    result.push_back(data);
    return result;
}

int main() {
    for(;;) {
        cout << "--> ";
        string s;
        getline(cin, s);
        vector<string> result = split(s, ',');
        for(const string& part : result) cout << '"' << part << "\" ";
        cout << "\n";
    }
} // main
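as one of the other possible solutions, here is a sketch that uses the standard find and substr member functions to copy whole pieces at a time instead of one character at a time; split_find is a made-up name, and it is meant to return the same results as split

```cpp
#include <cstddef>
#include <string>
#include <vector>

using namespace std;

vector<string> split_find(const string& s, char delim) {
    vector<string> result;
    size_t start = 0;                   // index where the current piece begins
    size_t pos = s.find(delim, start);  // index of the next delim, or npos
    while (pos != string::npos) {
        result.push_back(s.substr(start, pos - start));
        start = pos + 1;                // skip over the delimiter
        pos = s.find(delim, start);
    }
    result.push_back(s.substr(start));  // the piece after the last delim
    return result;
}
```

like split, this version returns empty strings for empty pieces, e.g. split_find("a,,b,", ',') should return {"a", "", "b", ""}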

Joining a Vector of Strings

the join function is a sort of inverse of the split function:

string join(const vector<string>& v, const string& delim)

for example, join({"1", "2", "3", "4"}, ", ") returns the string "1, 2, 3, 4"

join({"cat"}, ", ") returns just the string "cat" (without any delimiters)

join({}, ", ") returns the empty string ""

string join(const vector<string>& v, const string& delim) {
    if (v.empty()) {
        return "";
    } else if (v.size() == 1) {
        return v[0];
    } else {
        string result = v[0];
        for(int i = 1; i < v.size(); ++i) {  // i starts at 1
            result += delim + v[i];
        }
        return result;
    }
}

Improved Vector Printing

join lets us improve the vector<string> print function we saw earlier

void print(const vector<string>& v) {
    string result = join(v, ", ");
    cout << "{" << result << "}";
}

or this

void print(const vector<string>& v) {
    cout << "{" << join(v, ", ") << "}";
}

we can use it like this

for(;;) {
    cout << "--> ";
    string s;
    getline(cin, s);
    vector<string> result = split(s, ',');
    cout << "result = ";
    print(result);
    cout << "\n";
}

an even more convenient way to print a vector is to overload the << operator

ostream& operator<<(ostream& os, const vector<string>& v) {
    os << "{" << join(v, ", ") << "}";
    return os;
}

now we can print a vector like this

for(;;) {
    cout << "--> ";
    string s;
    getline(cin, s);
    vector<string> result = split(s, ',');
    cout << "result = " << result << "\n";
}

this operator<< function is an example of operator overloading

that is, it is an example of how you can, in C++, define an operator (that already exists) to do something else

it’s often a convenient feature, but it’s not essential to this course and so we won’t go into the details of how it works (and you don’t need to memorize how it works)

in fact, many other programming languages don’t allow operator overloading; some programmers believe it makes programs harder to read

Sample Program: Fridge Magnet Poetry

// poetry.cpp

#include "cmpt_error.h"
#include <iostream>
#include <vector>
#include <string>
#include <cstdlib>  // rand, srand
#include <ctime>    // time

using namespace std;

vector<string> a_the = {"a", "the", "every", "some", "one", "that", "this",
                        "his", "her"};

vector<string> noun1 = {"dog", "cat", "pumpkin", "baby", "block of cheese",
                        "carrot", "tuba", "pencil", "eraser", "bowling ball",
                        "penguin", "bathtub"};

vector<string> noun2 = {"coconut", "C++ compiler", "church", "stapler", "box",
                        "punctuation mark", "key", "giraffe", "blade of grass",
                        "eel", "shoelace"};

vector<string> verb = {"ate", "loved", "exploded", "ran", "jumped", "pushed",
                       "scared", "cried", "opened", "crushed", "threw", "painted",
                       "swallowed", "sniffed"};

vector<string> mod1 = {"big", "huge", "tiny", "flaming", "wet", "crunchy",
                       "uncontrollably screaming", "bendy", "broken", "suspicious", "smelly", "hopping"};

// Pre-condition:
//    v is not empty
// Post-condition:
//    returns a randomly chosen string from v
string rand_choice(const vector<string>& v) {
    if (v.empty()) {
        cmpt::error("v can't be empty");
    }
    // 0 <= rand() <= RAND_MAX
    int r = rand() % v.size();
    return v[r];
}

// Pre-condition:
//    none
// Post-condition:
//    returns either s + " ", or the empty string (50% chance each)
string s_or_empty(const string& s) {
    if (rand() % 2 == 0) {
        return s + " ";
    } else {
        return "";
    }
}

string rand_line1() {
    string result =    rand_choice(a_the) + " "
                     + s_or_empty(rand_choice(mod1))
                     + rand_choice(noun1) + " "
                     + rand_choice(verb) + " "
                     + rand_choice(a_the) + " "
                     + s_or_empty(rand_choice(mod1))
                     + rand_choice(noun2) + " ";
    return result;
}

void print_poem() {
    cout << rand_line1() << "\n"
         << rand_line1() << "\n"
         << rand_line1() << "\n";
}

int main() {
    srand(time(NULL));

    for(int i = 0; i < 10; ++i) {
        print_poem();
        cout << "\n";
    }
}