Chapter 8 Notes¶
Please read chapter 8 of the textbook.
C-Style Strings¶
C++ is built on top of the C language, and almost all features of C are part of C++
C was designed to be a low-level language for implementing things like operating systems
C provides simple, low-level features that give the programmer great control over a program at the cost of being harder to use correctly
one of the goals of C++ is to add some easier-to-use features on top of C
strings are one such feature added by C++
C++ strings are a high-level, easy-to-use feature that for almost all programs are easier to use than C-style strings
C++ still supports C-style strings, because C++ supports pretty much all of C
and in some cases, C++ cannot avoid C-style strings, and so you will have to know a bit about them
but when we talk about strings in C++, we mean the C++ string data type, not the low-level C-style string
C-style strings are little more than sequences of bytes that end with a \0, and so we will see them when we discuss arrays
C++ strings are much more sophisticated (but still extremely efficient), and make use of many high-level C++ features
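as a rough sketch of the difference: a C-style string has to be scanned for its terminating \0 to find its length, while a C++ string keeps track of its own size (my_strlen and to_cpp_string are names made up for this example; the standard library's version of the first is std::strlen)

```cpp
#include <cstring>
#include <string>

// Length of a C-style string: count bytes until the terminating '\0'.
// (hypothetical re-implementation of std::strlen, for illustration only)
std::size_t my_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// Convert a C-style string to a C++ string.
// The terminating '\0' is not counted by size().
std::string to_cpp_string(const char* s) {
    return std::string(s);
}
```

for example, my_strlen("cat") and to_cpp_string("cat").size() should both be 3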
The Standard string Class¶
we’ve been using the C++ standard string class throughout this course
it’s pretty easy to use!
string is a data type, and a value of type string is called an object
we’ll discuss details of how a string object can be implemented in the next chapter
for now, let’s understand how to use strings: they are one of the most useful and powerful data types in C++
#include <iostream>
#include <string>
using namespace std;
int main() {
    string s = "The Jackal got caught.";
    cout << s << "\n";
}
Initializing a string¶
#include "cmpt_error.h"
#include <iostream>
#include <string>
using namespace std;
int main() {
string a; // a is the empty string, ""
string b = "house";
string c("chair");
string d{"mouse"}; // C++11-style initialization
cout << "a = \"" << a << "\"\n"
<< "b = \"" << b << "\"\n"
<< "c = \"" << c << "\"\n"
<< "d = \"" << d << "\"\n";
}
"house"
is an example of a string literal
string literals in C++ are not of type string, unfortunately
string literals are instead arrays of characters
this is to be backwards-compatible with C (C has no string type)
much of the time this is not an issue, as C++ tries hard to correctly handle mixing C-style strings and string objects
notice how many ways you can initialize a string
you can use = or () notation or {} notation
{} notation is the newest notation, and it was added because it turns out the other two notations don’t work in all possible cases (!)
Initializing a string¶
another way to initialize a string is by making a copy of an existing string
string a = "apple";
string b = a;
string c(a);
string d{a}; // new in C++11
a, b, c, and d all have the value "apple"
they each have their own personal copies
you can also create a string that is a sequence of 0 or more copies of the same character
string bar(10, '-');
cout << bar << "\n"; // ----------
// 10 '-' characters
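a small sketch of what "their own personal copies" means: changing a copy leaves the original alone (copy_demo and dashes are hypothetical helper names used only for this example)

```cpp
#include <string>

// Copies a string, changes the copy, and returns both joined by a space.
std::string copy_demo() {
    std::string a = "apple";
    std::string b = a;    // b is a separate copy of a
    b[0] = 'A';           // changes b only; a is untouched
    return a + " " + b;
}

// A string of n '-' characters; n may be 0, giving the empty string.
std::string dashes(int n) {
    return std::string(n, '-');
}
```

here copy_demo() should return "apple Apple", and dashes(0) should be the empty string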
Strings Know their Size¶
the size() member function tells you the number of characters in a string
string s = "Once upon a time ...\n";
cout << "s has " << s.size() << " characters\n";
// s has 21 characters
be careful counting escape characters, such as \n (newline) or \t (tab)
they are single characters even though they must be typed using two symbols
string s = "\n\n\t\\n";
cout << "s has " << s.size() << " characters\n";
// s has 5 characters
Concatenating Strings¶
use + or += to combine (concatenate) two or more strings
int main() {
cout << "What is your first name? ";
string name;
cin >> name;
string msg = "* Hi " + name + "! *\n";
string bar(msg.size() - 1, '*'); // note the -1
bar += '\n';
cout << bar
<< msg
<< bar;
}
What is your first name? Vetruvious
******************
* Hi Vetruvious! *
******************
the code that draws this box is worth looking at in detail
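one way to look at it in detail is to pull the box-building code into a function that returns the whole box as a string (boxed is a hypothetical helper, not part of the original program)

```cpp
#include <string>

// Builds the three-line greeting box for name as a single string.
std::string boxed(const std::string& name) {
    std::string msg = "* Hi " + name + "! *\n";
    // msg ends with '\n', which takes up no column on the screen,
    // so the border needs one fewer '*' than msg.size()
    std::string bar(msg.size() - 1, '*');
    bar += '\n';
    return bar + msg + bar;
}
```

the key step is the -1: without it, the top and bottom borders would be one star wider than the middle line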
Reading an Entire Line¶
we’ve seen throughout this course how to read a single word from cin, e.g.
string s;
cin >> s; // s gets assigned the first word the user types
cin >> s skips leading whitespace, then reads characters up to the next whitespace
so if the user types “apple tree” then s is assigned the string “apple”
but sometimes you want to read in the entire line, spaces and all
you can do that like this
cout << "Please enter a string: ";
string s;
getline(cin, s); // read everything the user types up to \n
cout << "s = \"" << s << "\"\n";
by default, getline uses '\n' as the end-of-input marker
but you can change that to be other characters if you like, e.g.
cout << "Please enter a string: ";
string s;
getline(cin, s, '?'); // read everything the user types up to ?
cout << "s = \"" << s << "\"\n";
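to experiment with these without typing at the keyboard, the same operations work on a string stream; this is only a sketch, and first_word and up_to are hypothetical names

```cpp
#include <sstream>
#include <string>

// Reads the first whitespace-delimited word of text, as cin >> s would.
std::string first_word(const std::string& text) {
    std::istringstream in(text);
    std::string s;
    in >> s;
    return s;
}

// Reads everything up to (but not including) delim,
// as getline(cin, s, delim) would.
std::string up_to(const std::string& text, char delim) {
    std::istringstream in(text);
    std::string s;
    std::getline(in, s, delim);
    return s;
}
```

note that the delimiter itself is read and thrown away: it does not end up in s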
Accessing the Characters of a string¶
[]-notation is used to access string characters
if s is a string containing n characters, then
s[0] is the first character of s
s[1] is the second character of s
s[2] is the third character of s
...
s[n - 2] is the second to last character of s
s[n - 1] is the last character of s
Accessing the Characters of a string¶
cout << "Please enter a string: ";
string s;
cin >> s;
for(int i = 0; i < s.size(); ++i) {
cout << "s[" << i << "] = '" << s[i] << "'\n";
}
here’s a sample run
Please enter a string: butter
s[0] = 'b'
s[1] = 'u'
s[2] = 't'
s[3] = 't'
s[4] = 'e'
s[5] = 'r'
Accessing the Characters of a string¶
the following for-loop shows a basic loop structure for processing every character of a string
for(int i = 0; i < s.size(); ++i) {
cout << s[i];
}
i starts at 0 because s[0] is the first character of s
the last character of s is s[s.size() - 1]
thus i < s.size() is the correct condition
it’s important to know that s[s.size()] is not the last character of s
since the numbering of the characters in s starts at 0 (instead of 1), s[s.size() - 1] is the last character of s
this is a subtle point that often confuses new programmers
Accessing the Characters of a string¶
you can modify characters in a C++ string
for example, this code replaces all spaces in a string with '_' (underscore)
cout << "Please enter a string: ";
string s;
getline(cin, s);
for(int i = 0; i < s.size(); ++i) {
if (s[i] == ' ') {
s[i] = '_';
}
}
cout << "s = \"" << s << "\"\n";
Square-bracket Notation¶
the []-notation is used with strings, vectors, and arrays (and thus C-style strings)
we’ll see it again in different situations
in general, if an object x is a sequence of n values, then x[i] is the item at index location i of x
in C++ (and C), x[0] is the first item of the sequence, and x[n - 1] is the last item
it’s also quite possible in C++ to create your own sequence-like objects that define their own []-bracket notation (we won’t get to that in this course)
Range Errors¶
a common mistake is to access characters in a string outside the legal range of index values, e.g.
string s = "cat";
cout << s[-1] // -1 isn't a valid index; program may keep running (!)
     << s[4]; // 4 isn't a valid index; program may keep running (!)
unfortunately, C and C++ don’t catch these errors either at compile-time or run-time
out-of-range access is undefined behaviour: the program may return garbage values, crash, or appear to work
(for a C++ string, s[s.size()] is a special case: since C++11 it refers to the terminating '\0', but it is still not one of the string's characters)
it’s also a common source of security problems in programs
many C programs, for instance, are infamous for suffering from buffer overflows that boil down to the fact that strings (and arrays) don’t care what index value you pass to them
Range Errors¶
it would not be hard for C or C++ to check that every string (or array) access is in-range
but that has a run-time cost that the designers of C and C++ felt was too high (keep in mind that C++ was created in a time when computers were generally much slower than they are today, and with much less memory)
if you do want range-checking, C++ strings (and vectors) let you use the at(i) member function instead of []-bracket notation
the at function does range-checking, and will catch indexing errors at run-time
string s = "cat";
cout << s.at(-1) // -1 isn't a valid index; at throws a std::out_of_range exception
     << s.at(3); // 3 isn't a valid index; at throws a std::out_of_range exception
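the error that at raises can be caught with try/catch; here is a minimal sketch (char_or is a hypothetical helper name) that returns a fallback character instead of crashing

```cpp
#include <stdexcept>
#include <string>

// Returns s.at(i), or fallback if i is not a valid index of s.
char char_or(const std::string& s, std::size_t i, char fallback) {
    try {
        return s.at(i);   // throws std::out_of_range if i >= s.size()
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

an index of -1 is converted to a very large unsigned value when passed to at, which is why at still reports it as out of range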
Some Useful string Functions¶
suppose s and t are strings
s == t tests if s and t are the same length and have the same characters in the same order
s != t tests if s and t are different
s < t tests if s comes before t lexicographically
s <= t tests if s comes before t lexicographically, or is equal to t
s > t tests if s comes after t lexicographically
s >= t tests if s comes after t lexicographically, or is equal to t
the term lexicographically is a more general version of the term alphabetical
lexicographical order is the same as alphabetic order when you are dealing with alphabetic letters only
but the characters of a string might include digits, punctuation, or hundreds of other non-letter symbols
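a sketch of how s < t could be computed by hand, assuming plain ASCII characters (comes_before is a hypothetical name; in real code just write s < t)

```cpp
#include <string>

// True if s comes before t lexicographically.
// Characters are compared one at a time; the first difference decides.
// If one string is a proper prefix of the other, the shorter one comes first.
bool comes_before(const std::string& s, const std::string& t) {
    std::size_t n = s.size() < t.size() ? s.size() : t.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] != t[i]) {
            return s[i] < t[i];   // compares character codes
        }
    }
    return s.size() < t.size();
}
```

because character codes are compared, some results differ from dictionary order: "Zebra" comes before "apple" (upper-case letters have smaller codes than lower-case ones in ASCII), and "10" comes before "9"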
Getting A Substring¶
the substr member function lets you extract a substring from a string
string s = "character";
string t = s.substr(3, 3);
cout << t; // "rac"
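the first argument of substr is the starting index, and the second is the number of characters to extract
substr has many useful applications, so you should know it and how to use it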
see the table on page 483 for a few more functions that come with string
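two small sketches of substr in use (strip_ends and last_n are hypothetical helper names); the second uses the one-argument form of substr, which extracts from the given index to the end of the string

```cpp
#include <string>

// Returns s with its first and last characters removed,
// or the empty string if s has fewer than 2 characters.
std::string strip_ends(const std::string& s) {
    if (s.size() < 2) return "";
    return s.substr(1, s.size() - 2);
}

// Returns the last n characters of s (all of s if n >= s.size()).
std::string last_n(const std::string& s, std::size_t n) {
    if (n >= s.size()) return s;
    return s.substr(s.size() - n);   // from this index to the end
}
```

for instance, last_n("character", 3) should give "ter"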
Writing Our Own string Functions¶
it’s instructive to write our own versions of string functions
this helps us understand how they can be implemented
and it gives us practice writing C++ code
here are some functions for string equality and inequality
bool equal(const string& s, const string& t) {
if (s.size() != t.size()) return false;
// at this point we know s and t are the same size
for(int i = 0; i < s.size(); ++i) {
if (s[i] != t[i]) {
return false;
}
}
return true;
}
bool not_equal(const string& s, const string& t) {
return !equal(s, t);
}
notice that s and t are passed by constant reference
since they are passed by reference, no copy of the string is made, which makes these functions efficient
they are const references to tell the compiler that these functions only read s and t and don’t modify them; this is very useful in larger programs, and it means equal and not_equal can be called on const strings, e.g. from other functions that take their strings by const reference
Testing if a String Contains a Character¶
you can use a loop to test if a string contains a given character
bool contains(const string& s, char c) {
for(int i = 0; i < s.size(); ++i) {
if (s[i] == c) {
return true;
}
}
return false;
}
the loop iterates over each character of s
notice that the statement return false is outside of the loop, and that’s important: we only know c is missing after every character has been checked
Testing if a String Contains a Character¶
contains returns either true or false
thus it is a boolean function
here’s a related function that returns the index of a character in a string
// Pre-condition:
// none
// Post-condition:
// returns an int i such that s[i] == c;
// if c is not anywhere in s, then it returns -1
int index_of(const string& s, char c) {
for(int i = 0; i < s.size(); ++i) {
if (s[i] == c) {
return i;
}
}
return -1;
}
we use -1 as the “char not found” return value
this is a common way to indicate a “not found” value for an index
this function can be used to write the contains function
bool contains(const string& s, char c) {
    return index_of(s, c) != -1;
}
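the standard string class already has a similar search: the find member function, which returns string::npos (rather than -1) when nothing is found; as a sketch, here is a version with the same contract as index_of built on top of find (index_of_std is a hypothetical name)

```cpp
#include <string>

// Returns the index of the first occurrence of c in s,
// or -1 if c is not anywhere in s.
int index_of_std(const std::string& s, char c) {
    std::size_t pos = s.find(c);
    if (pos == std::string::npos) {
        return -1;
    }
    return static_cast<int>(pos);
}
```

writing the loop yourself is still good practice, but in real programs find is usually the better choice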
Testing if a Character is a Vowel¶
the index_of and contains functions have many uses
for example
bool is_vowel(char c) {
return contains("aeiouAEIOU", c);
}
int vowel_count(const string& s) {
int count = 0;
for(int i = 0; i < s.size(); ++i) {
if (is_vowel(s[i])) {
count++;
}
}
return count;
}
note that we could have written is_vowel using a big or-expression like this
if (c == 'a' || c == 'e' || c == 'i' ... )
but using contains makes it shorter and easier to read
Reversing a String¶
void swap(char& x, char& y) {
char temp = x;
x = y;
y = temp;
}
void reverse(string& s) { // note that s is passed by reference
int a = 0;
int b = s.size() - 1;
while (a < b) {
swap(s[a], s[b]);
++a;
--b;
}
}
// a palindrome is a string that is the same backwards and forwards
// e.g. "pop" and "racecar"
bool is_palindrome(const string& s) {
string s_rev = s;
reverse(s_rev);
return s == s_rev;
}
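the same two-index idea used in reverse can also test for a palindrome directly, without making a reversed copy; this is one possible variation (is_palindrome_in_place is a hypothetical name)

```cpp
#include <string>

// True if s reads the same backwards and forwards.
// Compares characters from both ends, moving towards the middle.
bool is_palindrome_in_place(const std::string& s) {
    int a = 0;
    int b = static_cast<int>(s.size()) - 1;
    while (a < b) {
        if (s[a] != s[b]) {
            return false;   // found a mismatched pair
        }
        ++a;
        --b;
    }
    return true;            // every pair matched
}
```

it can stop as soon as it finds a mismatch, and by this definition the empty string counts as a palindrome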
Converting a string to an int¶
suppose you want to convert the string “5832” to the int 5832
one way to do that is to use C++’s standard stoi function (which is in the C++ string library), e.g.:
#include <string>
// ...
string s = "5832";
int i = stoi(s); // i is 5832
let’s write our own version of this function
// Pre-condition:
//    s is a string of digits that, when interpreted as a base 10
//    integer, can be represented as a non-negative int.
// Post-condition:
//    Returns the int value of the integer represented by s.
// Example:
//    str_to_int("6789") returns the int 6789
int str_to_int(const string& s) {
    int result = 0;
    for(int i = 0; i < s.size(); ++i) {
        // shift the digits read so far one place left, then add the next digit
        result = result * 10 + (s[i] - '0');
    }
    return result;
}
notice that the pre-condition puts a number of important restrictions on s
the body of str_to_int does not bother to check for errors because it assumes the pre-condition is true
also note that the expression s[i] - '0' converts the digit s[i] into its corresponding int value
e.g. '8' - '0' evaluates to the int 8
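unlike str_to_int, the standard stoi does report bad input: it throws std::invalid_argument when there is nothing to convert, and std::out_of_range when the value does not fit in an int; a minimal sketch of catching both (try_stoi is a hypothetical helper name)

```cpp
#include <stdexcept>
#include <string>

// Tries to convert s to an int with std::stoi.
// On success, stores the value in out and returns true.
// On failure, leaves out unchanged and returns false.
bool try_stoi(const std::string& s, int& out) {
    try {
        out = std::stoi(s);
        return true;
    } catch (const std::invalid_argument&) {
        return false;   // nothing to convert, e.g. "cat" or ""
    } catch (const std::out_of_range&) {
        return false;   // the value does not fit in an int
    }
}
```

one thing to keep in mind is that stoi converts the longest valid prefix, so a string such as "12abc" converts to 12 without any error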
Vectors¶
C++ vectors are sequential collections of 0 or more objects
#include "cmpt_error.h"
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main() {
// an empty vector of ints
vector<int> ages;
cout << ages.size() << "\n"; // 0
// a vector of 4 strings
vector<string> names = {"Bob", "Mary", "Zia", "Talia"};
cout << names.size() << "\n"; // 4
// a vector of 2 doubles
vector<double> temps = {5.4, -2.2};
cout << temps[1] << "\n"; // -2.2
// a vector of 4 int vectors
vector<vector<int>> table = {
{1},
{2, 3, 1},
{6, 8, 9, 1},
{7, 3}
};
cout << table.size() << "\n"; // 4
cout << table[2][3] << "\n"; // 1
} // main
Vectors¶
you can read and write individual elements of a vector using []-bracket notation
vector<int> v = {6, 2, 5};
cout << v[0] << "\n" // 6
<< v[1] << "\n" // 2
<< v[2] << "\n"; // 5
v[0] = v[1] + v[2];
cout << v[0] << "\n" // 7
<< v[1] << "\n" // 2
<< v[2] << "\n"; // 5
v[2] = 8;
v[1] = 0;
cout << v[0] << "\n" // 7
<< v[1] << "\n" // 0
<< v[2] << "\n"; // 8
for an n-element vector v, v[0] is the first element and v[n - 1] is the last element
out-of-range index values, such as v[-1] or v[n], are errors, but C++ does not detect them: the behaviour is undefined
it is up to the programmer to make sure they never access out-of-range values in a vector (or string, or array)
Vectors¶
you can add elements to a vector as they appear
the vector will automatically increase its size to hold the new element
#include "cmpt_error.h"
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main() {
cout << "Please enter some words: ";
string w;
vector<string> words;
while (cin >> w) {
words.push_back(w); // appends w to the right end of words
}
cout << "You entered:\n";
for(int i = 0; i < words.size(); ++i) {
cout << " " << words[i] << "\n";
}
} // main
the statement words.push_back(w) adds the string w to the right end of words
words automatically increases its size to hold w
For-Each Loops¶
C++11 provides a new kind of for-loop specifically designed for iterating over vectors (and strings, and arrays)
in many cases, it is the easiest way to loop through a vector
#include "cmpt_error.h"
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
using namespace std;
int main() {
cout << "Please enter some words: ";
vector<string> words;
string w;
while (cin >> w) {
words.push_back(w);
}
sort(words.begin(), words.end());
for(string s : words) {
cout << s << "\n";
}
} // main
this loop is an example of a for-each loop:
for(string s : words) {
cout << s << "\n";
}
it avoids the need for an index variable
which simplifies the code
this variation is more efficient
for(const string& s : words) {
cout << s << "\n";
}
this makes s refer to the corresponding string in words, in contrast to the original (using string s) that made a copy of the string
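the copy-versus-reference difference matters even more when the loop body changes s; in this sketch (shout and shout_copy are hypothetical names) only the reference version changes the vector

```cpp
#include <string>
#include <vector>

// Appends '!' to every string in v.
void shout(std::vector<std::string>& v) {
    for (std::string& s : v) {   // s refers to the element itself
        s += '!';
    }
}

// Looks similar, but leaves v unchanged.
void shout_copy(std::vector<std::string>& v) {
    for (std::string s : v) {    // s is a copy of the element
        s += '!';                // changes the copy only
    }
}
```

use const string& when you only read the elements, and string& (without const) when you want to modify them in place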
For-Each Loops¶
here’s an example that uses a for-each loop to iterate over the characters in a string
bool is_digit(char c) {
return c >= '0' && c <= '9';
}
// Pre-condition:
//    none
// Post-condition:
//    returns true if every character of s is a digit (so the empty
//    string returns true), and false otherwise
bool is_int(const string& s) {
for(char c : s) {
if (!is_digit(c)) {
return false;
}
}
return true;
}
For-Each Loops¶
this function prints a vector of strings
void print(const vector<string>& v) {
for(const string& s : v) {
cout << s << " ";
}
}
A Problem from the Textbook¶
problem 14 of the textbook asks us to write a function similar to this:
vector<string> split(const string& s, char delim)
for instance, split("1,2,3,4", ',') returns the vector {"1", "2", "3", "4"}
there are many ways to solve this problem
here is one solution that aims to be clear and simple, at the expense of some efficiency (i.e. the statement data += c is probably not the fastest way to construct a string)
vector<string> split(const string& s, char delim) {
vector<string> result;
string data;
for(char c : s) {
if (c == delim) {
result.push_back(data);
data = "";
} else {
data += c;
}
}
result.push_back(data);
return result;
}
int main() {
    for(;;) {
        cout << "--> ";
        string s;
        if (!getline(cin, s)) break; // stop at end of input
        vector<string> result = split(s, ',');
        for(const string& item : result) cout << '"' << item << "\" ";
        cout << "\n";
    }
} // main
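it is worth checking how this split behaves on unusual input; the block below repeats the same split function (written with std:: prefixes so it compiles on its own), and the cases listed after it follow from tracing the loop by hand

```cpp
#include <string>
#include <vector>

// Same algorithm as the split function above.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> result;
    std::string data;
    for (char c : s) {
        if (c == delim) {
            result.push_back(data);   // end of one piece
            data = "";
        } else {
            data += c;
        }
    }
    result.push_back(data);           // the final piece (possibly empty)
    return result;
}
```

because of the final push_back, the result is never empty: split("", ',') is a vector holding one empty string, split("a,,b", ',') is {"a", "", "b"}, and split("a,", ',') is {"a", ""}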
Joining a Vector of Strings¶
the join function is a sort of inverse of the split function:
string join(const vector<string>& v, const string& delim)
for example, join({"1", "2", "3", "4"}, ", ") returns the string "1, 2, 3, 4"
join({"cat"}, ", ") returns just the string "cat" (without any delimiters)
join({}, ", ") returns the empty string ""
string join(const vector<string>& v, const string& delim) {
if (v.empty()) {
return "";
} else if (v.size() == 1) {
return v[0];
} else {
string result = v[0];
for(int i = 1; i < v.size(); ++i) { // i starts at 1
result += delim + v[i];
}
return result;
}
}
Improved Vector Printing¶
join lets us improve the vector<string> print function we saw earlier
void print(const vector<string>& v) {
string result = join(v, ", ");
cout << "{" << result << "}";
}
or this
void print(const vector<string>& v) {
cout << "{" << join(v, ", ") << "}";
}
we can use it like this
for(;;) {
    cout << "--> ";
    string s;
    if (!getline(cin, s)) break; // stop at end of input
    vector<string> result = split(s, ',');
    cout << "result = ";
    print(result);
    cout << "\n";
}
Improved Vector Printing¶
an even more convenient way to print a vector is to overload the << operator
ostream& operator<<(ostream& os, const vector<string>& v)
{
os << "{" << join(v, ", ") << "}";
return os;
}
now we can print a vector like this
for(;;) {
    cout << "--> ";
    string s;
    if (!getline(cin, s)) break; // stop at end of input
    vector<string> result = split(s, ',');
    cout << "result = " << result << "\n";
}
Improved Vector Printing¶
this operator<< function is an example of operator overloading
that is, it is an example of how you can, in C++, define an operator that already exists so that it also works with a new type
it’s often a convenient feature, but it’s not essential to this course and so we won’t go into the details of how it works (and you don’t need to memorize how it works)
in fact, many other programming languages don’t allow operator overloading; some programmers believe it makes programs harder to read
Sample Program: Fridge Magnet Poetry¶
// poetry.cpp
#include "cmpt_error.h"
#include <iostream>
#include <vector>
#include <string>
#include <cstdlib>
#include <ctime>
using namespace std;
vector<string> a_the = {"a", "the", "every", "some", "one", "that", "this",
"his", "her"};
vector<string> noun1 = {"dog", "cat", "pumpkin", "baby", "block of cheese",
"carrot", "tuba", "pencil", "eraser", "bowling ball",
"penguin", "bathtub"};
vector<string> noun2 = {"coconut", "C++ compiler", "church", "stapler", "box",
"punctuation mark", "key", "giraffe", "blade of grass",
"eel", "shoelace"};
vector<string> verb = {"ate", "loved", "exploded", "ran", "jumped", "pushed",
"scared", "cried", "opened", "crushed", "threw", "painted",
"swallowed", "sniffed"};
vector<string> mod1 = {"big", "huge", "tiny", "flaming", "wet", "crunchy",
"uncontrollably screaming", "bendy", "broken", "suspicious", "smelly", "hopping"};
// Pre-condition:
// v is not empty
// Post-condition:
// returns a randomly chosen string from v
string rand_choice(const vector<string>& v) {
if (v.empty()) {
cmpt::error("v can't be empty");
}
// 0 <= rand() <= RAND_MAX
int r = rand() % v.size();
return v[r];
}
// Pre-condition:
// none
// Post-condition:
// returns either s + " ", or the empty string (50% chance each)
string s_or_empty(const string& s) {
if (rand() % 2 == 0) {
return s + " ";
} else {
return "";
}
}
string rand_line1() {
string result = rand_choice(a_the) + " "
+ s_or_empty(rand_choice(mod1))
+ rand_choice(noun1) + " "
+ rand_choice(verb) + " "
+ rand_choice(a_the) + " "
+ s_or_empty(rand_choice(mod1))
+ rand_choice(noun2) + " ";
return result;
}
void print_poem() {
cout << rand_line1() << "\n"
<< rand_line1() << "\n"
<< rand_line1() << "\n";
}
int main() {
srand(time(NULL));
for(int i = 0; i < 10; ++i) {
print_poem();
cout << "\n";
}
}