.. highlight:: c++

Chapter 8 Notes
===============

Please read chapter 8 of the textbook.

C-Style Strings
---------------

C++ is built on top of the C language, and almost all features of C are part of C++

C was designed to be a low-level language for implementing things like operating systems

C provides simple, low-level features that give the programmer great control over a program at the cost of being harder to use correctly

one of the goals of C++ is to add some easier-to-use features on top of C

strings are one such feature added by C++

C++ strings are a high-level feature that, for almost all programs, are easier to use than C-style strings

C++ still supports C-style strings, because C++ supports pretty much all of C

and in some cases, C++ cannot avoid C-style strings, and so you will have to know a bit about them

but, when we use strings in C++, we mean the C++ ``string`` data type, and **not** low-level C strings

C-style strings are little more than sequences of bytes that end with a ``\0``, and so we will see them when we discuss arrays

C++ strings are much more sophisticated (but still extremely efficient), and make use of many high-level C++ features

The Standard string Class
-------------------------

we've been using the C++ standard ``string`` class throughout this course

it's pretty easy to use!

``string`` is a data type, and a value of type ``string`` is called an **object**

we'll discuss details of how a ``string`` object can be implemented in the next chapter

for now, let's understand how to use strings --- they are one of the most useful and powerful data types in C++

::

    #include <iostream>
    #include <string>
    using namespace std;

    int main() {
        string s = "The Jackal got caught.";
        cout << s << "\n";
    }

Initializing a string
---------------------

::

    #include "cmpt_error.h"
    #include <iostream>
    #include <string>
    using namespace std;

    int main() {
        string a;           // a is the empty string, ""
        string b = "house";
        string c("chair");
        string d{"mouse"};  // C++11-style initialization

        cout << "a = \"" << a << "\"\n"
             << "b = \"" << b << "\"\n"
             << "c = \"" << c << "\"\n"
             << "d = \"" << d << "\"\n";
    }

``"house"`` is an example of a *string literal*

string literals in C++ are **not** of type ``string``, unfortunately

string literals are instead arrays of characters

this is to be backwards-compatible with C (C has no ``string`` type)

much of the time this is not an issue, as C++ tries hard to correctly handle mixing C-style strings and ``string`` objects

notice how many ways you can initialize a ``string``

you can use ``=`` notation, ``()`` notation, or ``{}`` notation

``{}`` notation is the newest notation, and it was added because it turns out the other two notations don't work in all possible cases (!)

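
going back to the point that string literals are **not** of type ``string``: here is a small sketch of one place where the difference shows up (the function ``greet`` is a made-up name, used only for illustration)

adding two literals directly does not compile, because ``+`` is not defined for two character arrays, but it works as soon as one of the operands is a ``string``

::

    #include <iostream>
    #include <string>
    using namespace std;

    // hypothetical helper, for illustration only
    string greet(const string& name) {
        // "Hi, " is a character array, but name is a string,
        // so + produces a string
        return "Hi, " + name + "!";
    }

    int main() {
        // string bad = "Hi, " + "Bob";     // compile-time error: both operands
        //                                  // are character arrays
        string ok = string("Hi, ") + "Bob"; // fine: the left operand is a string
        cout << ok << "\n";                 // Hi, Bob
        cout << greet("Mary") << "\n";      // Hi, Mary!
    }

notice that ``greet("Mary")`` works even though ``"Mary"`` is a literal: C++ automatically converts it to a ``string`` for the ``const string&`` parameter
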
Initializing a string
---------------------

another way to initialize a string is by making a copy of an existing string

::

    string a = "apple";
    string b = a;
    string c(a);
    string d{a};  // new in C++11

``a``, ``b``, ``c``, and ``d`` all have the value ``"apple"``

they each have their own personal copy

you can also create a string that is a sequence of 0 or more copies of the same character

::

    string bar(10, '-');
    cout << bar << "\n";  // ----------
                          // 10 '-' characters

Strings Know their Size
-----------------------

the ``size()`` member function tells you the number of characters in a ``string``

::

    string s = "Once upon a time ...\n";
    cout << "s has " << s.size() << " characters\n";  // s has 21 characters

be careful counting escape characters, such as ``\n`` (newline) or ``\t`` (tab)

they are single characters even though they must be typed using two symbols

::

    string s = "\n\n\t\\n";
    cout << "s has " << s.size() << " characters\n";  // s has 5 characters

Concatenating Strings
---------------------

use ``+`` or ``+=`` to combine --- *concatenate* --- two or more strings

::

    int main() {
        cout << "What is your first name? ";
        string name;
        cin >> name;

        string msg = "* Hi " + name + "! *\n";
        string bar(msg.size() - 1, '*');  // note the -1
        bar += '\n';

        cout << bar << msg << bar;
    }

::

    What is your first name? Vetruvious
    ******************
    * Hi Vetruvious! *
    ******************

the code that draws this box is worth looking at in detail

Reading an Entire Line
----------------------

we've seen throughout this course how to read a single word from ``cin``, e.g.
::

    string s;
    cin >> s;  // s gets assigned the first word the user types

``cin >> s`` skips leading whitespace and then reads up to the next whitespace character

so if the user types "apple tree" then ``s`` is assigned the string "apple"

but sometimes you want to read in the entire line, spaces and all

you can do that like this

::

    cout << "Please enter a string: ";
    string s;
    getline(cin, s);  // read everything the user types up to \n
    cout << "s = \"" << s << "\"\n";

by default, ``getline`` uses ``'\n'`` as the end-of-input marker

but you can change that to be another character if you like, e.g.

::

    cout << "Please enter a string: ";
    string s;
    getline(cin, s, '?');  // read everything the user types up to ?
    cout << "s = \"" << s << "\"\n";

Accessing the Characters of a string
------------------------------------

[]-notation is used to access string characters

if ``s`` is a string containing ``n`` characters, then

``s[0]`` is the first character of ``s``

``s[1]`` is the second character of ``s``

``s[2]`` is the third character of ``s``

...

``s[n - 2]`` is the second to last character of ``s``

``s[n - 1]`` is the last character of ``s``

Accessing the Characters of a string
------------------------------------

::

    cout << "Please enter a string: ";
    string s;
    cin >> s;
    for(int i = 0; i < s.size(); ++i) {
        cout << "s[" << i << "] = '" << s[i] << "'\n";
    }

here's a sample run

::

    Please enter a string: butter
    s[0] = 'b'
    s[1] = 'u'
    s[2] = 't'
    s[3] = 't'
    s[4] = 'e'
    s[5] = 'r'

Accessing the Characters of a string
------------------------------------

the following for-loop shows a basic loop structure for processing every character of a string

::

    for(int i = 0; i < s.size(); ++i) {
        cout << s[i];
    }

``i`` starts at 0 because ``s[0]`` is the first character of ``s``

the last character of ``s`` is ``s[s.size() - 1]``

thus ``i < s.size()`` is the correct condition

it's important to know that ``s[s.size()]`` is **not** the last character of ``s``

since the numbering of the characters in ``s`` starts at 0 (instead of 1), ``s[s.size() - 1]`` is
the last character of ``s``

this is a subtle point that often confuses new programmers

Accessing the Characters of a string
------------------------------------

you can modify characters in a C++ string

for example, this code replaces all spaces in a string with ``'_'`` (underscore)

::

    cout << "Please enter a string: ";
    string s;
    getline(cin, s);
    for(int i = 0; i < s.size(); ++i) {
        if (s[i] == ' ') {
            s[i] = '_';
        }
    }
    cout << "s = \"" << s << "\"\n";

Square-bracket Notation
-----------------------

the []-notation is used with strings, vectors, and arrays (and thus C-style strings)

we'll see it again in different situations

in general, if an object ``x`` is a sequence of n values, then ``x[i]`` is the item at index location ``i`` of ``x``

in C++ (and C), ``x[0]`` is the first item of the sequence, and ``x[n - 1]`` is the last item

it's also quite possible in C++ to create your own sequence-like objects that define their own []-bracket notation (we won't get to that in this course)

Range Errors
------------

a common mistake is to access characters in a string outside the legal range of index values, e.g.

::

    string s = "cat";
    cout << s[-1]   // -1 isn't a valid index; program keeps running (!)
         << s[3];   // 3 is past the last character; program keeps running (!)

unfortunately, C and C++ **don't** catch these errors at either compile-time or run-time

they just return unknown values, or perhaps cause other undefined behaviour

this is also a common source of security problems in programs

many C programs, for instance, are infamous for suffering from **buffer overflows** that boil down to the fact that strings (and arrays) don't care what index value you pass to them

Range Errors
------------

it would not be hard for C or C++ to check that every string (or array) access is in-range

but that has a run-time cost that the designers of C and C++ felt was too high (keep in mind that C++ was created in a time when computers were generally much slower than they are today, and had much less memory)

if you do want range-checking, C++ strings (and vectors) let you use the ``at(i)`` member function instead of []-bracket notation

the ``at`` function does range-checking, and will catch indexing errors at run-time by throwing an ``out_of_range`` exception

::

    string s = "cat";
    cout << s.at(-1)   // -1 isn't a valid index; at throws a run-time error
         << s.at(3);   // 3 isn't a valid index; at throws a run-time error

Some Useful string Functions
----------------------------

suppose ``s`` and ``t`` are strings

``s == t`` tests if ``s`` and ``t`` are the same length and have the same characters in the same order

``s != t`` tests if ``s`` and ``t`` are different

``s < t`` tests if ``s`` comes before ``t`` lexicographically

``s <= t`` tests if ``s`` comes before ``t`` lexicographically, or is equal to ``t``

``s > t`` tests if ``s`` comes after ``t`` lexicographically

``s >= t`` tests if ``s`` comes after ``t`` lexicographically, or is equal to ``t``

**lexicographical** order is a more general version of **alphabetical** order

lexicographical order is the same as alphabetical order when you are dealing only with letters of the same case (in ASCII, every uppercase letter comes before every lowercase letter, so ``"Zebra" < "apple"`` is true)

but the characters of a string might include digits, punctuation, or hundreds of other non-letter symbols

Getting A Substring
-------------------

the ``substr`` member
function lets you extract a substring from a string

::

    string s = "character";
    string t = s.substr(3, 3);
    cout << t;  // "rac"

``s.substr(i, n)`` returns a new string consisting of up to ``n`` characters of ``s``, starting at index ``i``

``substr`` has many useful applications, and so you should know it and how to use it

see the table on page 483 for a few more functions that come with ``string``

Writing Our Own string Functions
--------------------------------

it's instructive to write our own versions of string functions

this helps us understand how they can be implemented

and it gives us practice writing C++ code

here are some functions for string equality and inequality

::

    bool equal(const string& s, const string& t) {
        if (s.size() != t.size()) return false;

        // at this point we know s and t are the same size
        for(int i = 0; i < s.size(); ++i) {
            if (s[i] != t[i]) {
                return false;
            }
        }
        return true;
    }

    bool not_equal(const string& s, const string& t) {
        return !equal(s, t);
    }

notice that ``s`` and ``t`` are passed by constant reference

since they are passed by reference, **no** copy of the string is made, which makes these functions efficient

they are ``const`` references to tell the compiler that these functions only read ``s`` and ``t`` and don't modify them; this is very useful in larger programs, and it means ``equal`` and ``not_equal`` can be called from other functions that only have ``const`` access to their strings

Testing if a String Contains a Character
----------------------------------------

you can use a loop to test if a string contains a given character

::

    bool contains(const string& s, char c) {
        for(int i = 0; i < s.size(); ++i) {
            if (s[i] == c) {
                return true;
            }
        }
        return false;
    }

the loop iterates over each character of ``s``

notice that the statement ``return false`` is outside of the loop --- that's important!

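
to see why the position of ``return false`` matters, here is a sketch that puts ``contains`` next to a deliberately broken variant (``contains_buggy`` is a made-up name, used only for illustration)

the broken variant returns ``false`` from inside the loop, so it gives up after looking at only the first character

::

    #include <iostream>
    #include <string>
    using namespace std;

    // correct: return false only after every character has been checked
    bool contains(const string& s, char c) {
        for(int i = 0; i < s.size(); ++i) {
            if (s[i] == c) {
                return true;
            }
        }
        return false;
    }

    // buggy (hypothetical, for illustration): return false is inside the
    // loop, so the function returns after checking only s[0]
    bool contains_buggy(const string& s, char c) {
        for(int i = 0; i < s.size(); ++i) {
            if (s[i] == c) {
                return true;
            } else {
                return false;  // wrong place!
            }
        }
        return false;  // only reached when s is empty
    }

    int main() {
        cout << boolalpha
             << contains("cat", 't') << "\n"         // true
             << contains_buggy("cat", 't') << "\n"   // false (wrong!)
             << contains_buggy("cat", 'c') << "\n";  // true, but only because
                                                     // 'c' happens to be s[0]
    }
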
Testing if a String Contains a Character
----------------------------------------

``contains`` returns either ``true`` or ``false``

thus it is a boolean function

here's a related function that returns the index of a character in a string

::

    // Pre-condition:
    //    none
    // Post-condition:
    //    returns the smallest int i such that s[i] == c;
    //    if c is not anywhere in s, then it returns -1
    int index_of(const string& s, char c) {
        for(int i = 0; i < s.size(); ++i) {
            if (s[i] == c) {
                return i;
            }
        }
        return -1;
    }

we use -1 as the "char not found" return value

this is a common way to indicate a "not found" value for an index

this function can be used to write the ``contains`` function

::

    bool contains(const string& s, char c) {
        return !(index_of(s, c) == -1);
    }

Testing if a Character is a Vowel
---------------------------------

the ``index_of`` and ``contains`` functions have many uses

for example

::

    bool is_vowel(char c) {
        return contains("aeiouAEIOU", c);
    }

    int vowel_count(const string& s) {
        int count = 0;
        for(int i = 0; i < s.size(); ++i) {
            if (is_vowel(s[i])) {
                count++;
            }
        }
        return count;
    }

note that we could have written ``is_vowel`` using a big or-expression like this

::

    if (c == 'a' || c == 'e' || c == 'i' ... )

but using ``contains`` makes it shorter and easier to read

Reversing a String
------------------

::

    void swap(char& x, char& y) {
        char temp = x;
        x = y;
        y = temp;
    }

    void reverse(string& s) {  // note that s is passed by reference
        int a = 0;
        int b = s.size() - 1;
        while (a < b) {
            swap(s[a], s[b]);
            ++a;
            --b;
        }
    }

    // a palindrome is a string that is the same backwards and forwards
    // e.g. "pop" and "racecar"
    bool is_palindrome(const string& s) {
        string s_rev = s;
        reverse(s_rev);
        return s == s_rev;
    }

Converting a string to an int
-----------------------------

suppose you want to convert the ``string`` "5832" to the ``int`` 5832

one way to do that is to use C++'s standard ``stoi`` function (which is in the C++ ``string`` library), e.g.::

    #include <string>

    // ...
    string s = "5832";
    int i = stoi(s);  // i is 5832

let's write our own version of this function

::

    // Pre-condition:
    //    s is a string of digits that, when interpreted as a base 10
    //    integer, can be represented as a non-negative int.
    // Post-condition:
    //    Returns the int value of the integer represented by s.
    // Example:
    //    str_to_int("6789") returns the int 6789
    int str_to_int(const string& s) {
        int pow = 1;
        int result = 0;
        for(int i = s.size() - 1; i >= 0; --i) {
            result += (s[i] - '0') * pow;
            if (i > 0) {
                // skip the unneeded final multiply, which could
                // overflow an int for 10-digit numbers
                pow *= 10;
            }
        }
        return result;
    }

notice that the pre-condition puts a number of important restrictions on ``s``

the body of ``str_to_int`` does not bother to check for errors because it assumes the pre-condition is true

also note that the expression ``s[i] - '0'`` converts the digit ``s[i]`` into its corresponding ``int`` value

e.g. ``'8' - '0'`` evaluates to the ``int`` 8

Vectors
-------

C++ vectors are sequential collections of 0 or more objects

::

    #include "cmpt_error.h"
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    int main() {
        // an empty vector of ints
        vector<int> ages;
        cout << ages.size() << "\n";  // 0

        // a vector of 4 strings
        vector<string> names = {"Bob", "Mary", "Zia", "Talia"};
        cout << names.size() << "\n";  // 4

        // a vector of 2 doubles
        vector<double> temps = {5.4, -2.2};
        cout << temps[1] << "\n";  // -2.2

        // a vector of 4 int vectors
        vector<vector<int>> table = {
            {1},
            {2, 3, 1},
            {6, 8, 9, 1},
            {7, 3}
        };
        cout << table.size() << "\n";  // 4
        cout << table[2][3] << "\n";   // 1
    } // main

Vectors
-------

you can read and write individual elements of a vector using []-bracket notation

::

    vector<int> v = {6, 2, 5};
    cout << v[0] << "\n"   // 6
         << v[1] << "\n"   // 2
         << v[2] << "\n";  // 5

    v[0] = v[1] + v[2];
    cout << v[0] << "\n"   // 7
         << v[1] << "\n"   // 2
         << v[2] << "\n";  // 5

    v[2] = 8;
    v[1] = 0;
    cout << v[0] << "\n"   // 7
         << v[1] << "\n"   // 0
         << v[2] << "\n";  // 8

for an n-element vector ``v``, ``v[0]`` is the first element and ``v[n - 1]`` is the last element

out-of-range index values, such as ``v[-1]`` or ``v[n]``, are errors, but C++ does
not report them

it is up to the programmer to make sure they never access out-of-range values in a vector (or string, or array)

Vectors
-------

you can add elements to a vector as they appear

the vector will automatically increase its size to hold the new element

::

    #include "cmpt_error.h"
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    int main() {
        cout << "Please enter some words: ";
        string w;
        vector<string> words;
        while (cin >> w) {
            words.push_back(w);  // appends w to the right end of words
        }

        cout << "You entered:\n";
        for(int i = 0; i < words.size(); ++i) {
            cout << " " << words[i] << "\n";
        }
    } // main

the statement ``words.push_back(w)`` adds the string ``w`` to the right end of ``words``

``words`` automatically increases its size to hold ``w``

For-Each Loops
--------------

C++11 provides a new kind of for-loop specifically designed for iterating over vectors (and strings, and arrays)

in many cases, it is the easiest way to loop through a vector

::

    #include "cmpt_error.h"
    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    int main() {
        cout << "Please enter some words: ";
        vector<string> words;
        string w;
        while (cin >> w) {
            words.push_back(w);
        }

        sort(words.begin(), words.end());

        for(string s : words) {
            cout << s << "\n";
        }
    } // main

this loop is an example of a for-each loop::

    for(string s : words) {
        cout << s << "\n";
    }

it avoids the need for an index variable, which simplifies the code

this variation is more efficient

::

    for(const string& s : words) {
        cout << s << "\n";
    }

this makes ``s`` refer to the corresponding string in ``words``, in contrast to the original (using ``string s``), which made a copy of each string

For-Each Loops
--------------

here's an example that uses a for-each loop to iterate over the characters in a string

::

    bool is_digit(char c) {
        return c >= '0' && c <= '9';
    }

    // Pre-condition:
    //    none
    // Post-condition:
    //    returns true if s consists entirely of digits
    bool is_int(const string& s) {
        for(char c : s) {
            if (!is_digit(c)) {
                return false;
            }
        }
        return true;
    }

For-Each Loops
--------------

this function prints a vector of strings

::

    void print(const vector<string>& v) {
        for(const string& s : v) {
            cout << s << " ";
        }
    }

A Problem from the Textbook
---------------------------

problem 14 of the textbook asks us to write a function similar to this::

    vector<string> split(const string& s, char delim)

for instance, ``split("1,2,3,4", ',')`` returns the vector ``{"1", "2", "3", "4"}``

there are many ways to solve this problem

here is one solution that aims to be clear and simple, at the expense of some efficiency (i.e. the statement ``data += c`` is probably not the fastest way to construct a string)

::

    vector<string> split(const string& s, char delim) {
        vector<string> result;
        string data;
        for(char c : s) {
            if (c == delim) {
                result.push_back(data);
                data = "";
            } else {
                data += c;
            }
        }
        result.push_back(data);
        return result;
    }

    int main() {
        for(;;) {
            cout << "--> ";
            string s;
            getline(cin, s);
            vector<string> result = split(s, ',');
            for(string s : result) cout << '"' << s << "\" ";
            cout << "\n";
        }
    } // main

Joining a Vector of Strings
---------------------------

the ``join`` function is a sort of inverse of the ``split`` function::

    string join(const vector<string>& v, const string& delim)

for example, ``join({"1", "2", "3", "4"}, ", ")`` returns the string ``"1, 2, 3, 4"``

``join({"cat"}, ", ")`` returns just the string ``"cat"`` (without any delimiters)

``join({}, ", ")`` returns the empty string ``""``

::

    string join(const vector<string>& v, const string& delim) {
        if (v.empty()) {
            return "";
        } else if (v.size() == 1) {
            return v[0];
        } else {
            string result = v[0];
            for(int i = 1; i < v.size(); ++i) {  // i starts at 1
                result += delim + v[i];
            }
            return result;
        }
    }

Improved Vector Printing
------------------------

``join`` lets us improve the ``vector`` ``print`` function we saw earlier

::

    void print(const vector<string>& v) {
        string result = join(v, ", ");
        cout << "{" << result << "}";
    }

or this

::

    void print(const vector<string>& v) {
        cout << "{" << join(v, ", ") << "}";
    }

we can use it like this

::

    for(;;) {
        cout << "--> ";
        string s;
        getline(cin, s);
        vector<string> result = split(s, ',');
        cout << "result = ";
        print(result);
        cout << "\n";
    }

Improved Vector Printing
------------------------

an even more convenient way to print a vector is to overload the ``<<`` operator

::

    ostream& operator<<(ostream& os, const vector<string>& v) {
        os << "{" << join(v, ", ") << "}";
        return os;
    }

now we can print a vector like this

::

    for(;;) {
        cout << "--> ";
        string s;
        getline(cin, s);
        vector<string> result = split(s, ',');
        cout << "result = " << result << "\n";
    }

Improved Vector Printing
------------------------

this ``operator<<`` function is an example of **operator overloading**

that is, it is an example of how you can, in C++, give an operator that already exists a new meaning for other types of operands

it's often a convenient feature, but it's not essential to this course and so we won't go into the details of how it works (and you don't need to memorize how it works)

in fact, many other programming languages don't allow operator overloading; some programmers believe it makes programs harder to read

Sample Program: Fridge Magnet Poetry
------------------------------------

::

    // poetry.cpp

    #include "cmpt_error.h"
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    vector<string> a_the = {"a", "the", "every", "some", "one", "that", "this",
                            "his", "her"};

    vector<string> noun1 = {"dog", "cat", "pumpkin", "baby", "block of cheese",
                            "carrot", "tuba", "pencil", "eraser", "bowling ball",
                            "penguin", "bathtub"};

    vector<string> noun2 = {"coconut", "C++ compiler", "church", "stapler", "box",
                            "punctuation mark", "key", "giraffe", "blade of grass",
                            "eel", "shoelace"};

    vector<string> verb = {"ate", "loved", "exploded", "ran", "jumped", "pushed",
                           "scared", "cried", "opened", "crushed", "threw",
                           "painted", "swallowed", "sniffed"};

    vector<string> mod1 = {"big", "huge", "tiny", "flaming", "wet", "crunchy",
                           "uncontrollably screaming", "bendy", "broken",
                           "suspicious", "smelly", "hopping"};

    // Pre-condition:
    //    v is not empty
    // Post-condition:
    //    returns a randomly chosen string from v
    string rand_choice(const vector<string>& v) {
        if (v.empty()) {
            cmpt::error("v can't be empty");
        }
        // 0 <= rand() <= RAND_MAX
        int r = rand() % v.size();
        return v[r];
    }

    // Pre-condition:
    //    none
    // Post-condition:
    //    returns either s + " ", or the empty string (50% chance each)
    string s_or_empty(const string& s) {
        if (rand() % 2 == 0) {
            return s + " ";
        } else {
            return "";
        }
    }

    string rand_line1() {
        string result = rand_choice(a_the) + " "
                      + s_or_empty(rand_choice(mod1))
                      + rand_choice(noun1) + " "
                      + rand_choice(verb) + " "
                      + rand_choice(a_the) + " "
                      + s_or_empty(rand_choice(mod1))
                      + rand_choice(noun2) + " ";
        return result;
    }

    void print_poem() {
        cout << rand_line1() << "\n"
             << rand_line1() << "\n"
             << rand_line1() << "\n";
    }

    int main() {
        srand(time(NULL));

        for(int i = 0; i < 10; ++i) {
            print_poem();
            cout << "\n";
        }
    }
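
the call ``srand(time(NULL))`` in ``main`` seeds the random number generator with the current time, so runs started at different times will usually print different poems

as a small sketch of what seeding does, the made-up helper ``first_rands`` below (it is not part of ``poetry.cpp``) seeds ``rand()`` with a given value and records the first few numbers it produces; the same seed always gives the same sequence, which can be handy when debugging

::

    #include <cstdlib>
    #include <iostream>
    #include <vector>
    using namespace std;

    // hypothetical helper, for illustration only:
    // seeds rand() with seed, then returns the first n values of rand()
    vector<int> first_rands(unsigned int seed, int n) {
        srand(seed);
        vector<int> result;
        for(int i = 0; i < n; ++i) {
            result.push_back(rand());
        }
        return result;
    }

    int main() {
        vector<int> a = first_rands(42, 5);
        vector<int> b = first_rands(42, 5);
        if (a == b) {
            cout << "same seed, same sequence\n";
        } else {
            cout << "different sequences\n";
        }
    }
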