Introduction to vectors ======================= In C++, ``vector`` is a standard container for holding a sequence of values (of the same type). On the surface C++ ``vector``\ s look and behave like ordinary C-style arrays, but ``vectors``\ s are much more sophisticated. Here's how we might use a vector to store temperature data:: vector temps(5); // make a vector of 5 doubles temps[0] = -2; // temperatures are in Celsius temps[1] = 2; temps[2] = 5; temps[3] = 7; temps[4] = 5; for(int i = 0; i < temps.size(); ++i) { // print temps cout << temps[i] << "\n"; } This code fragment declares a ``vector`` of ``double``\ s called ``temps``. It is initialized to hold exactly 5 ``double``\ s. Square-bracket notation is used to access individual elements of a ``vector``, e.g. ``temps[0]`` is the first element of the vector, ``temps[1]`` is the second, and so on. The number inside the square-brackets is called an **index**, the first index for a C++ ``vector`` is *always* 0. It's often useful to draw diagrams representing vectors:: 0 1 2 3 4 +-----+-----+-----+-----+-----+ temps | -2 | 2 | 5 | 7 | 5 | +-----+-----+-----+-----+-----+ The numbers in the boxes are the values stored in the vector, while the numbers on to of the boxes are the index values. Essentially, an index is a name for the value in the corresponding box. It's important understand index numbering for ``vector``\ s because it is a common source of bugs. As mentioned, the first index of a ``vector`` is *always* 0. That means if you your ``vector`` has n elements, then the index of the last element is n - 1 --- **not** n! So in our example, the last element of ``temps`` is ``temps[4]``. There is no ``temps[5]`` in our array. The confusion on this point is so common that it even has a name: *off-by-one error*. Most programmers can tell you stories about getting stuck on off-by-one errors as they can be surprisingly hard to find. Unfortunately, the C++ ``vector`` does *not* check if you go outside the index bounds of an array. For example, this code compiles and runs without complaint:: cout << temps[-1] << "\n" // temps[-1] doesn't exist << temps[5] << "\n"; // temps[5] doesn't exist This prints two unknown values. -1 and 5 are not index values for ``temps``, and so junk values are printed. It's up to the programmer to make sure that the index values are never wrong. Appending Items to a vector --------------------------- C++ ``vector``\ s are dynamic: they're size can change. Probably the most common way to change a vector's size is to add new items to the end. For example:: vector pets; // initially empty, i.e. pets.size() == 0 pets.push_back("Fluffy"); pets.push_back("Fido"); pets.push_back("Newton"); pets.push_back("Glubglub"); // print the contents of pets for(int i = 0; i < pets.size(); ++i) { cout << pets[i] << "\n"; } The statement ``pets.push_back("Fluffy")`` appends the string ``"Fluffy"`` to the right end of the array ``pets``. It also increases the size of the vector by 1. .. note:: A C++ vector sometimes uses more memory than is strictly necessary to store all of the elements it contains. This extra memory helps functions like ``push_back`` be more efficient by not always needing to allocate new memory each time they're called. Thus, it is possible that for certain memory-sensitive applications a ``vector`` may use more memory than you would like, in which case some other data structure (such as an array) might be better. Sorting a vector ---------------- A common task is to arrange the elements of a ``vector`` into ascending sorted order. That turns out to be quite easy to do in C++ since it has a standard sorting function:: vector temps(5); // make a vector of 5 doubles temps[0] = -2; // temperatures are in Celsius temps[1] = 2; temps[2] = 5; temps[3] = 7; temps[4] = 5; sort(temps.begin(), temps.end()); for(int i = 0; i < temps.size(); ++i) { // print temps cout << temps[i] << "\n"; } The statement ``sort(temps.begin(), temps.end())`` efficiently re-arranges the elements of ``temps`` to be in order from smallest to largest. Here's another example. This program reads in strings from the user and then prints them in sorted order:: cout << "Please enter some words: "; vector words; string w; while (cin >> w) { words.push_back(w); } sort(words.begin(), words.end()); for(int i = 0; i < words.size(); ++i) { cout << words[i] << "\n"; } Notice that we don't worry about the size of the ``words`` vector: it automatically grows to the necessary size when ``push_back`` is called. In other words, a vector automatically manages its own memory (in contrast to, say, a C-style array where the programmer is responsible for managing memory). What if you want to sort a vector into descending order? The simplest approach is probably to use ``sort`` and then ``reverse``, e.g.:: sort(temps.begin(), temps.end()); reverse(temps.begin(), temps.end()); .. note:: Later in the course we'll see how to write our own sorting function. In practice, however, it is almost always better to use C++'s standard ``sort``, since it is both extremely efficient and an quite general-purpose. More vector examples -------------------- Here's how you can sum the elements of a vector:: vector v; // define and initialize v v.push_back(6); v.push_back(-1); v.push_back(4); v.push_back(5); double sum = 0.0; // sum elements of v for(int i = 0; i < v.size(); ++i) { sum += v[i]; } cout << sum; The for-loop that does the summing scans through ``v`` one element at a time. It adds each element to ``sum``, which keeps a running sum. It's important to remember to initialize ``sum`` to 0 at the start. This is a useful code idiom to remember:: for(int i = 0; i < v.size(); ++i) { // ... do something to v[i] ... } It is applicable to many different situations. For instance, suppose you want to print just the *negative* values of a vector. Then you could do this:: vector v; // define and initialize v v.push_back(6); v.push_back(-1); v.push_back(4); v.push_back(5); for(int i = 0; i < v.size(); ++i) { if (v[i] < 0) cout << v[i] << "\n"; } Or suppose you wanted to separate positive and negative values into their own vectors:: vector v; // define and initialize v v.push_back(6); v.push_back(-1); v.push_back(4); v.push_back(5); vector positive; vector negative; for(int i = 0; i < v.size(); ++i) { if (v[i] < 0) { negative.push_back(v[i]); } else { positive.push_back(v[i]); } } Passing vectors to Functions ---------------------------- Printing out a vector is a common task, and so it makes sense to create a function that does that for us. One not-so-good way to do that is like this:: void print(vector v) { // v is copied! for(int i = 0; i < v.size(); ++i) cout << v[i] << " "; } // ... vector x; // define and initialize x x.push_back(6); x.push_back(-1); x.push_back(4); x.push_back(5); print(x); While this works, it is unnecessarily inefficient: the vector that is passed to ``print`` is copied, which needlessly wastes time and memory. This kind of parameter-passing is known as **pass by value**. Pass by value always makes a copy of the thing being passed. On the plus side, this means the original object can't be modified in any way. But since ``print`` doesn't change the vector in any way (it just reads it), there's no need to make a copy of it. So instead of pass by value, C++ lets you use *pass by reference*:: void print(vector& v) { for(int i = 0; i < v.size(); ++i) cout << v[i] << " "; } vector x; // define and initialize x x.push_back(6); x.push_back(-1); x.push_back(4); x.push_back(5); print(x); The only change here is the addition of the ``&`` in the header. This means that when ``print`` is called, ``v`` refers to the very same data that ``x`` refers to. No copy is made, and so passing by reference in this case is much more efficient. One more refinement is useful. Since ``print`` only *reads* the values of ``v``, and never changes them, we can use **pass by constant reference**:: void print(const vector& v) { for(int i = 0; i < v.size(); ++i) cout << v[i] << " "; } Here we've added ``const`` in front of ``vector&``. It tells the compiler that we don't intend for any code inside ``print`` to change ``v``, and so there's no need to make a copy of the vector. Plus, the compiler can check that no code in ``print`` modifies ``v``. Thus it is both efficient and safe. As a rule of thumb, we will pass vectors by constant reference whenever possible. Of course, you can only use pass by constant reference for functions that don't change the vectors you pass to them. If you do need to change a vector in a function you must use regular pass by reference, e.g.:: void read_vector(vector& v) { double x; while (cin >> x) { v.push_back(x); } } If you included ``const`` in front of ``vector``, then you would get a compiler error indicating you aren't allowed to modify ``v``. Suppose you forgot the ``&`` and passed ``v`` by value:: void read_vector(vector v) { // v passed by value double x; while (cin >> x) { v.push_back(x); } } There is *no* compiler error or runtime error, but this does not work. The problem is that ``v`` is a *copy* of the passed-in vector, and it is this copy that we modify --- not the original vector. When ``read_vector`` ends, ``v`` is automatically deleted. Passing strings to Functions ---------------------------- Like vectors, we also have to carefully choose how we want to pass strings to functions. We can pass a string by value like this:: double get_double(string prompt) { // pass by value cout << prompt; double x; cin >> x; return x; } Here ``prompt`` is a copy of whatever string is passed to ``get_double``. If the string is long, then this could needlessly waste time and memory. Instead we could pass it by reference:: double get_double(string& prompt) { // pass by reference cout << prompt; double x; cin >> x; return x; } But this won't work in some cases. For example, this line of code causes a compiler error:: double x = get_double("x? "); The problem is that passing by reference allows for the possibility that the passed-in string might be modified (even if it is not actually modified). But the literal string ``"x? "`` is a constant value that cannot be modified, and so you get a compiler error saying the string can't be changed. So the best solution is to use pass by constant reference:: double get_double(const string& prompt) { // pass by constant reference cout << prompt; double x; cin >> x; return x; } Since ``get_double`` doesn't modify ``prompt`` in any way, the best approach is to use pass by constant reference. Including a default value, it would look like this:: double get_double(const string& prompt = "Please enter a double: ") { cout << prompt; double x; cin >> x; return x; } A Function that Swaps Value --------------------------- As a test of our understanding of the difference between pass-by-value and pass-by-reference, consider this code:: int x = 3; int y = 5; swap(x, y); cout << x << " " << y << endl; // prints 3 5 The ``swap`` function is supposed to exchange the values of ``x`` and ``y``; this turns out to be quite handy when writing sorting algorithms. Here's how you could write it in C++:: void swap(int& a, int& b) { // a and b passed by reference: correct! int temp = a; a = b; b = temp; } The vital detail here is that the parameters are passed by *reference*. That is, ``a`` is another name for the value ``x`` refers to, and so if you change the value of ``a`` you are changing the value of ``x``. Same for ``b`` and ``y``. Here's a *wrong* way to implement this function:: void swap(int a, int b) { // wrong: a and b passed by value int temp = a; a = b; b = temp; } The problem here is that ``a`` and ``b`` are copies of ``x`` and ``y``. If you modify ``a``, then this has no effect whatsoever on ``x``, because they are distinct variables.