.. highlight:: c++ Chapter 9 Notes =============== Please read chapter 9 of the textbook. Warm-up: Hexadecimal -------------------- hexadecimal bit strings are a popular way of encoding long sequences of bits hexadecimal (or just hex for short) is essentially a base-16 encoding of numbers there are 16 "digits" in hexadecimal, 0-9 and then a, b, c, d, e, f each hexadecimal digit corresponds to a pattern of 4 bits ======= ======= ====== Base 10 Base 16 Binary ======= ======= ====== 0 0 0000 1 1 0001 2 2 0010 3 3 0011 4 4 0100 5 5 0101 6 6 0110 7 7 0111 8 8 1000 9 9 1001 10 a 1010 11 b 1011 12 c 1100 13 d 1101 14 e 1110 15 f 1111 ======= ======= ====== you can encode one byte (i.e. 8-bits) using two hex digits, e.g.:: 0100 0011 4 3 so the bit string 01000011 can be written as the hex string 43 any 32-bit string can be written using 8 hex digits any 64-bit string can be written using 16 hex digits C++ has good support for hexadecimal strings for example, here is a program that :: // hex.cpp #include #include #include #include using namespace std; int main() { // a table of the binary values for the 16 hex digits, e.g. // hexbits[11] is "1011", the bit string for hex digit b vector hexbits = { "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111" }; // // print a table of decimal, hexadecimal, and binary // cout << "Dec Hex Bin\n"; cout << "----------------\n"; for(int i = 0; i < 16; ++i) { cout << std::dec << std::setw(2) << i << " "; cout << std::hex << std::setw(5) << i << " "; cout << std::setw(7) << hexbits[i] << "\n"; } } note the ``cout`` statements in the for-loop do some fancy formatting ``std::dec`` means "print the next number in decimal format" ``std::hex`` means "print the next number in hexadecimal format" ``std::setw(2)`` means "print the next thing right-justified in a 2-space width" ``std::setw(5)`` means "print the next thing right-justified in a 5-space width" Pointers -------- extensive use of pointers is one of the defining features of C (and C++) in C/C++, pointers represent memory addresses for example :: int n = 3; we can imagine the computers memory as something like this :: 5482 5483 5484 ----- ----- ----- | ? | 3 | ? | ----- ----- ----- n when the code fragment above runs the variable ``n`` is assigned to the memory location 5483 the number 5483 is just made up C++ and the operating system determines the exact memory location of ``n`` on each run of the program Pointers -------- we can get the address of ``n`` using ``&``, the address-of operator :: int n = 3; cout << &n << "\n"; here are three different runs of the above code on my computer:: $ ./pointers 0xbfe4d8e8 $ ./pointers 0xbfbf2068 $ ./pointers 0xbfa73268 the output is in hexadecimal, i.e. base-16 the ``0x`` at the start indicates this is a hexadecimal number notice that for each run of the program ``n`` is located at a different place in memory generally, you can't predict where in memory a variable will be stored Pointer Variables ----------------- a pointer stores an address of a variable for example :: int n = 3; int* n_addr = &n; cout << n_addr << "\n"; the variable ``n_addr`` has the type ``int*`` ``int*`` is the type "``int`` pointer" that means that ``n_addr`` can store the address of an ``int`` in this example, ``n_addr`` stores the address of ``n`` Pointer Variables ----------------- to see what's going on, look at this picture of memory after this statement runs :: int n = 3; memory looks like this :: 5482 5483 5484 ----- ----- ----- | ? | 3 | ? | ----- ----- ----- n then after this statement runs :: int* n_addr = &n; :: 5482 5483 5484 ----- ----- ----- | ? | 3 | ? | ----- ----- ----- n 3939 ------ | 5483 | ------ n_addr the variable ``n_addr`` is stored at (made up) memory location 3939 ``n_addr`` holds 5483, i.e. the address of where in memory ``n`` is stored we usually say ``n_addr`` **points** to ``n`` it's often useful to draw arrow diagrams showing pointers as arrows Dereferencing a Pointer ----------------------- pointers let us access the memory they refer to :: int n = 3; int* n_addr = &n; cout << *n_addr // note the * << "\n"; ``n_addr`` points to an ``int`` ``*n_addr`` is the value that ``n_addr`` points to i.e. 3 in this case Dereferencing a Pointer ----------------------- here's another example :: string s = "apple"; string* p = &s; string* q = p; cout << p << ", " << *p << "\n" << q << ", " << *q << "\n"; ``p`` and ``q`` are string pointers they both point to the same address, i.e. the location where ``s`` is stored you can access the string there via ``s``, or ``*p``, or ``*q`` careful! they all share the same underlying string (no copies of the characters are made) :: string s = "apple"; string* p = &s; string* q = p; cout << p << ": " << *p << "\n" << q << ": " << *q << "\n"; s = "orange"; cout << p << ": " << *p << "\n" << q << ": " << *q << "\n"; s[0] = 'g'; cout << p << ": " << *p << "\n" << q << ": " << *q << "\n"; (*p)[5] = 'y'; cout << p << ": " << *p << "\n" << q << ": " << *q << "\n"; Passing Pointers to Functions ----------------------------- you can pass pointers to functions it acts like passing by reference :: void set_zero(double* x) { *x = 0.0; } // ... int main() { double a = 54; set_zero(&a); // note use of & } :: void get_double(double* x, const string& prompt) { cout << prompt; cin >> *x; } // ... int main() { double t = 0.0; get_double(&t, "Please enter a number: "); // note use of & } Passing Pointers to Functions ----------------------------- :: // swapping using pointers void swap(int* a, int* b) { int temp = *a; *a = *b; *b = temp; } you use this version of ``swap`` like this :: int x = 2; int y = 3; cout << x << " " << y << "\n"; // 2 3 swap(&x, &y); // note the use of & cout << x << " " << y << "\n"; // 3 2 the programmer must remember to write the ampersands when calling this ``swap``, i.e. ``&x`` and ``&y`` also, the code inside ``swap`` must use ``*`` throughout to dereference the pointers this is messier and more error-prone than the pass-by-reference version C++ lets you write:: // how you could write swap in C void swap(int& a, int& b) { int temp = a; a = b; b = temp; } especially in C, pointers are used frequently in functions, both as parameters and return values (although be careful with return values --- see the discussion of the *free store* below) Defining a Pointer ------------------ here are two common syntax styles for defining a pointer :: int n = 5; int* a = &n; // * attached to int int *b = &n; // * attached to b the compiler doesn't care where you put the ``*`` when ``a`` are ``b`` defined it's up the programmer some programmers like the ``int* a`` way, because it emphasizes that ``int*`` is an "int pointer" data type some programmers like the ``int *b`` way, because it makes it clear that ``*b`` is of type ``int`` be careful if you declare more than one variable at a time, e.g. :: double *c, d; // c is double*, d is double ``c`` is of type ``double*`` but ``d`` is of type ``double`` if you want both to be points you'd have to write it like this :: double *c, *d; now both ``c`` and ``d`` are of type ``double*`` a good rule of thumb is to avoid declaring multiple variables like this instead, explicitly declare each variable on its own line, thus avoiding this problem entirely or, if you like, you can use ``typedef`` to avoid pointer declaration errors :: typedef int* IntPtr; int n = 6; IntPtr p, q; // both p and q are type IntPtr p = &n; q = &n; cout << *p // 6 << *q // 6 << "\n"; Null Pointers ------------- if you declare a pointer variable without giving it an initial value, then, in general, you can't be sure what value it has :: double* p; // p has some unknown value cout << *p; // oops: you don't know where p points, // so you have no idea what this will print if you need a pointer variable, but don't have a value for it to point to yet, use ``nullptr`` :: double* q = nullptr; // q points to no value cout << q << "\n"; ``nullptr`` is a new C++11 keyword, and in C and older C++ versions you would write ``0`` instead of ``nullptr`` but don't use ``0`` (or ``NULL``) for the null pointer in this course! always use ``nullptr`` note that it's always an error to de-reference a null pointer :: double* q = nullptr; // q points to no value cout << *q // oops: can't dereference a null pointer << "\n"; on my computer, executing ``*q`` causes the program to crash with a ``Segmentation fault`` error C++ Memory Organization ----------------------- before we move on to the next topic, lets discuss for a moment how C++ organizes a programs memory basically, all C++ programs are divided into three memory regions **static memory**: global variables (variables declared outside of any function) are stored here **stack memory**: variables declared inside functions (local variables) are stored here, along with function parameters and return values; stack memory is *automatically* managed by C++, growing and shrinking as needed, i.e. when a function ends all the local variables in that function are automatically de- allocated **free store (heap) memory**: this is memory that can be manually allocated and de-allocated by the programmer when needed; it's big advantage over stack memory is that it allows for memory to persist between function calls the great thing about stack memory is that it works automatically without any special effort from the programmer but is not flexible enough for many applications so we also need free store memory, which is manually allocated and de- allocated by the user Free Store Memory ----------------- in C++, we will use ``new`` to allocate memory on the free store and ``delete`` or ``delete[]`` (for arrays) to de-allocate memory **warning**: don't use ``malloc`` and ``free`` functions for free store management in this course! those functions are used for memory management in C **warning**: mixing ``malloc``/``free`` and ``new``/``delete`` in the same program can corrupt the free store! stick to ``new``/``delete`` in this course! :: // allocate an int, initialized to 3, on the free store int* p = new int(3); cout << *p << "\n"; // de-allocate the memory when we're done with it delete p; the expression ``new int(3)`` allocated a new ``int`` on the free store, initialized to 3 it then returns the address of that ``int`` so ``p`` points to an ``int`` on the free store ``*p`` is the ``int`` stored there (in this case, 3) you can use ``*p`` just like a regular ``int`` but, and this is surprisingly hard to do correctly in general, you must remember to de-allocate the memory ``p`` points to (using the statement ``delete p;``) when you are done with it Memory Leaks ------------ consider this code :: void test() { int* p = new int(5); int n = 2; p = &n; cout << *p // 2 << "\n"; } this compiles, runs, and prints 2 so it might seem to work but it has a subtle bug known as a **memory leak** when the ``test`` function ends, the variable ``p`` is automatically deleted but the free store memory ``p`` points to is *not* deleted the memory is still there and, because ``p`` is gone, it can no longer be accessed in particular, the memory ``p`` pointed to cannot be de-allocated so every time you call the ``test()`` function, one ``int`` is allocated on the free store and then left there for the rest of the running time of the program if you call ``test()`` a lot of times, this could eat up a lot of memory and cause your program's performance to degrade Memory Leaks ------------ memory leaks are very common programming errors in practice, they can be extremely difficult to find and fix this is in part because it is not always clear when a free store memory can be safely deleted also, most programs can run at least for a little while suffering from a memory leak serious problems might not occur until a lot of memory is wasted, which for some programs, might take a while (or even not occur at all) many (most?) other languages provide automatic memory managers called **garbage collectors** that can automatically delete unused free store memory garbage collected languages are nice because then the programmer never needs to worry about when to delete unused memory --- the language takes care of it automatically but you pay a performance price for automated garbage collection, and for some applications that price might be too high Arrays and Pointers ------------------- arrays and pointers are closely related in C/C++ in this code, ``a`` is an array of 5 ``int`` values (with unknown initial values) ``a`` is local to the ``test()`` function, so it will be automatically de- allocated when ``test()`` ends :: void test() { int a[5]; for(int i = 0; i < 5; ++i) { a[i] = i; } // we can also treat a is if it were of type int*, and so access // elements of a like this cout << *a << "\n" << *(a + 1) << "\n" << *(a + 2) << "\n" << *(a + 3) << "\n" << *(a + 4) << "\n"; cout << "---\n"; // p is an int pointer that points to the first element of a // we can use it to access individual elements of a int* p = a; cout << *p << "\n" << *(p + 1) << "\n" << *(p + 2) << "\n" << *(p + 3) << "\n" << *(p + 4) << "\n"; } the point here is that ``a`` is essentially a pointer to an ``int``, and so we can use it like a pointer Arrays and Pointers ------------------- we can also write code like this :: int a[5]; for(int i = 0; i < 5; ++i) { a[i] = 0; } for(int* p = a; p < a + 5; ++p) { cout << "*p = " << *p << "\n"; } the second loop iterates over the array ``a`` using the pointer ``p`` instead of an index variable we could have written the first loop that initializes ``a`` like this :: for(int* p = a; p < a + 5; ++p) { *p = 0; } this style of iteration is actually quite common in C++, and is used throughout the STL (the C++ standard template library) Dynamic Arrays -------------- the arrays we've seen so far have all been allocated locally thus, they are automatically de-allocated when the function they are defined within ends but we can also allocate arrays on the free store for when we want an array to persist beyond a single function :: double* temp = new double[6]; for(int i = 0; i < 6; ++i) { temp[i] = i; } for(double* p = temp; p < temp + 6; ++p) { cout << "*p = " << *p << "\n"; } delete[] temp; // de-allocate the array temp points to the expression ``new double[6]`` create an array of 6 ``double`` values on the free store (with unknown initial values) it returns a value of type ``double*`` that points to the first element of this newly created array thus, ``temp`` points to the first value of this newly created array we can access individual elements of ``temp`` using the []-bracket notation, or pointer notation (as shown in the loops above) when we are done with the array, it is up to us, the programmer to de-allocate the memory used by the array by calling ``delete[] temp`` this gives the memory used by the array back to the freestore so that can be allocated again in future calls to ``new`` because ``temp`` points to an array (and not a single ``double``), we must use ``delete[]``, i.e. we need to ``[]`` after delete if we called just ``delete temp`` that would be an error and may corrupt the freestore if we don't call ``delete[]`` on the array, then we have a memory leak --- the array sits there taking up memory for the rest of the lifetime of the program Dynamic Arrays -------------- this function has a serious bug :: void bad_get_median() { cout << "How many numbers do you want to enter? "; int n = 0; cin >> n; if (n <= 0) cmpt::error("n must be positive"); cout << "Please enter " << n << " numbers: "; double* nums = new double[n]; for(int i = 0; i < n; ++i) { cin >> nums[i]; } // re-arrange the numbers to be in ascending sorted order sort(nums, nums + n); cout << "median = " << nums[n / 2] << "\n"; } note that ``sort`` is from the standard C++ ``algorithm`` library the input to ``sort`` is a pointer to the first element, and a pointer to **one past** the last element we want sorted the problem with this code is that it has a memory leak this line creates a new array on the freestore :: double* nums = new double[n]; but the array is not deleted anywhere, i.e. nowhere is ``delete[]`` called the variable ``nums`` is local to ``bad_get_median()``, and so it disappears when ``bad_get_median()`` finishes but the array that ``nums`` pointed to stays untouched on the freestore we no longer have a pointer to it, so there's no way to delete it once ``bad_get_median`` ends so every time we call ``bad_get_median()``, a little bit of memory is forever lost to the program call it enough times, the program might run out of memory and thus crash Dynamic Arrays -------------- we can fix the memory leak by adding ``delete[] nums`` to the end :: void fixed_get_median() { cout << "How many numbers do you want to enter? "; int n = 0; cin >> n; if (n <= 0) cmpt::error("n must be positive"); cout << "Please enter " << n << " numbers: "; double* nums = new double[n]; for(int i = 0; i < n; ++i) { cin >> nums[i]; } // re-arrange the numbers to be in ascending sorted order sort(nums, nums + n); cout << "median = " << nums[n / 2] << "\n"; delete[] nums; } Dynamic Arrays -------------- another issue with dynamic memory is when you have two pointers to the same array :: int* arr = new int[3]; arr[0] = 0; arr[1] = 0; arr[2] = 0; int* p = arr; int* q = arr; ``p``, ``q``, and ``arr`` all point to the same underlying array a tricky issue that arises here is how do you de-allocate the array when you are done using it? you must call ``delete``, and you could call ``delete arr``, ``delete p``, or ``delete q`` the problem is that in C++ it's an error to call ``delete`` on memory that has already been deleted so this is wrong :: delete p; delete q; // oops: the array q points to has already been deleted in general, it can be *very* hard to know how many pointers are pointing to an array whenever you call ``delete`` you need to be certain that no other pointers point to it in modern C++, this (and related) problems are often best solved by using smart pointers", such as `shared_ptr `_, which automatically delete memory and guarantee that double-deletion is impossible