Introduction to Object-oriented Programming (OOP)
=================================================

**Object-oriented programming**, usually abbreviated **OOP**, is a popular
style of programming that most modern languages support. OOP is particularly
useful for creating code libraries, and for organizing large programs.

C++ was one of the languages that helped make OOP popular in practice. In C++,
OOP is essentially a way of creating **user-defined types**.

OOP has many technical details that you should eventually learn, although for
this introduction we will focus on a few main concepts.


The Idea of OOP
---------------

The basic idea of OOP is to organize programs using **objects**. An object can
be almost anything, such as:

- a string
- a vector
- a stream
- a date
- a document
- a letter in a document
- a record in a database
- a web page
- a tree (in a file system, say)
- a circle
- a particle in an animated explosion
- etc.

For example, C++ ``string``\ s are objects. Like all objects in C++, a
``string`` is made up of two kinds of things:

1. The **data** for representing the string. Usually, but not always, this
   data is stored as an array of ``char`` values (i.e. a C-style string).

2. Common **functions** that operate on that data. For instance, if ``s`` is
   a ``string``, then you can call these functions on it:

   - ``s.size()`` returns the number of characters in ``s``

   - ``s.empty()`` returns true if ``s`` is the empty string, and false
     otherwise

   - ``s.substr(i, n)`` returns a new string of size ``n`` consisting of the
     ``n`` characters ``s[i]``, ``s[i + 1]``, ..., ``s[i + n - 1]`` (or fewer
     characters if ``s`` ends before ``n`` of them have been copied)

An object is a collection of data, plus functions that operate on that data.
Well-designed objects are often easy to use because they contain everything
you need in one easy-to-access place.

Creating Your Own Objects
-------------------------

Suppose we want to create our own object for representing two-dimensional
(x, y) points (such objects turn out to have many practical uses in graphics).
In C++, like many OOP languages, before you can create an object you need to
create a *class* that describes the parts of the object. A class is like a
factory that creates objects.

.. note:: Not all OOP languages use classes to create objects. Some, such as
   JavaScript, instead create objects by *copying* existing objects. That is,
   you create an initial object by hand, and then copy it and change its
   values to make new objects. This style of object creation is known as
   *prototyping*, and it can be more flexible than class-based object
   creation. However, we will only be discussing class-based object creation
   in this course.

Here is the class for our point objects::

    struct Point {
        double x;
        double y;
    }; // Point

    int main() {
        Point p;
        p.x = 4;
        p.y = 2;

        cout << p.x << " " << p.y << endl;
    }

.. note:: Even though we write this code using ``struct``, we will generally
   refer to it as a *class* as a reminder that it is used for creating
   objects. We will see a bit later in these notes that the keyword ``class``
   can be used instead of ``struct``, although with a few technical
   differences.

Here, ``Point`` is a ``struct`` that describes the contents of ``Point``
objects. Creating a ``Point`` object is easy::

    Point p;
    Point t;

To access the variables in ``p`` and ``t`` we use dot notation::

    p.x = 4;
    p.y = 2;

    t.x = 0;
    t.y = 0;

    cout << p.x << ' ' << p.y << endl   // prints: 4 2
         << t.x << ' ' << t.y << endl;  // prints: 0 0

``p.x``, ``p.y``, ``t.x``, and ``t.y`` are all variables of type ``double``,
and so you can use them anywhere you could use a ``double`` variable.

Notice that ``p`` and ``t`` get their own personal copies of ``x`` and ``y``.
The variable ``p.x`` is different from the variable ``t.x``.

Adding Functions
----------------

Now let's add some printing functions to our ``Point`` class::

    struct Point {
        double x;
        double y;

        void print() {
            cout << "(" << x << ", " << y << ")";
        }

        void println() {
            print();
            cout << endl;
        }
    }; // Point

    int main() {
        Point p;
        p.x = 4;
        p.y = 2;

        p.println();
    }

Look at the ``print`` function. Its body is allowed to use ``x`` and ``y``
directly *without* any dot notation. Similarly, note that in the ``println``
function we call ``print``. This shows that any function defined inside a
class can access any of the variables or functions also defined within it.

.. note:: Sometimes functions defined inside a class are called *methods*.
   However, we will usually just refer to them as functions.


Constructors
------------

One problem with ``Point`` objects is that they don't initialize their ``x``
and ``y`` values to anything sensible::

    Point p;
    p.println();  // prints unknown values

It's up to the programmer to remember to give them initial values, e.g.::

    Point p;
    p.x = 0;
    p.y = 0;

The problem with this is that it is inconvenient to have to write two
assignment statements every time you create a point. It is easy to forget, or
to do incorrectly.

A better approach is to use *constructors* to initialize our ``Point``\ s. A
constructor is a special function designed specifically for initializing
objects. For example::

    struct Point {
        double x;
        double y;

        Point() : x(0), y(0) {  // default constructor
            // empty body
        }

        // ...

    }; // Point

In C++, a constructor always has the same name as the class it resides in. The
constructor we've added here does not take any input, and so is used like
this::

    Point p;
    p.println();  // prints (0, 0)

When ``p`` is created, the default constructor for ``Point`` is automatically
called, which sets the values of ``x`` and ``y`` to 0. This is extremely
useful: now there is no way to create a ``Point`` without initializing its
variables. Errors due to uninitialized values are no longer an issue.

Constructors look like functions, but have some important differences you need
to know:

- Constructors do *not* have a return type, not even ``void``.

- The name of a constructor is always the name of the class.

- After the constructor's input parameter list comes an *initializer list*::

      Point()               // default constructor
          : x(0), y(0)      // initializer list
      {
          // empty body
      }

  The purpose of the initializer list is to give the variables of the object
  their initial values *before* the body of the constructor is executed. You
  can also put whatever code you like inside the body if any further
  initialization is required.

Classes often have more than one constructor. Let's add a second constructor
to ``Point`` that lets the programmer provide ``x`` and ``y`` values::

    struct Point {
        double x;
        double y;

        Point() : x(0), y(0) {  // default constructor
            // empty body
        }

        // Parameters are doubles (matching x and y) so that fractional
        // coordinates are not truncated.
        Point(double a, double b) : x(a), y(b) {  // constructor
            // empty body
        }

        // ...

    }; // Point

Now we can write code like this::

    Point p(4, 2);
    p.println();  // prints (4, 2)

Notice the notation for how ``p`` is initialized. We pass in 4 and 2, and they
are used to initialize ``p.x`` and ``p.y``.

The other constructor, the default constructor that takes no parameters, still
works as before::

    Point origin;
    origin.println();  // prints (0, 0)

Of course, it is not strictly necessary any more since we could write this::

    Point origin(0, 0);
    origin.println();  // prints (0, 0)

Another useful type of constructor is a *copy constructor*. As the name
suggests, a copy constructor is used to make a copy of another object::

    struct Point {
        double x;
        double y;

        Point() : x(0), y(0) {  // default constructor
            // empty body
        }

        Point(double a, double b) : x(a), y(b) {  // constructor
            // empty body
        }

        Point(const Point& p) : x(p.x), y(p.y) {  // copy constructor
            // empty body
        }

        // ...
    }; // Point

The copy constructor lets us write code like this::

    Point start(4, 2);
    Point home(start);  // make a copy of start

    start.println();    // prints (4, 2)
    home.println();     // prints (4, 2)

Keep in mind that ``start`` and ``home`` are separate objects with their own
personal ``x`` and ``y`` variables.


Testing for Equality
--------------------

Another useful function to add to ``Point`` is ``equals``, which tests if two
``Point`` objects are the same::

    struct Point {
        double x;
        double y;

        // ...

        bool equals(const Point& p) {
            return p.x == x && p.y == y;
        }

        // ...

    }; // Point

Recall that ``&&`` means "and", and so the expression
``p.x == x && p.y == y`` is ``true`` only if both ``p.x == x`` and
``p.y == y`` are true. If one, or both, are false, then the entire expression
is false.

Now you can write code like this::

    Point p;       // (0, 0)
    Point target;  // (0, 0)

    cin >> target.x >> target.y;  // read in target's value

    if (p.equals(target)) {
        cout << "same";
    } else {
        cout << "different";
    }

But there's a problem with ``equals``: ``double`` arithmetic is *not exact*,
i.e. calculations done with ``double``\ s suffer from small but unavoidable
round-off errors. That means that you might have two ``double``\ s that are,
for all practical purposes, equal, but are not exactly the same according to
``==``. For example, in pretty much any practical program we would like to
treat ``0.0`` and ``0.000000000000001`` as being the same.

To solve this problem, let's re-write ``equals``::

    // If the absolute value of the difference of two doubles is less
    // than min_diff, then they will be considered equal.
    const double min_diff = 0.00000000001;

    struct Point {

        // ...

        bool equals(const Point& p) {
            return abs(p.x - x) < min_diff && abs(p.y - y) < min_diff;
        }

        // ...

    }; // Point

Here, two ``double``\ s (for example ``x`` and ``p.x``) are considered the
same if the absolute value of their difference is less than the constant
``min_diff``.

While it takes a little more time to do this equality check, it tolerates
round-off error and so is probably more useful in general.

.. note:: Dealing with the round-off errors inherent in floating-point
   computer arithmetic turns out to be highly non-trivial in general. Long,
   complicated calculations can suffer from huge amounts of error if they are
   not done carefully. Numeric computation is an important sub-topic of
   computer science, although we won't go into it any further here.


Using Points in Other Functions
-------------------------------

We can pass ``Point`` objects to other functions similarly to how built-in
data types are passed. For example::

    // calculates the distance between p and q
    double dist(const Point& p, const Point& q) {
        double dx = p.x - q.x;
        double dy = p.y - q.y;
        return sqrt(dx * dx + dy * dy);
    }

Note that this function is *not* inside the ``Point`` class.


Destructors
-----------

A destructor is a special kind of function that is called *automatically* when
an object is destroyed, i.e. when it goes out of scope, or is given back to
the free store (using ``delete``). Our ``Point`` objects don't have any
practical need for a destructor, but let's add one anyway to see how they
work::

    struct Point {

        // ...

        ~Point() {
            cout << "(Point destructor called)\n";
        }

    }; // Point

C++ destructors start with the ``~`` symbol followed by the name of the class.
Destructors *never* take any input parameters, and are usually used for
"cleaning up" resources the object used. In this case, all we are doing is
printing a message when the destructor is called. This can be quite useful for
debugging: it tells you when an object no longer exists.

It's often useful to think of constructors and destructors as working together
to manage some computer resource: the constructor initializes the resource,
and the destructor de-initializes it. Since both constructors and destructors
are automatically called, the programmer will never forget to initialize or
de-initialize the resource.

For instance, suppose you've created an object for a printer::

    struct Printer {
        // ...

        Printer() {
            printer.open();
        }

        // ...

        ~Printer() {
            printer.close();
        }
    };

Now ``Printer`` objects will automatically open and close the printer without
the programmer needing to do anything more than create a ``Printer`` object.


Public and Private
------------------

Let's create a new class for representing a person::

    struct Person {
        string name;
        int age;

        Person(const string& n, int a) : name(n), age(a) {
            if (age <= 0) error("illegal age");
        }
    }; // struct Person

What's interesting here is that the constructor checks to see if the age is
positive. If not, it throws an error.

While this stops us from *creating* a ``Person`` object with a nonsensical
age, it doesn't stop code from later setting ``age`` to be a bad value::

    Person p("Harry Potter", 14);

    p.age = -5;  // oops: age should never be negative!

C++ provides a solution to this problem. Let's re-write the ``Person`` class
like this::

    struct Person {
    private:
        string name;
        int age;

    public:
        Person(const string& n, int a) : name(n), age(a) {
            if (age <= 0) error("illegal age");
        }
    }; // struct Person

Here we divided the class into two separate regions, one labeled ``private``
and the other labeled ``public``. Now ``name`` and ``age`` *cannot be
accessed* outside of ``Person``. For example, this code now causes a compiler
error::

    Person p("Harry Potter", 14);

    p.age = -5;  // compiler error: age is private

The public part of ``Person`` contains everything we want code outside of
``Person`` to have access to. In this case, the only public thing is the
constructor (otherwise how can you create the object?).

By default, everything in a ``struct`` is ``public``. C++ also has a construct
called ``class`` which is just like ``struct`` except by default everything is
``private``.
For instance, we could re-write ``Person`` like this::

    class Person {
        string name;
        int age;

    public:
        Person(const string& n, int a) : name(n), age(a) {
            if (age <= 0) error("illegal age");
        }
    }; // class Person

We've made two changes here: ``struct`` has been replaced with ``class``, and
the ``private`` label has been removed. We don't need it because in a
``class`` variables and functions are ``private`` by default.


Setters and Getters
-------------------

While making ``age`` private stops code outside the class from ever giving it
a nonsensical value, it is too strict: we have no way to make sensible changes
to ``age``. Even worse, we have no way to read ``age``, i.e. this code causes
a compiler error::

    cout << p.age;  // compiler error: age is private

In OOP, the general solution to this problem is to use functions known as
*getters* and *setters*. Roughly, getters return the value of a variable, and
setters write the value of a variable. Since getters and setters are
functions, you can add whatever protection code you need or want.

So let's add a setter and getter for ``age`` to ``Person``::

    class Person {
        string name;
        int age;

    public:
        Person(const string& n, int a) : name(n), age(a) {
            if (age <= 0) error("illegal age");
        }

        int get_age() {            // getter
            return age;
        }

        // Returns void: a setter has nothing to return.
        void set_age(const int a) {  // setter
            if (a <= 0) error("illegal age");
            age = a;
        }
    }; // class Person

It's essential that the setters and getters be in the ``public`` part of the
class so that code outside the class can access them.
Now to access the age of a ``Person`` we call ``get_age()``::

    Person p("Harry Potter", 14);

    cout << p.get_age() << "\n";

If we try to set the age to a nonsensical value an error is thrown::

    p.set_age(-5);  // error thrown at runtime

But sensible ages cause no error::

    p.set_age(15);  // ok

To be complete, we should also add a setter and getter for the name::

    class Person {
        string name;
        int age;

    public:
        Person(const string& n, int a) : name(n), age(a) {
            if (age <= 0) error("illegal age");
        }

        int get_age() {            // getter
            return age;
        }

        void set_age(const int a) {  // setter
            if (a <= 0) error("illegal age");
            age = a;
        }

        string get_name() {        // getter
            return name;
        }

        void set_name(const string& n) {  // setter
            if (n.empty()) error("illegal name");
            name = n;
        }
    }; // class Person

Now we can write code like this::

    Person p("Harry Potter", 14);

    cout << p.get_name() << ", " << p.get_age() << "\n";

Setters and getters are important because they give the programmer complete
control over how the variables of an object are accessed.

Often objects have special variables that code outside the object either
doesn't need to know about, or should not be able to change. In such a case
the variable should be declared ``private``, and no setter/getter should be
created for it. This general technique of keeping variables and functions
hidden from the rest of a program is called **information hiding**, and
experience shows that it is a very useful technique for creating large,
complex programs. By hiding implementation details we not only reduce the
mental burden on the programmer using the object, but we also make it hard for
them to accidentally "mess it up" by assigning a variable a nonsensical value.


Constant Objects
----------------

Recall that you can declare a variable to be ``const``, which means it is
read-only. Suppose we do this with a ``Person``::

    const Person baby("Emily", 1);

    // error!
    cout << baby.get_name() << ", " << baby.get_age() << "\n";

Unfortunately, this doesn't compile: you get a compiler error indicating that
you are not allowed to call ``get_name()`` on ``baby`` because of the
``const``. In other words, C++ is saying that calling ``get_name()`` might
change ``baby`` somehow. We know it won't, but C++ doesn't.

So to get ``const`` to work with ``Person``, we need to indicate which of its
functions can be called with a constant ``Person`` object::

    class Person {
        string name;
        int age;

    public:
        Person(const string& n, int a) : name(n), age(a) {
            if (age <= 0) error("illegal age");
        }

        int get_age() const {      // const added
            return age;
        }

        void set_age(const int a) {
            if (a <= 0) error("illegal age");
            age = a;
        }

        string get_name() const {  // const added
            return name;
        }

        void set_name(const string& n) {
            if (n.empty()) error("illegal name");
            name = n;
        }
    }; // class Person

Only ``get_age()`` and ``get_name()`` are declared to be ``const`` because
they are the only functions in ``Person`` that don't change one of its
variables.

After we mark ``get_age()`` and ``get_name()`` as ``const`` this code works as
expected::

    const Person baby("Emily", 1);

    cout << baby.get_name() << ", " << baby.get_age() << "\n";

However, this would (correctly) give a compiler error::

    const Person baby("Emily", 1);

    baby.set_age(2);  // compiler error!

    cout << baby.get_name() << ", " << baby.get_age() << "\n";


Operator Overloading
--------------------

Let's return to our ``Point`` class::

    // If the absolute value of the difference of two doubles is less
    // than min_diff, then they will be considered equal.
    const double min_diff = 0.00000000001;

    struct Point {
        double x;
        double y;

        Point() : x(0), y(0) {  // default constructor
            // empty body
        }

        Point(double a, double b) : x(a), y(b) {  // constructor
            // empty body
        }

        Point(const Point& p) : x(p.x), y(p.y) {  // copy constructor
            // empty body
        }

        bool equals(const Point& p) {
            return abs(p.x - x) < min_diff && abs(p.y - y) < min_diff;
        }

        void print() {
            cout << "(" << x << ", " << y << ")";
        }

        void println() {
            print();
            cout << endl;
        }

        ~Point() {
            cout << "(Point destructor called)\n";
        }

    }; // Point

While this is useful, it is a bit awkward. For instance, writing
``p.equals(q)`` is not as nice as using the ``==`` operator the way we can
with other C++ values. And using ``print`` and ``println`` is not as
convenient as using ``cout`` and ``<<``.

To deal with this sort of problem C++ lets you *overload* built-in operators
to work with your own objects. For instance, here's how we make ``==`` and
``!=`` work with points::

    struct Point {

        // ...

        bool operator==(const Point& p) {
            return equals(p);
        }

        bool operator!=(const Point& p) {
            return !equals(p);
        }

        // ...

    }; // Point

This lets us write code like this::

    Point start(4, 2);
    Point home(start);

    if (start == home) {
        cout << "They're the same!\n";
    } else {
        cout << "They're different!\n";
    }

It makes the code a little easier to read, which is always a good thing in
programming.

Now let's do something about the print functions. To print a ``Point``
directly onto ``cout`` with ``<<``, we need to overload the ``<<`` operator::

    struct Point {
        // ...
    }; // Point

    ostream& operator<<(ostream& out, const Point& p) {
        out << "(" << p.x << ", " << p.y << ")";
        return out;
    }

The ``<<`` operator is *not* a part of ``Point``, and so is defined outside of
the ``struct``.

Having a ``<<`` for ``Point`` is quite convenient::

    Point start(4, 2);
    Point home(start);

    cout << start << endl
         << home << endl;

Now printing ``Point`` objects works just like printing any other kind of
object.

Putting Point in its Own File
-----------------------------

Points are quite useful in many different kinds of programs, and so it makes
sense to do a little more work to make them easily re-usable. What we'll do
here is put the ``Point`` class, and its related functions and variables, in a
file called ``Point.h``. The ``.h`` indicates this is a *header* file, which,
strictly speaking, should not contain implementation code but instead just
header information.

Here are the contents of ``Point.h``::

    // Point.h

    // By defining point_cmpt125, we avoid problems caused by including
    // this file more than once: if point_cmpt125 is already defined,
    // then the code is *not* included.
    #ifndef point_cmpt125
    #define point_cmpt125 201201L

    #include "std_lib_cmpt125.h"

    // If the absolute value of the difference of two doubles is less
    // than min_diff, then they will be considered equal.
    const double min_diff = 0.00000000001;

    struct Point {
        double x;
        double y;

        Point() : x(0), y(0) {  // default constructor
            // empty body
        }

        Point(double a, double b) : x(a), y(b) {  // constructor
            // empty body
        }

        Point(const Point& p) : x(p.x), y(p.y) {  // copy constructor
            // empty body
        }

        bool equals(const Point& p) {
            return abs(p.x - x) < min_diff && abs(p.y - y) < min_diff;
        }

        bool operator==(const Point& p) {
            return equals(p);
        }

        bool operator!=(const Point& p) {
            return !equals(p);
        }

        void print() {
            cout << "(" << x << ", " << y << ")";
        }

        void println() {
            print();
            cout << endl;
        }

        ~Point() {
            cout << "(Point destructor called)\n";
        }

    }; // Point

    ostream& operator<<(ostream& out, const Point& p) {
        out << "(" << p.x << ", " << p.y << ")";
        return out;
    }

    // calculates the distance between p and q
    double dist(const Point& p, const Point& q) {
        double dx = p.x - q.x;
        double dy = p.y - q.y;
        return sqrt(dx * dx + dy * dy);
    }

    #endif

To use it we do this::

    #include "Point.h"

    int main() {
        Point p;
        // ...
    }

``#include`` is a pre-processor command that textually includes ``Point.h``,
i.e. the ``#include`` statement gets replaced by the contents of the file.
While ``#include`` is simple and straightforward to understand, it causes a
problem if you try to include the same file more than once, e.g.::

    #include "Point.h"

    // ...

    #include "Point.h"

Now everything in ``Point.h`` is included two times, and so you get
compile-time errors when you try to compile the program.

In large programs consisting of dozens of files, it is surprisingly easy to
accidentally include the same file more than once. It is not always obvious
if, and where, a file has been included.

So to deal with this problem of multiple inclusion, ``Point.h`` uses the
standard trick of defining a unique pre-processor symbol the first time the
file is included::

    #ifndef point_cmpt125
    #define point_cmpt125 201201L

    // ... code for Point.h ...

    #endif

``#ifndef`` is a pre-processor command that checks to see if the pre-processor
symbol ``point_cmpt125`` is undefined. If it is undefined, that means this is
the first time the file has been included, and so ``point_cmpt125`` is
immediately defined to be a long integer. Now if ``Point.h`` is ever included
again, ``point_cmpt125`` will be defined, and the code between ``#ifndef`` and
``#endif`` will not be included.

.. note:: Commands beginning with a ``#`` are pre-processor commands, and not
   C++ commands. The pre-processor is a program that is automatically run
   before compilation and that modifies the text of the source files in some
   ways. The pre-processor is necessary in C++ for things like ``#include``
   and this ``#ifndef`` trick, and can also be used to do other sorts of
   things such as creating macros. However, over-use of the pre-processor in
   C++ is generally considered bad practice (most other modern languages
   dispense with it) because it is quite primitive: it manipulates the source
   code written by the programmer, and knows little about the rules of C++.