Introduction to Object-oriented Programming (OOP)
=================================================

**Object-oriented programming**, usually abbreviated **OOP**, is a popular
style of programming that most modern languages support. OOP is particularly
useful for creating code libraries, and for organizing large programs.

C++ was one of the languages that helped make OOP popular in practice. In C++,
OOP is essentially a way of creating **user-defined types**. OOP has many
technical details that you should eventually learn, although for this
introduction we will focus on a few main concepts.


The Idea of OOP
---------------

The basic idea of OOP is to organize programs using **objects**. An object can
be almost anything, such as:

- a string

- a vector

- a stream

- a date

- a document

- a letter in a document

- a record in a database

- a web page

- a tree (in a file system, say)

- a circle

- a particle in an animated explosion

- etc.

For example, C++ ``string``\ s are objects. Like all objects in C++, a
``string`` is made up of two kinds of things:

1. The **data** for representing the string. Usually --- but not always
   --- this data is stored as an array of ``char`` values (i.e. a
   C-style string).

2. Common **functions** that operate on that data. For instance, if ``s``
   is a ``string``, then you can call these functions on it:

   - ``s.size()`` returns the number of characters in ``s``

   - ``s.empty()`` returns true if ``s`` is the empty string, and
     false otherwise

   - ``s.substr(i, n)`` returns a new string of size ``n`` consisting
     of the ``n`` characters ``s[i]``, ``s[i + 1]``, ..., 
     ``s[i + n - 1]``

An object is a collection of data, plus functions that operate on that data.
Well-designed objects are often easy to use because they contain everything
you need in one easy-to-access place.
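
As a small sketch of this idea, the function below gets everything it needs
from one ``string`` object by calling the object's own functions. The name
``describe`` is made up for this illustration, and the code includes the
standard ``<string>`` header directly rather than the course header::

   #include <string>
   using namespace std;

   // Hypothetical helper: builds a short report about s using only
   // functions that belong to the string object itself.
   string describe(const string& s) {
     if (s.empty()) {
       return "empty string";
     }
     // s.substr(0, 1) is the first character, as a string of size 1
     return "starts with '" + s.substr(0, 1) + "', size "
            + to_string(s.size());
   }

For example, ``describe("hello")`` returns ``starts with 'h', size 5``.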


Creating Your Own Objects
-------------------------

Suppose we want to create our own object for representing two-dimensional (x,
y) points (such objects turn out to have many practical uses in graphics).

In C++, like many OOP languages, before you can create an object you need to
create a *class* that describes the parts of the object. A class is like a
factory that creates objects.

.. note:: Not all OOP languages use classes to create objects. Some,
   such as JavaScript, instead create objects by *copying* existing
   objects. That is, you create an initial object by hand, and then
   copy it and change its values to make new objects. This style of
   object creation is known as *prototyping*, and it can be more
   flexible than class-based object creation. However, we will only be
   discussing class-based object creation in this course.

Here is the class for our point objects::

   struct Point {
     double x;
     double y;
   }; // Point
   
   int main() {
     Point p;
     p.x = 4;
     p.y = 2;
     cout << p.x << " " << p.y << endl;
   }

.. note:: Even though we write this code using ``struct``, we will
   generally refer to it as a *class* as a reminder that it is used
   for creating objects. We will see a bit later in these notes that
   the keyword ``class`` can be used instead of ``struct``, although
   with a few technical differences.

Here, ``Point`` is a ``struct`` that describes the contents of ``Point``
objects. Creating a ``Point`` object is easy::

   Point p;
   Point t;

To access the variables in ``p`` and ``t`` we use dot notation::

   p.x = 4;
   p.y = 2;

   t.x = 0;
   t.y = 0;

   cout << p.x << ' ' << p.y << endl  // prints: 4 2
        << t.x << ' ' << t.y << endl; // prints: 0 0

``p.x``, ``p.y``, ``t.x``, and ``t.y`` are all variables of type ``double``,
and so you can use them anywhere you could use a ``double`` variable.

Notice that ``p`` and ``t`` get their own personal copies of ``x`` and ``y``.
The variable ``p.x`` is different than the variable ``t.x``.


Adding Functions
----------------

Now let's add some printing functions to our ``Point`` class::

   struct Point {
     double x;
     double y;
   
     void print() {
       cout << "(" << x << ", " << y << ")";
     }
   
     void println() {
       print();
       cout << endl;
     }
   }; // Point
   
   int main() {
     Point p;
     p.x = 4;
     p.y = 2;
     p.println();
   }

Look at the ``print`` function. Its body is allowed to use ``x`` and ``y``
directly *without* any dot-notation. Similarly, note that in the ``println``
function we call ``print``. This shows that any function defined inside a
class can access any of the variables or functions also defined within it.

.. note:: Sometimes functions defined inside a class are called
   *methods*. However, we will usually just refer to them as functions.
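
To illustrate the same point with functions that change the object, here is a
sketch that adds two hypothetical functions, ``move_by`` and ``move_right``
(neither is part of the ``Point`` class developed in these notes). Like
``print``, they use ``x`` and ``y`` without dot notation, and one calls the
other::

   struct Point {
     double x;
     double y;

     // Hypothetical example function: shifts this point by (dx, dy).
     // x and y are the variables of the object the function is called on.
     void move_by(double dx, double dy) {
       x += dx;
       y += dy;
     }

     // Calls another function of the same class, just as println
     // calls print.
     void move_right(double dx) {
       move_by(dx, 0);
     }
   }; // Point

If ``p`` is the point (4, 2), then after ``p.move_by(1, -2)`` it is (5, 0).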


Constructors
------------

One problem with ``Point`` objects is that they don't initialize their
``x`` and ``y`` values to anything sensible::

   Point p;
   p.println();   // prints unknown values

It's up to the programmer to remember to give them initial values, e.g.::

  Point p;
  p.x = 0;
  p.y = 0;

The problem with this is that it is inconvenient to have to write two
assignment statements every time you create a point. It is easy to forget, or
to do incorrectly.

A better approach is to use *constructors* to initialize our ``Point``\ s. A
constructor is a special function designed specifically for initializing
objects. For example::

   struct Point {
     double x;
     double y;
   
     Point() : x(0), y(0) {    // default constructor
       // empty body 
     }
   
     // ...
   
   }; // Point

In C++, a constructor always has the same name as the class it resides in. The
constructor we've added here does not take any input, and so is used like
this::

   Point p;
   p.println();   // prints (0, 0)

When ``p`` is created, the default constructor for ``Point`` is automatically
called, which sets the values of ``x`` and ``y`` to 0.

This is extremely useful: now there is no way to create a ``Point`` without
initializing its variables. Errors due to random initialization values are no
longer an issue.

Constructors look like functions, but have some important differences you need
to know:

- Constructors do *not* have a return type, not even ``void``.

- The name of a constructor is always the name of the class.

- After the constructor's input parameter list comes an *initializer
  list*::

     Point()        // default constructor
     : x(0), y(0)   // initializer list
     {    
       // empty body 
     }

  The purpose of the initializer list is to assign values to the variables of
  the object *before* any other code is executed. You can also put whatever
  code you like inside the code block if any further initialization is
  required.
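
As a minimal sketch of the order in which these two parts run, consider the
hypothetical ``Segment`` struct below (an assumption for this illustration,
not part of the ``Point`` example). Its initializer list sets ``start`` and
``end``, and its body then computes ``length`` from them::

   // Hypothetical struct: a segment of the number line.
   struct Segment {
     double start;
     double end;
     double length;

     Segment(double s, double e)
     : start(s), end(e), length(0)   // initializer list runs first
     {
       // The body runs after the initializer list, so start and end
       // already hold their values here.
       length = end - start;
     }
   }; // Segment

With this definition, ``Segment s(2, 5);`` creates a segment whose ``length``
is 3.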

Classes often have more than one constructor. Let's add a second constructor to
``Point`` that lets the programmer provide ``x`` and ``y`` values::

   struct Point {
     double x;
     double y;
   
     Point() : x(0), y(0) {   // default constructor
       // empty body 
     }
   
     Point(double a, double b) : x(a), y(b) {   // constructor
       // empty body 
     }
   
     // ...  
   
   }; // Point

Now we can write code like this::

   Point p(4, 2);
   p.println();  // prints (4, 2)

Notice the notation for how ``p`` is initialized. We pass in 4 and 2,
and they get assigned to ``p.x`` and ``p.y``.

The other constructor, the default constructor that takes no parameters, still
works as before::

   Point origin;
   origin.println();  // prints (0, 0)

Of course, it is not strictly necessary any more since we could write this::

   Point origin(0, 0);
   origin.println();  // prints (0, 0)

Another useful type of constructor is a *copy constructor*. As the name
suggests, a copy constructor is used to make a copy of another object::

   struct Point {
     double x;
     double y;
   
     Point() : x(0), y(0) {   // default constructor
       // empty body 
     }
   
     Point(double a, double b) : x(a), y(b) {   // constructor
       // empty body 
     }

     Point(const Point& p) : x(p.x), y(p.y) {  // copy constructor
       // empty body
     }
   
     // ...  
   
   }; // Point

The copy constructor lets us write code like this::

   Point start(4, 2);
   Point home(start);  // make a copy of start
   start.println();    // prints (4, 2)
   home.println();     // prints (4, 2)

Keep in mind that ``start`` and ``home`` are separate objects with
their own personal ``x`` and ``y`` variables.
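
The sketch below makes that independence concrete. It uses a cut-down
``Point`` with ``double`` constructor parameters, and a hypothetical helper
named ``moved_copy`` that copies a point and then changes only the copy::

   struct Point {
     double x;
     double y;

     Point(double a, double b) : x(a), y(b) { }

     Point(const Point& p) : x(p.x), y(p.y) { }  // copy constructor
   }; // Point

   // Hypothetical helper: copies start, changes only the copy, and
   // returns the copy.
   Point moved_copy(const Point& start) {
     Point home(start);     // calls the copy constructor
     home.x = home.x + 1;   // only home changes; start is untouched
     return home;
   }

If ``start`` is (4, 2), then ``moved_copy(start)`` returns (5, 2) and
``start`` is still (4, 2).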


Testing for Equality
--------------------

Another useful function to add to ``Point`` is ``equals``, which tests if two
``Point`` objects are the same::

   struct Point {
     double x;
     double y;
   
     // ...
   
     bool equals(const Point& p) {
       return p.x == x && p.y == y;
     }
   
     // ...

   }; // Point

Recall that ``&&`` means "and", and so the expression ``p.x == x && p.y == y``
returns ``true`` if both ``p.x == x`` is true, and ``p.y == y`` is true. If
one, or both, are false, then the entire expression is false.

Now you can write code like this::

   Point p;       // (0, 0)
   Point target;  // (0, 0)
   
   cin >> target.x >> target.y;  // read in target's value
   
   if (p.equals(target)) {
      cout << "same";
   } else {
      cout << "different";
   }

But there's a problem with ``equals``: ``double`` arithmetic is *not exact*,
i.e. calculations done with ``double``\ s suffer from small but unavoidable
round-off errors. That means that you might have two ``double``\ s that are,
for all practical purposes, equal, but are not exactly the same according to
``==``. For example, in pretty much any practical program we would like to
treat ``0.0`` and ``0.000000000000001`` as being the same.

To solve this problem, let's re-write ``equals``::

   // If the absolute value of the difference of two doubles is less
   // than min_diff, then they will be considered equal.
   const double min_diff = 0.00000000001;
   
   struct Point {
     // ...
   
     bool equals(const Point& p) {
       return abs(p.x - x) < min_diff && abs(p.y - y) < min_diff;
     }

     // ...
   
   }; // Point

Here, two coordinates are considered the same if the absolute value of their
difference is less than the constant ``min_diff``. While it takes a little
more time to do this equality check, it tolerates small round-off errors and
is probably more useful in general.
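
A well-known case of round-off error is adding 0.1 and 0.2. The sketch below
compares the sum to 0.3 in both ways. The function names are made up for this
illustration, and ``fabs`` from the standard ``<cmath>`` header is used for
the absolute value::

   #include <cmath>
   using namespace std;

   const double min_diff = 0.00000000001;

   // The same test that equals applies to each coordinate.
   bool nearly_equal(double a, double b) {
     return fabs(a - b) < min_diff;
   }

   // Compares 0.1 + 0.2 to 0.3 using ==.
   bool sum_is_exact() {
     double sum = 0.1 + 0.2;
     return sum == 0.3;
   }

   // Compares 0.1 + 0.2 to 0.3 using the min_diff test.
   bool sum_is_close() {
     double sum = 0.1 + 0.2;
     return nearly_equal(sum, 0.3);
   }

With the usual IEEE 754 ``double``, ``sum_is_exact()`` returns false because
the sum is slightly larger than 0.3, while ``sum_is_close()`` returns true.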

.. note:: Dealing with the round-off errors inherent in floating-point
   computer arithmetic turns out to be highly non-trivial in
   general. Long, complicated calculations can suffer from huge
   amounts of error if they are not done carefully.

   Numeric computation is an important sub-topic of computer science, although we
   won't go into it any further here.


Using Points in Other Functions
-------------------------------

We can pass ``Point`` objects to other functions similarly to how built-in
data types are passed. For example::

   // calculates the distance between p and q
   double dist(const Point& p, const Point& q) {
     double dx = p.x - q.x;
     double dy = p.y - q.y;
     return sqrt(dx * dx + dy * dy);
   }

Note that this function is *not* inside the ``Point`` class.
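
Here is a short sketch of ``dist`` in use. It repeats a cut-down ``Point`` so
that it compiles on its own, and adds a hypothetical helper, ``path_length``,
that calls ``dist`` twice::

   #include <cmath>
   using namespace std;

   struct Point {
     double x;
     double y;

     Point() : x(0), y(0) { }
     Point(double a, double b) : x(a), y(b) { }
   }; // Point

   // calculates the distance between p and q
   double dist(const Point& p, const Point& q) {
     double dx = p.x - q.x;
     double dy = p.y - q.y;
     return sqrt(dx * dx + dy * dy);
   }

   // Hypothetical helper: length of the path from a to b to c.
   double path_length(const Point& a, const Point& b, const Point& c) {
     return dist(a, b) + dist(b, c);
   }

For instance, the distance between (0, 0) and (3, 4) is 5.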


Destructors
-----------

A destructor is a special kind of function that an object calls
*automatically* when it is destroyed, i.e. when it goes out of scope, or is
given back to the free store (using ``delete``).

Our ``Point`` objects don't have any practical need for a destructor, but let's
add one anyway to see how they work::

   struct Point {
 
     // ...
   
     ~Point() {
       cout << "(Point destructor called)\n";
     }
   }; // Point

C++ destructors start with the ``~`` symbol followed by the name of the class.
Destructors *never* take any input parameters, and are usually used for
"cleaning up" resources the object used. In this case, all we are doing is
printing a message when the destructor is called. This can be quite useful for
debugging: it tells you when an object no longer exists.

It's often useful to think of constructors and destructors as working together
to manage some computer resource: the constructor initializes the resource,
and the destructor de-initializes it. Since both constructors and destructors
are called automatically, the programmer cannot forget to initialize or
de-initialize the resource.

For instance, suppose you've created an object for a printer::

   struct Printer {

     // ...

     Printer() {
       printer.open();    // printer: some device object, details omitted
     }

     // ...

     ~Printer() {
       printer.close();
     }

   };

Now ``Printer`` objects will automatically open and close the printer without
the programmer needing to do anything more than create a ``Printer`` object.
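
The same pairing can be sketched with a resource simple enough to run
anywhere. Here a hypothetical ``Tracker`` struct (an assumption for this
illustration, standing in for something like a printer) counts how many of
its objects currently exist::

   // Hypothetical stand-in for a real resource: the number of Tracker
   // objects that currently exist.
   int live_trackers = 0;

   struct Tracker {
     Tracker() { live_trackers++; }    // "open" the resource
     ~Tracker() { live_trackers--; }   // "close" the resource
   }; // Tracker

   // Returns the number of live trackers seen inside the inner block.
   int count_inside_block() {
     int seen = 0;
     {
       Tracker a;
       Tracker b;
       seen = live_trackers;   // both a and b exist here
     } // a and b go out of scope: both destructors run automatically
     return seen;
   }

Calling ``count_inside_block()`` returns 2, and ``live_trackers`` is back to 0
afterwards, even though the code never calls a destructor explicitly.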


Public and Private
------------------

Let's create a new class for representing a person::

   struct Person {
      string name;
      int age;
   
     Person(const string& n, int a) : name(n), age(a) {
         if (age <= 0) error("illegal age");
      }
   
   }; // struct Person

What's interesting here is that the constructor checks to see if the age is
positive. If not, it throws an error. While this stops us from *creating* a
``Person`` object with a nonsensical age, it doesn't stop code from later
setting ``age`` to a bad value::

   Person p("Harry Potter", 14);
   p.age = -5; // oops: age should never be negative!

C++ provides a solution to this problem. Let's re-write the ``Person`` class
like this::

   struct Person {
   private:
      string name;
      int age;
   
   public:
     Person(const string& n, int a) : name(n), age(a) {
         if (age <= 0) error("illegal age");
      }
   
   }; // struct Person

Here we divided the class into two separate regions, one labeled ``private``
and the other labeled ``public``. Now ``name`` and ``age`` *cannot be
accessed* outside of ``Person``. For example, this code now causes a compiler
error::

   Person p("Harry Potter", 14);
   p.age = -5;   // compiler error: age is private

The public part of ``Person`` contains everything we want code outside of
``Person`` to have access to. In this case, the only public thing is the
constructor (otherwise how can you create the object?).

By default, everything in a ``struct`` is ``public``.  C++ also has a
construct called ``class`` which is just like ``struct`` except by default
everything is ``private``. For instance, we could re-write ``Person`` like
this::

   class Person {
      string name;
      int age;
   
   public:
     Person(const string& n, int a) : name(n), age(a) {
         if (age <= 0) error("illegal age");
      }
   
   }; // class Person

We've made two changes here: ``struct`` has been replaced with ``class``, and
the ``private`` label has been removed. We don't need it because in a
``class`` variables and functions are ``private`` by default.


Setters and Getters
-------------------

While making ``age`` private stops us from ever giving it a nonsensical value,
it is too strict: we have no way to make sensible changes to ``age``. Even
worse, we have no way to read ``age``, i.e. this code causes a compiler
error::

   cout << p.age;   // compiler error: age is private

In OOP, the general solution to this problem is to use functions known as
*getters* and *setters*. Roughly, getters return the value of a variable, and
setters write the value of a variable. Since getters and setters are functions
you can add whatever protection code you need or want.

So let's add a setter and getter for ``age`` to ``Person``::

   class Person {
     string name;
     int age;
   
   public:
     Person(const string& n, int a) : name(n), age(a) {
         if (age <= 0) error("illegal age");
      }
   
     int get_age() {                      // getter
       return age;
     }
   
     void set_age(int a) {                // setter
       if (a <= 0) error("illegal age");
       age = a;
     }
   
   }; // class Person

It's essential that the setters and getters be in the ``public`` part of the
class so that code outside the class can access them.
   
Now to access the age of a ``Person`` we call ``get_age()``::

  Person p("Harry Potter", 14);
  cout << p.get_age() << "\n";

If we try to set the age to a nonsensical value an error is thrown::

   p.set_age(-5);  // error thrown at runtime

But sensible ages cause no error::

   p.set_age(15);   // ok

To be complete, we should also add a setter and getter for the name::

   class Person {
      string name;
      int age;
   
   public:
     Person(const string& n, int a) : name(n), age(a) {
         if (age <= 0) error("illegal age");
      }
   
     int get_age() {                      // getter
       return age;
     }
   
     void set_age(int a) {                // setter
       if (a <= 0) error("illegal age");
       age = a;
     }
   
     string get_name() {                  // getter
       return name;
     }
   
     void set_name(const string& n) {     // setter
       if (n.empty()) error("illegal name");
       name = n;
     }
   
   }; // class Person
   
Now we can write code like this::

  Person p("Harry Potter", 14);
  cout << p.get_name() << ", " << p.get_age() << "\n";

Setters and getters are important because they give the programmer complete
control over how the variables of an object are accessed. Often objects have
special variables that code outside the object either doesn't need to know
about, or should not be able to change. In such a case the variable should be
declared ``private``, and no setter/getter should be created for it.

This general technique of keeping variables and functions hidden from the rest
of a program is called **information hiding**, and experience shows that it is
a very useful technique for creating large, complex programs. By hiding
implementation details we not only reduce the mental burden on the programmer
using the object, but we also make it hard for them to accidentally "mess it
up" by assigning a variable a nonsensical value.
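
The sketch below shows a setter protecting a variable at run time. It is
self-contained, so it assumes a stand-in for the course's ``error`` function
that throws a standard ``runtime_error``; the real ``error`` may behave
differently. The helper ``try_set_age`` is also made up for this illustration::

   #include <stdexcept>
   #include <string>
   using namespace std;

   // Assumed stand-in for the course's error function.
   void error(const string& msg) {
     throw runtime_error(msg);
   }

   class Person {
     string name;
     int age;

   public:
     Person(const string& n, int a) : name(n), age(a) {
       if (age <= 0) error("illegal age");
     }

     int get_age() {                      // getter
       return age;
     }

     void set_age(int a) {                // setter
       if (a <= 0) error("illegal age");
       age = a;
     }
   }; // class Person

   // Hypothetical helper: tries to change p's age, and returns true
   // if the setter accepted the new value.
   bool try_set_age(Person& p, int a) {
     try {
       p.set_age(a);
       return true;
     } catch (const runtime_error&) {
       return false;   // the setter rejected a; age is unchanged
     }
   }

Because the check runs before the assignment, a rejected value never reaches
``age``: the object keeps its last sensible value.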


Constant Objects
----------------

Recall that you can declare a variable to be ``const``, which means it is
read-only. Suppose we do this with a ``Person``::

  const Person baby("Emily", 1);
  cout << baby.get_name() << ", " << baby.get_age() << "\n";  // error!

Unfortunately, when you compile this it doesn't work: you get a compiler error
indicating that you are not allowed to call ``get_name()`` on ``baby`` because
of the ``const``.

In other words, C++ is saying that calling ``get_name()`` might change
``baby`` somehow. We know it won't, but C++ doesn't.

So to get ``const`` to work with ``Person``, we need to indicate which of its
functions can be called with a constant ``Person`` object::

   class Person {
      string name;
      int age;
   
   public:
     Person(const string& n, int a) : name(n), age(a) {
         if (age <= 0) error("illegal age");
      }
   
     int get_age() const {                // const added
       return age;
     }
   
     void set_age(int a) {
       if (a <= 0) error("illegal age");
       age = a;
     }
   
     string get_name() const {            // const added
       return name;
     }
   
     void set_name(const string& n) {     
       if (n.empty()) error("illegal name");
       name = n;
     }
   
   }; // class Person

Only ``get_age()`` and ``get_name()`` are declared to be ``const`` because
they are the only functions in ``Person`` that don't change one of its
variables.

After we mark ``get_age()`` and ``get_name()`` as ``const`` this code works as
expected::

  const Person baby("Emily", 1);
  cout << baby.get_name() << ", " << baby.get_age() << "\n";

However, this would (correctly) give a compiler error::

  const Person baby("Emily", 1);
  baby.set_age(2);  // compiler error!
  cout << baby.get_name() << ", " << baby.get_age() << "\n";
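
The same rule applies when an object is passed to a function by constant
reference, as in the sketch below. The function ``label`` is a hypothetical
helper, and this cut-down ``Person`` leaves out the error checks to keep the
example short::

   #include <string>
   using namespace std;

   class Person {
     string name;
     int age;

   public:
     Person(const string& n, int a) : name(n), age(a) { }

     int get_age() const { return age; }
     string get_name() const { return name; }

     void set_age(int a) { age = a; }   // not const: it changes age
   }; // class Person

   // p is a constant reference, so only const functions can be
   // called on it.
   string label(const Person& p) {
     // p.set_age(2);   // would be a compiler error: set_age is not const
     return p.get_name() + ", " + to_string(p.get_age());
   }

For the ``baby`` object above, ``label(baby)`` returns ``Emily, 1``.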


Operator Overloading
--------------------

Let's return to our ``Point`` class::

   // If the absolute value of the difference of two doubles is less
   // than min_diff, then they will be considered equal.
   const double min_diff = 0.00000000001;
   
   struct Point {
     double x;
     double y;
   
     Point() : x(0), y(0) {   // default constructor
       // empty body 
     }
   
     Point(double a, double b) : x(a), y(b) {   // constructor
       // empty body 
     }
   
     Point(const Point& p) : x(p.x), y(p.y) {  // copy constructor
       // empty body
     }
      
     bool equals(const Point& p) {
       return abs(p.x - x) < min_diff && abs(p.y - y) < min_diff;
     }
   
     void print() {
       cout << "(" << x << ", " << y << ")";
     }
   
     void println() {
       print();
       cout << endl;
     }
   
     ~Point() {
       cout << "(Point destructor called)\n";
     }
   }; // Point
   
While this is useful, it is a bit awkward. For instance, writing
``p.equals(q)`` is not as nice as using the ``==`` operator the way we can
with other C++ values. And using ``print`` and ``println`` is not as
convenient as using ``cout`` and ``<<``.

To deal with this sort of problem, C++ lets you *overload* built-in operators
to work with your own objects. For instance, here's how we make ``==`` and
``!=`` work with points::

   struct Point {
   
     // ...
   
     bool operator==(const Point& p) {
       return equals(p);
     }
   
     bool operator!=(const Point& p) {
       return !equals(p);
     }
   
     // ...
   
   }; // Point

This lets us write code like this::

   Point start(4, 2);
   Point home(start);

   if (start == home) {
     cout << "They're the same!\n";
   } else {
     cout << "They're different!\n";
   }   

It makes the code a little easier to read, which is always a good thing in
programming.

Now let's do something about the print functions. To print a ``Point`` directly
onto ``cout`` with ``<<``, we need to overload the ``<<`` operator::

   struct Point {
   
     // ...
   
   }; // Point


   ostream& operator<<(ostream& out, const Point& p) {
     out << "(" << p.x << ", " << p.y << ")";
     return out;
   }
   
The ``<<`` operator is *not* a part of ``Point``, and so is defined outside of
the ``struct``. 

Having a ``<<`` for ``Point`` is quite convenient::

   Point start(4, 2);
   Point home(start);
 
   cout << start << endl << home << endl;
 
Now printing ``Point`` objects works just like printing any other kind of
object.
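
Because this ``<<`` takes any ``ostream``, it is not tied to ``cout``. The
sketch below writes a ``Point`` into a standard ``ostringstream`` instead.
The helper name ``to_text`` is made up for this illustration, and the
``Point`` shown is a cut-down version::

   #include <iostream>
   #include <sstream>
   #include <string>
   using namespace std;

   struct Point {
     double x;
     double y;

     Point(double a, double b) : x(a), y(b) { }
   }; // Point

   ostream& operator<<(ostream& out, const Point& p) {
     out << "(" << p.x << ", " << p.y << ")";
     return out;
   }

   // Hypothetical helper: writes p into a string instead of onto cout.
   string to_text(const Point& p) {
     ostringstream out;
     out << p;            // the same operator<< that works with cout
     return out.str();
   }

For example, ``to_text(Point(4, 2))`` returns the string ``(4, 2)``.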


Putting Point in its Own File
-----------------------------

Points are quite useful in many different kinds of programs, and so it makes
sense to do a little more work to make them easily re-usable.

What we'll do here is put the ``Point`` class, and its related functions and
variables, in a file called ``Point.h``. The ``.h`` indicates this is a
*header* file, which, strictly speaking, should not contain implementation
code but instead just header information. 

Here are the contents of ``Point.h``::

   // Point.h
   
   // By defining point_cmpt125, we avoid problems caused by including
   // this file more than once: if point_cmpt125 is already defined,
   // then the code is *not* included.
   #ifndef point_cmpt125
   #define point_cmpt125 201201L
   
   #include "std_lib_cmpt125.h"
   
   // If the absolute value of the difference of two doubles is less
   // than min_diff, then they will be considered equal.
   const double min_diff = 0.00000000001;
   
   struct Point {
     double x;
     double y;
   
     Point() : x(0), y(0) {   // default constructor
       // empty body 
     }
   
     Point(double a, double b) : x(a), y(b) {   // constructor
       // empty body 
     }
   
     Point(const Point& p) : x(p.x), y(p.y) {  // copy constructor
       // empty body
     }
   
     bool equals(const Point& p) {
       return abs(p.x - x) < min_diff && abs(p.y - y) < min_diff;
     }
   
     bool operator==(const Point& p) {
       return equals(p);
     }
   
     bool operator!=(const Point& p) {
       return !equals(p);
     }
   
     void print() {
       cout << "(" << x << ", " << y << ")";
     }
   
     void println() {
       print();
       cout << endl;
     }
   
     ~Point() {
       cout << "(Point destructor called)\n";
     }
   }; // Point
   
   ostream& operator<<(ostream& out, const Point& p) {
     out << "(" << p.x << ", " << p.y << ")";
     return out;
   }
   
   double dist(const Point& p, const Point& q) {
     double dx = p.x - q.x;
     double dy = p.y - q.y;
     return sqrt(dx * dx + dy * dy);
   }
   
   #endif

To use it we do this::

   #include "Point.h"
   
   int main() {
      Point p;
      // ...
   }

``#include`` is a pre-processor command that textually includes
``Point.h``, i.e. the ``#include`` statement gets replaced by the
contents of the file.

While ``#include`` is simple and straightforward to understand, it causes a
problem if you try to include the same file more than once, e.g.::

   #include "Point.h"
   
   // ...
   
   #include "Point.h"

Now everything in ``Point.h`` is included two times, so ``Point`` and its
related functions are defined twice, and you get compile-time errors when you
try to compile the program.

In large programs consisting of dozens of files, it is surprisingly easy to
accidentally include the same file more than once. It is not always obvious
if, and where, a file has been included.

So to deal with this problem of multiple inclusion, ``Point.h`` uses the
standard trick of defining a unique pre-processor symbol the first time the
file is included::
   
   #ifndef point_cmpt125
   #define point_cmpt125 201201L

   // ... code for Point.h ...

   #endif

``#ifndef`` is a pre-processor command that checks to see if the pre-processor
symbol ``point_cmpt125`` is undefined. If it is undefined, that means this is
the first time the file has been included, and so ``point_cmpt125`` is
immediately defined to be a long integer.

Now if ``Point.h`` is ever included again, ``point_cmpt125`` will be defined,
and the code between ``#ifndef`` and ``#endif`` will not be included.
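
To see the guard at work without creating extra files, the sketch below pastes
the same guarded text twice into one file, which is what two ``#include``
lines would do. The symbol ``pair_demo_h`` and the ``Pair`` struct are made up
for this illustration::

   // First "inclusion": pair_demo_h is not yet defined, so this part
   // is compiled.
   #ifndef pair_demo_h
   #define pair_demo_h

   struct Pair {
     int first;
     int second;
   };

   #endif

   // Second "inclusion" of the same text: pair_demo_h is now defined,
   // so everything up to #endif is skipped and Pair is not defined twice.
   #ifndef pair_demo_h
   #define pair_demo_h

   struct Pair {
     int first;
     int second;
   };

   #endif

   // Uses Pair, which was defined exactly once above.
   int sum(const Pair& p) {
     return p.first + p.second;
   }

If the ``#ifndef`` and ``#endif`` lines are removed, the compiler reports that
``Pair`` is defined twice.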

.. note:: Commands beginning with a ``#`` are pre-processor commands,
   and not C++ commands. The pre-processor is a program that is
   automatically run before compilation and that modifies the text of
   the source files in some ways. The pre-processor is necessary in C++
   for things like ``#include`` and this ``#ifndef`` trick, and it can
   also be used for other things, such as creating macros. However,
   over-use of the pre-processor is generally considered bad practice
   in C++ because it is quite primitive: it manipulates the source
   code as plain text, and knows little about the rules of C++. Most
   other modern languages dispense with it.