Lecture 13
==========

Notes on Object-Oriented Programming
------------------------------------

Object-oriented programming (OOP) was first implemented in the language Simula
67, and was more fully explored in Smalltalk. It is now a major form of
abstraction in most mainstream programming languages.

In these notes we will look at three different approaches to OOP in the
languages C++, Python, Go, JavaScript, and Dart.


C++
---

C++ uses a standard class/object approach to OOP. In C++ source code, you write
classes that specify the data and methods for objects of the class's type.

For example, here is a class that lets you create objects of type ``Person``::

    #include <iostream>
    #include <string>
    using namespace std;

    class Person {
    private:
        string name;
        int age;

    public:
        Person(const string& n, int a) : name(n), age(a) {}

        string get_name() const { return name; }
        int get_age() const { return age; }

        virtual void print() const {
            cout << name << " is " 
                 << age << " years old.\n";
        }
    };

The ``private:`` and ``public:`` labels divide the functions and variables of
``Person`` into two kinds: the private things that can only be accessed within
the class itself, and the public things that can also be accessed from outside
the class.

The idea of making data private is to allow the methods in ``Person`` to have
complete control over them. Any function in the program can change a public
variable, which could lead to bugs or errors due to unintentional changes.
Private variables restrict access in a useful way: you'll get a compiler error
if you try to access a private variable outside the class.

The keyword ``virtual`` at the start of the function header for ``print()``
tells C++ that classes that inherit from ``Person`` are permitted, if they
wish, to supply their own version of ``print``. We'll see below how this is
used.

We can use ``Person`` like this::

    Person p("Mary", 67);
    p.print();

``Person`` requires that you create a ``Person`` object by supplying a name
and age. There's no way to create an uninitialized ``Person`` object, which
helps prevent errors.

``get_name()`` and ``get_age()`` are called **getters** because what they do
is get the value of some variable in the class. You cannot write ``p.age`` or
``p.name`` directly because they are private variables, which means such
access from outside the class is forbidden.
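
As a minimal sketch of these access rules, the following compiles only because
it goes through the public getters. The function ``describe`` is a
hypothetical helper that is not part of the class above, and ``print`` is
omitted to keep the example short::

    #include <string>
    using namespace std;

    class Person {
    private:
        string name;
        int age;

    public:
        Person(const string& n, int a) : name(n), age(a) {}

        string get_name() const { return name; }
        int get_age() const { return age; }
    };

    // Hypothetical helper defined outside the class: it can only use the
    // public interface of Person.
    string describe(const Person& p) {
        // return p.name;   // compiler error: name is private in Person
        return p.get_name() + " (" + to_string(p.get_age()) + ")";
    }

Calling ``describe(Person("Mary", 67))`` returns the string ``Mary (67)``.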

Inside the ``print`` method we *don't* need to call ``get_name()`` and
``get_age()`` because methods in a class are allowed to refer directly to the
private variables of that class.

Notice that there is no way to change a person's name or age once a ``Person``
object is created. It only has getters, which makes it a read-only object.
Whether or not you want a particular class of object to be read-only is a
design decision (and in this case it makes little sense, because people's ages
change every year, and, occasionally, so too do their names).
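
If a mutable version were wanted, one possible design is to add a **setter**
that checks its argument, so that the class keeps control over its private
data. This is only a sketch: ``MutablePerson`` and ``set_age`` are
hypothetical names that do not appear elsewhere in these notes, and the choice
to reject negative ages is an assumption made for illustration::

    #include <stdexcept>
    #include <string>
    using namespace std;

    class MutablePerson {
    private:
        string name;
        int age;

    public:
        MutablePerson(const string& n, int a) : name(n), age(a) {}

        string get_name() const { return name; }
        int get_age() const { return age; }

        // Setter: the only way code outside the class can change age,
        // so invalid values can be rejected in one place.
        void set_age(int a) {
            if (a < 0) {
                throw invalid_argument("age must be non-negative");
            }
            age = a;
        }
    };

With this class, ``p.set_age(68)`` updates the age, while ``p.set_age(-1)``
throws an exception and leaves the age unchanged.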

An important technique in class-based OOP is **inheritance**. Inheritance is a
way to create a new class based on some other class. For example::

    class Student : public Person {
    private:
        string school;
    public:
        Student(const string& n, int a, const string& s)
        : Person(n, a), school(s)
        { }

        string get_school() const { return school; }

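        // override asks the compiler to check that this really replaces
        // a virtual function inherited from Person
        void print() const override {
            cout << get_name() << " is "
                 << get_age() << " years old and attends "
                 << school << ".\n";
        }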
    };

We say that ``Student`` **subclasses**, or **extends**, the ``Person`` class. 
That means that all the data and methods in ``Person`` are automatically put into 
``Student``.
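
For instance, a ``Student`` can be asked for its name and age even though
``Student`` itself declares neither getter. The sketch below repeats trimmed
versions of the two classes (without ``print``) so that it is self-contained;
``summary`` is a hypothetical helper added only for this illustration::

    #include <string>
    using namespace std;

    class Person {
    private:
        string name;
        int age;
    public:
        Person(const string& n, int a) : name(n), age(a) {}
        string get_name() const { return name; }
        int get_age() const { return age; }
    };

    class Student : public Person {
    private:
        string school;
    public:
        Student(const string& n, int a, const string& s)
        : Person(n, a), school(s)
        { }
        string get_school() const { return school; }
    };

    // Hypothetical helper: get_name() and get_age() are inherited from
    // Person, while get_school() is defined in Student itself.
    string summary(const Student& s) {
        return s.get_name() + ", " + to_string(s.get_age())
             + ", " + s.get_school();
    }

Calling ``summary`` on ``Student("Barry", 12, "Sun Ray Elementary")`` returns
``Barry, 12, Sun Ray Elementary``.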

Now you can write code like this::

    Person p("Mary", 67);
    p.print();

    Student s("Barry", 12, "Sun Ray Elementary");
    s.print();

Notice that the ``Student`` class defines its own version of ``print``, and so
when ``s.print()`` is called, it is the ``Student`` version of ``print`` that
is executed.

When C++ encounters the statement ``s.print()``, how does it decide what
version of ``print`` to call? Since ``s`` is of type ``Student``, it calls the
``print`` associated with ``Student``. C++ can determine this fact at compile
time because the compiler can see everything it needs to know to infer it.

But things get more interesting in this example::

    vector<Person*> people = {new Person{"Mary", 67}, 
                              new Student{"Barry", 12, "Sun Ray Elementary"}
                             };
    for(Person* p : people) {
        p->print();   // same as (*p).print()
    }

Here, ``people`` is a ``vector`` of pointers to ``Person`` objects. A ``new``
expression, such as ``new Person{"Mary", 67}``, returns a pointer to a newly
allocated object.

Inside the for-loop, the statement ``p->print()`` is executed. What version of
``print`` is called? The one for ``Person``, or the one for ``Student``? The
variable ``p`` is of type ``Person*``, so we know that ``p`` is pointing to
either a ``Person`` object or a ``Student`` object. Both of those kinds of
objects have a method called ``print()``, so which code gets executed by
``p->print()``?

The answer is that it depends on the type of object ``p`` points to. If ``p``
points to a ``Person`` object, then the ``Person`` ``print()`` function is
called. If instead ``p`` points to a ``Student`` object, then the ``Student``
``print()`` function is called.

What's interesting here is that C++ does not know until run-time what type of
object ``p`` points to. By looking at the statement ``p->print()`` alone, it
is impossible to tell which version of ``print()`` is called. Thus, the
compiler, which only has access to the source code, cannot know whether it is
the ``Person`` ``print()`` or ``Student`` ``print()`` that will be executed
here.

Even though we don't know for sure which ``print()`` is called, there is no
type error because we do know that both ``Person`` objects and ``Student``
objects have a method named ``print()``. So we can be certain that a
``print()`` can be called at that point.
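
To see what ``virtual`` contributes to this behavior, here is a small sketch
using hypothetical classes ``Base`` and ``Derived`` (they are not part of the
notes above). Each class has one virtual and one non-virtual method, and the
methods return strings instead of printing so the results are easy to check.
References are used here; they select the method the same way pointers do::

    #include <string>
    using namespace std;

    class Base {
    public:
        virtual ~Base() = default;
        virtual string dynamic_name() const { return "Base"; }
        string static_name() const { return "Base"; }
    };

    class Derived : public Base {
    public:
        string dynamic_name() const override { return "Derived"; }
        // Not virtual in Base, so this hides Base::static_name
        // rather than overriding it.
        string static_name() const { return "Derived"; }
    };

    // The declared type of b is Base, but b may refer to a Derived object.

    // Virtual call: the version is chosen at run time from the object's type.
    string via_virtual(const Base& b) { return b.dynamic_name(); }

    // Non-virtual call: the version is chosen at compile time from the
    // declared type of b, which is Base.
    string via_non_virtual(const Base& b) { return b.static_name(); }

Given a ``Derived`` object ``d``, ``via_virtual(d)`` returns ``Derived`` while
``via_non_virtual(d)`` returns ``Base``.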

Notice also that we *must* call ``get_name()`` and ``get_age()`` inside the
``Student`` class. That's because the variables ``name`` and ``age`` are
private, and so cannot be directly accessed outside of the ``Person`` class.
Even though ``Student`` inherits those variables from ``Person``, code in
``Student`` does **not** have the ability to directly access the private
variables of ``Person``.

.. warning::

   In C++, you *must* write the above code using pointers. The following
   code runs, but won't work the way we would like::

      vector<Person> people = {Person{"Mary", 67}, 
                               Student{"Barry", 12, "Sun Ray Elementary"}
                              };
      for(Person p : people) {
          p.print();
      }

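   Here, the ``people`` vector contains ``Person`` objects instead of pointers
   to ``Person`` objects. When the ``Student`` is placed in the vector, only
   its ``Person`` part is copied (the ``school`` part is "sliced" off), so every
   element really is a plain ``Person``. That means when ``p.print()`` is
   called, the code for the ``Person`` version of ``print`` is executed no
   matter what, because ``p`` is itself a ``Person`` object rather than a
   pointer to an object whose type is only known at run-time.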

   Thus, for most practical purposes, OOP in C++ requires that you use
   pointers (or references) to objects instead of the objects themselves. The
   problem with this approach is that it is up to the programmer to remember
   to use the correct techniques and to deal with any pointer errors.

   Some other languages, such as Java, avoid this problem by making all object
   variables pointers (references). Thus, in Java, you simply cannot make a
   variable that directly names an object; it is always a pointer. This avoids
   the above sort of problem.
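
One way to reduce the pointer bookkeeping mentioned in the warning is to store
``std::unique_ptr`` values, which delete their objects automatically. The
sketch below is one possible way to write this, not code from these notes. It
assumes C++14 (for ``make_unique``), replaces ``print`` with a ``describe``
method that returns a string, adds a virtual destructor so that deleting a
``Student`` through a ``Person`` pointer is safe, and uses the hypothetical
helpers ``describe_all`` and ``describe_sliced``::

    #include <memory>
    #include <string>
    #include <vector>
    using namespace std;

    class Person {
        string name;
        int age;
    public:
        Person(const string& n, int a) : name(n), age(a) {}
        virtual ~Person() = default;   // needed when deleting via Person*
        string get_name() const { return name; }
        int get_age() const { return age; }
        virtual string describe() const {
            return name + " is " + to_string(age) + " years old.";
        }
    };

    class Student : public Person {
        string school;
    public:
        Student(const string& n, int a, const string& s)
        : Person(n, a), school(s) { }
        string describe() const override {
            return get_name() + " is " + to_string(get_age())
                 + " years old and attends " + school + ".";
        }
    };

    // Builds the same two people as above, owned by unique_ptr, and
    // collects the result of the virtual call on each one.
    vector<string> describe_all() {
        vector<unique_ptr<Person>> people;
        people.push_back(make_unique<Person>("Mary", 67));
        people.push_back(
            make_unique<Student>("Barry", 12, "Sun Ray Elementary"));

        vector<string> lines;
        for (const auto& p : people) {
            lines.push_back(p->describe());
        }
        return lines;   // the objects are deleted when people is destroyed
    }

    // For contrast: copying a Student into a Person object slices off the
    // Student part, so the Person version of describe() runs.
    string describe_sliced() {
        Person p = Student("Barry", 12, "Sun Ray Elementary");
        return p.describe();
    }

Here ``describe_all()`` returns the ``Person`` description for Mary and the
``Student`` description for Barry, while ``describe_sliced()`` returns
``Barry is 12 years old.`` with no mention of the school.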