Lecture 13
==========

Notes on Object-Oriented Programming
------------------------------------

Object-oriented programming (OOP) was first implemented in the language Simula 67, and was more fully explored in Smalltalk. It is now the major form of abstraction in most programming languages.

In these notes we will look at three different approaches to OOP in the languages C++, Python, Go, JavaScript, and Dart.

C++
---

C++ uses a standard class/object approach to OOP. In C++ source code, you write classes that specify the data and methods for an object of the class's type. For example, here is a class that lets you create objects of type ``Person``::

    class Person {
    private:
        string name;
        int age;

    public:
        Person(const string& n, int a)
        : name(n), age(a)
        {}

        string get_name() const { return name; }
        int get_age() const { return age; }

        virtual void print() const {
            cout << name << " is " << age << " years old.\n";
        }
    };

The ``private:`` and ``public:`` labels divide the functions and variables of ``Person`` into two kinds: the private things that can only be accessed within the class itself, and the public things that can be accessed outside the class.

The idea of making data private is to give the methods in ``Person`` complete control over it. Any function in the program can change a public variable, which could lead to bugs or errors due to unintentional changes. Private variables restrict access in a useful way: you'll get a compiler error if you try to access a private variable outside the class.

The keyword ``virtual`` at the start of the function header for ``print()`` tells C++ that classes that inherit from ``Person`` are permitted, if they wish, to supply their own version of ``print``. We'll see below how this is used.

We can use ``Person`` like this::

    Person p("Mary", 67);
    p.print();

``Person`` requires that you create a ``Person`` object by supplying a name and age. There's no way to create an uninitialized ``Person`` object, which helps prevent errors.
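As a rough sketch of how these access rules play out, here is a trimmed-down copy of ``Person`` together with a function that lives outside the class. The names ``describe()`` and ``summary()`` are hypothetical helpers added only for this illustration (they are not part of the class in these notes), and ``virtual`` is left out to keep the focus on ``private`` and ``public``.

```cpp
#include <iostream>
#include <string>
using namespace std;

class Person {
private:
    string name;
    int age;

public:
    Person(const string& n, int a) : name(n), age(a) {}

    string get_name() const { return name; }
    int get_age() const { return age; }

    // Hypothetical helper: builds the text that print() writes.
    // Code inside the class may use name and age directly.
    string describe() const {
        return name + " is " + to_string(age) + " years old.";
    }

    void print() const { cout << describe() << "\n"; }
};

// Hypothetical helper outside the class: only the public members are usable.
string summary(const Person& p) {
    // p.age = 5;   // would not compile: age is private
    // Person q;    // would not compile: Person has no default constructor
    return p.get_name() + " (" + to_string(p.get_age()) + ")";
}
```

Uncommenting either of the two commented-out lines in ``summary`` should make the compiler reject the program, which is exactly the protection that ``private:`` and the constructor are meant to provide.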
``get_name()`` and ``get_age()`` are called **getters** because what they do is get the value of some variable in the class. You cannot write ``p.age`` or ``p.name`` directly because they are private variables, which means such access is forbidden.

Inside the ``print`` method we *don't* need to call ``get_name()`` and ``get_age()`` because methods in a class are allowed to refer directly to the private variables of that class.

Notice that there is no way to change a person's name or age once a ``Person`` object is created. It only has getters, which makes it a read-only object. Whether or not you want a particular class of object to be read-only is a design decision (and in this case it makes little sense, because people's ages change every year, and, occasionally, so too do their names).

An important technique in class-based OOP is **inheritance**. Inheritance is a way to create a new class based on some other class. For example::

    class Student : public Person {
    private:
        string school;

    public:
        Student(const string& n, int a, const string& s)
        : Person(n, a), school(s)
        { }

        string get_school() const { return school; }

        void print() const {
            cout << get_name() << " is " << get_age()
                 << " years old and attends " << school << ".\n";
        }
    };

We say that ``Student`` **subclasses**, or **extends**, the ``Person`` class. That means that all the data and methods in ``Person`` are automatically put into ``Student``.

Now you can write code like this::

    Person p("Mary", 67);
    p.print();

    Student s("Barry", 12, "Sun Ray Elementary");
    s.print();

Notice that the ``Student`` class defines its own version of ``print``, and so when ``s.print()`` is called, it is the ``Student`` version of ``print`` that is executed.

When C++ encounters the statement ``s.print()``, how does it decide what version of ``print`` to call? Since ``s`` is of type ``Student``, it calls the ``print`` associated with ``Student``.
C++ can determine this fact at compile-time because the compiler can see everything it needs to know to infer it.

But things get more interesting in this example::

    vector<Person*> people = {new Person{"Mary", 67},
                              new Student{"Barry", 12, "Sun Ray Elementary"}
                             };

    for(Person* p : people) {
        p->print();  // same as (*p).print()
    }

Here, ``people`` is a ``vector`` of pointers to ``Person`` objects. A ``new`` expression, such as ``new Person{"Mary", 67}``, returns a pointer to a newly allocated object.

Inside the for-loop, the statement ``p->print()`` is executed. What version of ``print`` is called? The one for ``Person``, or the one for ``Student``?

The variable ``p`` is of type ``Person*``, so we know that ``p`` is pointing to either a ``Person`` object or a ``Student`` object. Both of those kinds of objects have a method called ``print()``, so which code gets executed by ``p->print()``?

The answer is that it depends on the type of object ``p`` points to. If ``p`` points to a ``Person`` object, then the ``Person`` ``print()`` function is called. If instead ``p`` points to a ``Student`` object, then the ``Student`` ``print()`` function is called.

What's interesting here is that C++ does not know until run-time what type of object ``p`` points to. By looking at the statement ``p->print()`` alone, it is impossible to tell which version of ``print()`` is called. Thus, the compiler, which only has access to the source code, cannot know whether it is the ``Person`` ``print()`` or the ``Student`` ``print()`` that will be executed here.

Even though we don't know for sure which ``print()`` is called, there is no type error because we do know that both ``Person`` objects and ``Student`` objects have a method named ``print()``. So we can be certain that a ``print()`` can be called at that point.

Notice also that we *must* call ``get_name()`` and ``get_age()`` inside the ``Student`` class.
That's because the variables ``name`` and ``age`` are private, and so cannot be directly accessed outside of the ``Person`` class. Even though ``Student`` inherits those variables from ``Person``, code in ``Student`` does **not** have the ability to directly access those private variables.

.. warning::

   In C++, you *must* write the above code using pointers. The following code runs, but won't work the way we would like::

       vector<Person> people = {Person{"Mary", 67},
                                Student{"Barry", 12, "Sun Ray Elementary"}
                               };

       for(Person p : people) {
           p.print();
       }

   Here, the ``people`` vector contains ``Person`` objects instead of pointers to ``Person`` objects. When the ``Student`` is stored in the vector it is copied into a plain ``Person``, so its ``Student`` part is lost (this is often called *object slicing*). That means when ``p.print()`` is called, the code for the ``Person`` version of ``print`` is executed no matter what, because the choice is made based on the type of ``p`` (instead of the type of the object ``p`` points to).

   Thus, for most practical purposes, OOP in C++ requires that you use pointers (or references) to objects instead of the objects themselves.

The problem with this approach is that it is up to the programmer to remember to use the correct techniques, and also to deal with any pointer errors.

Some other languages, such as Java, avoid this problem by making all object variables pointers (references). Thus, in Java, you simply cannot make a variable that directly names an object; it is always a pointer. This avoids the above sort of problem.
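To make the contrast between pointers and plain objects concrete, here is a minimal sketch that runs both versions side by side. It makes a few assumptions that are not in the notes: ``describe()`` is a hypothetical stand-in for ``print()`` that returns the text instead of printing it, ``std::unique_ptr`` replaces the raw ``new`` pointers so the objects are freed automatically, ``Person`` gets a virtual destructor so that deleting a ``Student`` through a ``Person`` pointer is well-defined, and the two ``describe_via_...`` functions are made up for the illustration.

```cpp
#include <memory>
#include <string>
#include <vector>
using namespace std;

class Person {
private:
    string name;
    int age;

public:
    Person(const string& n, int a) : name(n), age(a) {}
    virtual ~Person() = default;  // assumption: added so cleanup via Person* is safe

    string get_name() const { return name; }
    int get_age() const { return age; }

    // Hypothetical stand-in for print(): returns the text instead of printing it.
    virtual string describe() const {
        return name + " is " + to_string(age) + " years old.";
    }
};

class Student : public Person {
private:
    string school;

public:
    Student(const string& n, int a, const string& s) : Person(n, a), school(s) {}

    string describe() const override {
        return get_name() + " is " + to_string(get_age())
             + " years old and attends " + school + ".";
    }
};

// Pointers: each call is dispatched on the run-time type of the object.
vector<string> describe_via_pointers() {
    vector<unique_ptr<Person>> people;
    people.push_back(make_unique<Person>("Mary", 67));
    people.push_back(make_unique<Student>("Barry", 12, "Sun Ray Elementary"));

    vector<string> out;
    for (const auto& p : people) out.push_back(p->describe());
    return out;
}

// Values: every element is a plain Person, so the Student part is sliced away.
vector<string> describe_via_values() {
    vector<Person> people = {Person{"Mary", 67},
                             Student{"Barry", 12, "Sun Ray Elementary"}};

    vector<string> out;
    for (const Person& p : people) out.push_back(p.describe());
    return out;
}
```

With these definitions, the second string returned by ``describe_via_pointers()`` mentions the school, while the second string returned by ``describe_via_values()`` does not, because by then the element is only a ``Person``.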