Abstract Data Types (ADTs)

ADTs are a way of organizing your program code that is highly modular and reusable. We will be using throughout the rest of this course (they are the “walls” aspect of our textbook).

The Definition of an ADT

The definition of ADT we’ll use is as follows:

An abstract data type (ADT) is a collection of data plus the operations on that data. ADTs separate their interface (i.e. the specification of how the programmers use them) from their implementation in a way that allows different implementations to work with the same interface.

Be careful to not to misunderstand how we use the term “interface” here: we don’t mean a graphical interface, but instead the interface to the ADT, i.e. the set of function headers that a programmer using the ADT can use.

Example. The standard int data type, plus operations like +, -, *, and / together form an ADT. You don’t need to know how ints are implemented or how the arithmetic is done in order to use them. The exact details of how ints are actually implement can — and does — vary in practice.

Example. An int on its own, without considering the operations that work on it, is not considered to be an ADT. An int on its own is just a data, and an ADT requires both data and operations that work on that data.

Example. The C++ string type is an ADT because strings combine data (a sequence of characters) and operations that work on them (like size or substr). You don’t need to know anything about the details of how a string object is implemented in order to use it. Instead, you just need to know what functions it supports and how those functions behave, i.e. the very things you will find in good string documentation.

Often, but not always, C++ string objects store their strings as array of characters (i.e. essentially C-style strings) plus some extra bookkeeping information. But there’s no requirement for strings to be implemented that way, and, in fact, in practice you can find alternative implementations (such as ropes).

Example. A C-style array is generally not considered to be an ADT because it just stores data and does not specify the operations that work on it (beyond a few simple, low-level operations such as indexing).

Designing an ADT: A List

Imagine you are writing a program (perhaps a smartphone app) that stores a shopping list like this:

apples
milk
diapers
fruit cups

Just by thinking about how we use a shopping list, we can think of some operations that might be useful to perform on it:

  • Add an item to the list. Usually, we just add items to the end of a list because inserting items written on paper is difficult. However, on a computer inserting an item at any particular position is both possible and convenient for users.
  • Cross an item off the list.
  • Count the number of items on the list (do you need to get lots of things, or only a few things?).

This informal thinking about a list is important and useful, and usually how you start designing an ADT.

The next step is to more formally specify the operations we want our list ADT to support. Now we need to keep in mind some of the realities of our computer and programming language:

  1. Create an empty list.
  2. Destroy a list.
  3. Determine if a list is empty.
  4. Determine the number of items on a list.
  5. Insert an item at a given position on a list.
  6. Delete an item at a given position on a list.
  7. Retrieve an item at a given position on a list.

Keep in mind that this is not a definitive set of list operations (e.g. there is no way to test if a given item is already on the list). It is just the ones we thought to add. One of the challenges of writing good ADTs is coming up with a good list of operations. Usually, we want a small set of operations that work well in lots of different situations.

This list of operations are still informal, but they provide a sort of blue- print for implementing our ADT. We will use this list to determine what operations to implement, and, importantly, to know when we are done.

Here’s a more formal description of the ADT:

  1. createList()

    Creates a new list item. In an OOP language like C++, this is typically implemented as an object constructor.

  2. destroyList()

    Destroys a list. In C++, this would be implemented as a destructor.

  3. bool is_empty() const

    Returns true if this list is empty, and false otherwise.

  4. int size() const

    Returns the number of items on this list.

  5. void insert(const ItemType& item, int index)

    Inserts item at location index of the list. The index of the items already in the list from location index onwards all increase by 1.

    For insertion to be successful, we require that 0 <= index <= size(). If index is less than 0 or greater than size(), then an error is thrown indicating the insertion failed.

    How exactly an error is signaled depends in part on the programming language being used. Most modern programming languages (like C++) use exceptions to signal errors. Some languages, such as C, do not have any special support for error-handling, and so may require an extra error- flag variable be set to indicate success or failure.

    Whatever the details of how exactly errors are signaled and handled, the important thing in an ADT is that errors should be stated explicitly.

  6. ItemType retrieve(int index)

    Returns the item at location index of the list.

    For insertion to be successful, we require that 0 <= index < size() (notice that this expression is asymmetric). If index is less than 0, or index is greater than, or equal to, size(), then an error is thrown.

In C++, we would probably use a class to implement this code. Here’s how we might start the implementation:

    typedef string ItemType;

    class List {
    private:
             // ...

    public:
    // constructor: creates an empty list
    List();

// destructor: destroy the list
    ~List();

// Pre:
// Post: returns true if this list is empty, and false otherwise
    bool is_empty() const;

    // Pre:
    // Post: returns the number of elements in this list
    int size() const;

    // Pre: 0 <= index <= size()
    // Post: Puts item at location index. Every element after
    //       item has its index increased by 1.
    void insert(const ItemType& item, int index);

    // Pre: 0 <= index < size()
    // Post: returns the item at location index
    ItemType retrieve(int index) const;

    }; // class List

Of course, the implementation details of how these function works are left out here: that comes later. When designing an ADT, we always first work out the interface for how it will be used, and then, after that, implement it.

We now know enough about our list ADT to use it in real code, e.g.:

List lst = new List();

lst.insert("apples", 0);
lst.insert("milk", 1);
lst.insert("diapers", 2);
lst.insert("fruit cups", 3);

string first = lst.retrieve(0);
string last = lst.retrieve(1);

Or:

void display(const List& lst) {
        for(int i = 0; i < lst.size(); ++i)     {
                ItemType item = lst.retrieve(i);
                cout << item << ' ';
        }
}

Designing an ADT: A Sorted List

Let’s design an ADT that keeps a sequence of items in sorted order. It’s similar to the List ADT above, except there’s no choice about where to insert an item.

Here is the sorted list ADT:

  • bool is_empty() const

    Returns true if the list is empty, and false otherwise.

  • int size() const

    Returns the number of items in the list.

  • void insert(const ItemType& item)

    Inserts item into the list. Note: It’s not stated what insert should do if the item being inserted is already on the list. It’s not forbidden, and so it is presumably allowed.

  • void remove(const ItemType& item)

    Removes item from the list. Note: If item occurs more than once on this list, then it is not clear which item should be removed (or even if all items should be removed).

  • ItemType retrieve(int pos) const

    Returns the item at location pos such that if pos is 0, then the alphabetically first item is returned; if pos is 1, then the alphabetically second item is returned, etc.

    Requires 0 <= pos < size(). If pos is less than 0, or greater than, or equal to, size(), then an error is thrown.

  • int index_of(const ItemType& item)

    Returns the index location of item in this list, where 0 is the index of the smallest item, 1 is the index of the second smallest item, 2 is the index of the third smallest item, and so on. If the item is not in the list, then -1 is returned.

Designing an ADT: A Person

Suppose we want to store information about people, e.g. their first name, last name, and email address. We can define this as ADT called Person like this:

  • void set_first_name(const string& name) string get_first_name() const

    Set/get the person’s first name. If the name is an empty string, or all whitespace, then an error is thrown.

  • void set_last_name(const string& name) string get_last_name() const

    Set/get the person’s last name. If the name is an empty string, or all whitespace, then an error is thrown.

  • void set_email(const string& email) string get_email() const

    Set/get the person’s email address. The email must be a validly formed email address.

We have not directly specified any data, but instead indirectly defined everything in terms of functions.

You may well ask why we bother to do things this way. It seems like overkill: why not just make a class/struct that stores the data and access it directly?

One big reason is that the ADT approach gives us total control over what happens when we set/get a value. For instance, we can test if a passed-in name is empty, which is something we can’t do so easily if we accessed the data directly.

Another reason is that we don’t necessarily know how the underlying data is being stored. It’s definitely being set/get as strings, but that doesn’t mean the data is stored as strings. It’s possible that a special name ADT is being used to store the names, or perhaps they are being stored in an external SQL database.

Designing an ADT: An Address Book

Suppose you are designing an address book for, say, an email client (i.e. a program an individual would use to send/receive email). For simplicity, we will only store people’s first name, last name, and email address.

What kinds of things would a user likely want to use an address book for? Some reasonable operations might be:

  • add a new person
  • count of the number of people in the address book
  • get the person at location i
  • search for string contained in an address book
  • changing information of a person already in the address book
  • delete a person from the address book

No doubt you can think of many other features, but this small set of features seems useful, and so it is a good place to start. We can add other features later.

Here is a more precise specification:

  • void add_person(const string& email,

    const string& last_name, const string& first_name)

    Adds a person with the given info to the address book. The email address is used as a unique identifier, and so if someone in the address book has the same email address then add_person does nothing.

  • int size() const

    Returns the number of people currently stored in the address book.

  • void change_email(const string& old_email, const string& new_email)

    Changes the email address of the person whose address is currently old_email. If someone else already as the address new_email, then this does nothing. If no one in the address book has old_email as ab address, an error is thrown.

  • Person retrieve(int i)

    Returns, as a Person ADT, the person at location i of the address book. It is required that 0 <= i < size(). If i is less than 0, or equal to or greater than size(), then an error is thrown.

  • void change_first_name(const string& email, const string& new_first_name)

    Changes the first name of the person in the address book with the given email address. If no one in the address book has the given email address, an error is thrown.

  • void change_last_name(const string& email, const string& new_last_name)

    Changes the last name of the person in the address book with the given email address. If no one in the address book has the given email address, an error is thrown.

  • void delete_entry(const string& email)

    Deletes the person in the address book with the given email address. If no one in the address book has the given email address, an error is thrown.

Implementing ADTs in C++

C++ is an object-oriented programming (OOP) language, and so has good support for implementing ADTs. Typically we implement an ADT as a class, and ADT instances are thus objects.

As a reminder of how classes work in C++, lets create an ADT for representing 3-dimensional cubes. We will do this in two steps:

  1. Define the interface to the ADT. This means we will create the class, and list all its variables and function headers (including pre/post conditions when appropriate). This does not include function bodies, i.e. there are almost no implementation details in this part.
  2. Implement the function bodies for all the functions in the just-defined class.

In C++, part 1 is typically placed in a header (.h) file, and part 2 is placed in a corresponding .cpp file (that includes the header). This lets you independently compile the .cpp files, which can save time when building large programs. However, since our focus in this course is on data structures and algorithms we will opt for a simpler approach and put but the interface code and the implementation code in the same .h file.

Here’s how we would create the ADT interface for a cube:

class Cube {
private:
        double side_length;

public:

        // Pre:
        // Post: creates a Cube with side length 1
        Cube(); // default constructor

        // Pre: sl > 0
        // Post: creates a Cube with side length sl
        Cube(double sl);

        // Pre: sl > 0
        // Post: sets the side length of this cube to sl
        void set_side_length(double sl);

        // Pre:
        // Post: returns the side length of this cube
        double get_side_length() const;

        // Pre:
        // Post: returns the surface area of this cube
        double get_surface_area() const;

        // Pre:
        // Post: returns the volume of this cube
        double get_volume() const;

}; // class Cube

In practice, someone using the Cube class should assume that this is the only code they will ever get to see — the implementation code might not be provided! However, it is enough to use the Cube class correctly in any program that needs it.

It’s worth nothing that one implementation detail is visible here: the private variable side_length must be put here. For a class like Cube this is an obvious implementation detail that would probably never cause a problem, but for complex classes showing the user of the class all of its private variables can lead to trouble if the user makes assumptions about the implementation.

The implementation of the functions looks like this:

#include "std_lib_cmpt225.h"

using namespace cmpt225;

// Cube:: is put at the front of all the Cube methods
// so the compiler knows they belong to Cube and are not
// stand-alone functions.

// side_length is initialized using an initializer list
Cube::Cube() : side_length(1.0)
{
        // nothing in the body of this constructor
}

Cube::Cube(double sl) : side_length(sl)
{
        check_pre(side_length > 0);
}

void Cube::set_side_length(double sl)
{
        check_pre(sl > 0);   // check_pre is from std_lib_cmpt225
                             // It throws an error if its input is false,
                             // and does nothing if it is true.
        side_length = sl;
}

double Cube::get_side_length() const
{
        return side_length;
}

double Cube::get_surface_area() const
{
        return 6 * side_length * side_length;
}

double Cube::get_volume() const
{
        return side_length * side_length * side_length;
}

As you can see, we put Cube:: in front of all the function names that belong to the Cube class so that the compiler knows we are referring to a function inside Cube, and not a new stand-alone function.

Here’s the entire file Cube.h:

// Cube.h

// By defining Cube_h, we avoid problems caused by including this file more
// than once: if Cube_h is already defined, then the code is *not* included.
#ifndef Cube_h
#define Cube_h

class Cube {
private:
        double side_length;

public:

        // Pre:
        // Post: creates a Cube with side length 1
        Cube(); // default constructor

        // Pre: sl > 0
        // Post: creates a Cube with side length sl
        Cube(double sl);

        // Pre: sl > 0
        // Post: sets the side length of this cube to sl
        void set_side_length(double sl);

        // Pre:
        // Post: returns the side length of this cube
        double get_side_length() const;

        // Pre:
        // Post: returns the surface area of this cube
        double get_surface_area() const;

        // Pre:
        // Post: returns the volume of this cube
        double get_volume() const;


}; // class Cube

////////////////////////////////////////////////////////////////////////////
//
// Below is the implementations of the Cube class. In practice, this would
// usually be put in its own .cpp file and compiled separately. However,
// we want to simplify building our software in this course and so will
// usually put everything in the .h file.
//
////////////////////////////////////////////////////////////////////////////

#include "std_lib_cmpt225.h"

using namespace cmpt225;

// Cube:: is put at the front of all the Cube methods
// so the compiler knows they belong to Cube and are not
// stand-alone functions.

// side_length is initialized using an initializer list
Cube::Cube() : side_length(1.0)
{
        // nothing in the body of this constructor
}

Cube::Cube(double sl) : side_length(sl)
{
        check_pre(side_length > 0);
}

void Cube::set_side_length(double sl)
{
        check_pre(sl > 0);   // check_pre is from std_lib_cmpt225
                             // It throws an error if its input is false,
                             // and does nothing if it is true.
        side_length = sl;
}

double Cube::get_side_length() const
{
        return side_length;
}

double Cube::get_surface_area() const
{
        return 6 * side_length * side_length;
}

double Cube::get_volume() const
{
        return side_length * side_length * side_length;
}

#endif

Notice that we have wrapped the code for Cube.h in #ifndef / #endif. This is to guard against the problem of multiple inclusion. In a large multi-file program you might include Cube.h multiple times, which would cause “function re- definition” errors. To avoid that, we use this standard trick of defining a pre-processor symbol the first time the file is included, and then do not include the file if that symbol is defined.

To use Cube, you create a new file, say Cube_test.cpp, like this:

// Cube_test.cpp

#include "Cube.h"

using namespace cmpt225;

int main() {
        Cube cube;  // cube of side length 1.0
        cout << " Side length: " << cube.get_side_length() << endl
             << "Surface area: " << cube.get_surface_area() << endl
             << "      Volume: " << cube.get_volume() << endl;

        cout << "\nPlease enter new side length: ";
        double s;
        cin >> s;

        cube.set_side_length(s);
        cout << " Side length: " << cube.get_side_length() << endl
             << "Surface area: " << cube.get_surface_area() << endl
             << "      Volume: " << cube.get_volume() << endl;
} // main

Array Implementation of the List ADT

Lets use an array to implement the List ADT from above:

// List_arr.h

#include "std_lib_cmpt225.h"

using namespace cmpt225;

const int MAX_LIST_SIZE = 100;
typedef string ItemType;

class List {
private:
        ItemType* arr;
        int sz;

public:
        // constructor: creates an empty list
        List();

  // destructor: destroy the list
        ~List();

  // Pre:
  // Post: returns true if this list is empty, and false otherwise
        bool is_empty() const;

        // Pre:
        // Post: returns the number of elements in this list
        int size() const;

        // Pre: 0 <= index <= size()
        // Post: Puts item at location index. Every element after
        //       item has its index increased by 1.
        void insert(const ItemType& item, int index);

        // Pre: 0 <= index < size()
        // Post: returns the item at location index
        ItemType retrieve(int index) const;

}; // class List


////////////////////////////////////////////////////////////////////
//
// Implementation
//
////////////////////////////////////////////////////////////////////

List::List()
        : arr(new ItemType[MAX_LIST_SIZE]), sz(0)
        { }

List::~List() {
        delete[] arr;
}

int List::size() const {
        return sz;
}

bool List::is_empty() const {
        return size() == 0;
}

// Pre: 0 <= index < size()
// Post: returns the item at location index
ItemType List::retrieve(int index) const {
        check_pre(0 <= index && index < size());
        return arr[index];
}

// Pre: 0 <= index <= size()
// Post: Puts item at location index. Every element after
//       item has its index increased by 1.
void List::insert(const ItemType& item, int index) {
        check_pre(0 <= index && index <= size());
        eassert(size() < MAX_LIST_SIZE);
        ++sz;
        int i = 0;
        while (sz - i > index) {
                arr[sz - i] = arr[sz - i - 1];
                ++i;
        }
        arr[index] = item;
}

Note the following:

  • C++ arrays have a fixed size, and so we use the constant MAX_LIST_SIZE to indicate the size of the underlying array. So this List implementation always reserves exactly MAX_LIST_SIZE element, even for an empty list. If you try to insert an element into a list that contains MAX_LIST_SIZE elements, then an error is thrown.

    A more sophisticated implementation of List could dynamically re-size the array when it runs out of space. For instance, whenever insert is called on a completely filled list it could create a new array that is double the size of the existing array, copy all the elements from arr into it, and then delete the original array. It still uses excess space, but often not as much as our naive implementation.

  • check_pre(expr) is in the std_lib_cmpt225.h file, and does nothing if expr is true (meaning that the pre-condition is true, i.e. that it has been satisfied). But if expr is false, then check_pre immediately throws an error.

  • The implementation of insert is different in the details than the one given in the textbook.

  • Notice the performance of each of the functions:

    • size, is_empty, and retrieve are each constant-time operations. That means that the time they take to run don’t depend on the size of the list. So no matter how large the list, these three operations take the same amount of time.
    • insert runs at different speeds depending upon the parameter index that is passed-in. If i is big, then insert runs quickly; but if i is small, it runs more slowly. In the worst case, when i is 0, insert does an amount of work proportional to the size of the list.