Lecture 9: Type Checking

Type checking is concerned with ensuring that the operators and operands of an expression are of compatible types (this includes things like calling subprograms and assignment statements).

For example:

string s = "hello!";  // C++
double x = 4.3;

s = x;  // type error caught by compiler
        // can't assign a string to an int

The term compatible type is important, because languages have situations in which different types can be used together if they were the same type. For example:

double a = 2.2;
int x = 5;

a = x;  // okay: no error because an int can be assigned to a double

In this case, the value of x is converted, or coerced, into an equivalent value for type double. For instance:

cout << 2 + 3.3;   // 2 is coerced to 2.0

Here, the + is floating-point addition (not integer addition).

Static and Dynamic Type Checking

Type checking can be done at compile-time or run-time. Compile-time type checking is called static type checking, and type errors result in compiler errors. Run-time type checking is called dynamic type-checking, and in this case errors result in run-time errors.

For instance, consider this Java code:

abstract class Pet {
  abstract void feed();
}

class Dog extends Pet {
  void feed() {
    println("eats kibble");
  }
}

class Cat extends Pet {
  void feed() {
    println("eats a sardine");
  }
}

void takeCareOf(Pet p) {
  println("takeCareOf called");
  Dog d = (Dog) p;  // this cast may fail at run-time
  // ...
}

void mainLoop() {
  takeCareOf(new Dog());  // okay
  takeCareOf(new Cat());  // oops
}

This code compiles even though it contains a type error. The statement takeCareOf(new Cat()) fails because takeCareOf expects a Dog.

In languages with dynamic type binding, some type checking can only be done at compile time because at compile-time not enough information may be known about types to detect errors. Some languages, such as JavaScript and PHP, only provide dynamic type-checking.

Sometimes the term strongly typed is used, although it is often vague and probably not a very useful term. Generally, a language is strongly-typed if it catches all type errors. In this strict sense, most languages are not strongly-type because there are usually ways to get around the type system and so prevent the compiler from catching a type error. While languages like C and C++ area notorious for allowing programmers to subvert their type-system, even a language such as Ada, which is meant to strongly-typed, has a way to allow programmers to purposely get around its type-checking rules.

Type Equivalence

When are two types the same? Obviously, they are the same if they have the same name, e.g. the type string is equivalent to the type string, and different if they have different names, e.g. int and float are different types.

There are two main kinds of type equivalence: name equivalence and structural equivalence.

Name equivalence is simple: two types are considered equivalent if they have the same name. Structural equivalence is more sophisticated: two type are considered equivalent if they share the same structure. Many languages use a mix of both name equivalence and structural equivalence.

Ada is an example of a language that uses strict name equivalence, e.g.:

type Indextype is 1..100;   -- Ada
count : Integer;
index : Indextype;

The types Integer and Indextype are not equivalent, and so you cannot store an Integer in an Indextype variable, and vice-versa. However, at the implementation level, it may well be the case that both types have identical implementations (e.g. 32 bit ints, perhaps).

For example, Go uses name equivalence:

type Age int
type Num int

func main() {
        var a Age = 3
        var n Num = 4
        var i int = 5

        //a = n // type error
        //n = a // type error
        //a = i // type error
        //i = n // type error

        a = 5 // ok!
        //n = i + 5 // type error

        //fmt.Println(a + n) // type error
        //fmt.Println(i + n) // type error
        fmt.Println(n + 2) // ok!
        //fmt.Println(n + 2.1) // error

        fmt.Println(a)
        fmt.Println(n)
        fmt.Println(i)
}

The types Age and Num are implemented exactly the same as an int. However, they are different types, and so Go usually gives type errors if you try to mix them together. As the examples show, in some cases with constant values Go will automatically convert a type to allow an operation.

Many languages let you create types without simple names. For example, int C++ you can could write this statement:

vector<string*> x;

The type vector<string*> is not a simple name, and strict name equivalent rules would force it to have a name, e.g.:

typedef vector<string*> StringSeq;

using StringSeq = vector<string*>;  // C++11 alternative to typedef

This is pretty inconvenient, and so most languages do not use strict name equivalence for types.

Structural type equivalence declares types are equivalent when the two types have the same structure. For instance, in Go we could write this:

type Person struct {       // Go_
        name string
        age int
}

type Cheese struct {
        name string
        age int
}

Under purely structural equivalence rules, Person and Cheese would be considered equivalent. But with name equivalence rules they would be different.

What rules does Go use? Consider this function:

func display(p Person) {
        fmt.Println("name: " + p.name)
        fmt.Println(" age:", p.age)
}

Clearly we can pass a Person object to display. But can we pass it a Cheese? No — a compiler type error results in this code:

thing := Cheese{"hat", 10}
display(thing)  // compiler error: thing type is not Person

This shows that Go does not use purely structural type equivalence rules. C-like languages give the same result as Go.

Structural equivalence does appear in C-like languages. Consider:

typedef int Age;  // C/C++
typedef int Date;

Age a = 4;
Date d = 23;

a = d; // okay

d = a; // okay

The types Age and Date have different names, but they are structurally identical and so can be used interchangeably. Indeed, typedef just creates aliases, i.e. alternative names for existing types. Both Age and Date are really just the type int, and so a typedef is mainly for making code easier for humans to read.

Arithmetic

Arithmetic operators have three main arities:

  • unary, e.g. in the expression -(1 + 2), - is an unary operator because it takes a single input (note also that - has a binary version that takes two operands, e.g. 4 - 5)
  • binary, e.g. in the expression -(1 + 2), + is binary operator because it takes two inputs
  • ternary, e.g. in the C/C++ expression (x > 5) ? 1 : 0, ? : is a ternary operator because it takes exactly three inputs.

Arithmetic expressions can also vary in the order in which operators and value are placed:

  • infix, e.g. in the expression 1 + 2, + is an infix operator
  • prefix, e.g. in the expression + 1 2, + is a prefix expression; Lisp uses prefix notation for expressions
  • postfix, e.g. in the expression 1 2 +, + is a postfix operator; FORTH, and other stack-based languages, use postfix operators

We also need to worry about operator precedence, e.g.:

  • In C-like languages, in the expression 1 + 2 * 4 * is done before +. But in the expression (1 + 2) * 4, + is done before *.

    Note that in Lisp, the order of operations is always explicit, e.g. (+ (* 2 4)) and (* (+ 1 2)).

    The SmallTalk language is one of the few languages that don’t follow the traditional operator precedence rules of arithmetic. In SmallTalk, an expression like 2 + 3 * 5 evaluates to 25 since the operators are evaluated in the order they occur. This is a consistent use of SmallTalk’s function-calling rules, but for many programmers it is a source of errors and annoyance.

  • In an expression like 2 * 3 * 0, a language must decide which * is evaluated first, i.e. is the expression that same as (2 * 3) * 0 or 2 * (3 * 0). In mathematics, there is no difference, but it could matter in computer arithmetic where expressions might over-flow or underflow

    For example, suppose we are doing 8-bit integer arithmetic where the maximum value is 127. Then consider these two expressions:

    (-100 + 100) + 100  // equals 100, no overflow
    
    -100 + (100 + 100)  // equals 100, overflow
    

    Both expressions have the same value in the mathematical sense, and, indeed, in Go (and many other languages) they evaluate to the same thing. But there’s an important difference: the second expression has an overflow because 100 + 100 results in a value bigger than 127. But there is no overflow in the first expression.

Operand Evaluation Order

Consider this Go code:

func one() int {
        fmt.Println("one")
        return 1
}

func two() int {
        fmt.Println("two")
        return 2
}

func add(x, y int) int {
        return x + y
}

func main() {
        a := add(one(), two())  // Is one() evaluated before two()?

        fmt.Println(a)
}

The Go spec promises that in the function call add(one(), two()), one() will be evaluated before two().

Function Side Effects

A function is said to cause a side-effect if it modifies some value outside of the function. For example:

func one() int {
        fmt.Println("one")  // side effect
        return 1
}

Printing is a very useful side effect, but often side effects cause confusing behaviour. For example:

var a int = 5

func f(n int) int {
        a++           // side effect
        return n + 1
}

func main() {
        fmt.Println(a + f(2))  // 9
        fmt.Println(a + f(2))  // 10
}

This shows that a simple side-effect in f causes the a + f(2) to evaluate differently depending on how many times f is called. This can make it very hard to reason about the behaviour of expressions.

Traditional mathematical functions do not have side-effects, and so don’t suffer from this kind of problem. One of the tenets of functional programming is that functions should not have side-effects precisely to avoid this sort of weird behaviour. The problem with this, though, is that it also erases the benefit of useful side-effects like printing to the screen, reading and writing files, or accessing the web.

Referential Transparency

A functions is said to be referentially transparent if the function always returns the same output when given the same input. For instance:

func twice(x int) int {  // a referentially transparent Go function
        return 2 * x
}

The twice function is referentially transparent because if you pass it a, it always return a + 1. It’s value depends only on the input to the function any nothing else.

Here’s a function that’s not referentially transparent:

var count int

func next() int { // a Go function that is not referentially transparent
        count++
        return count
}

Every time you call next(), you get a different result. Thus next() is not referentially transparent.

Another example of a kinds of function that isn’t referentially transparent is any function that, for instance, reads data from a file or from the user. For instance, Python has a function called raw_input(prompt) that prints the string prompt and the copies whatever string the user types next.