Basic Type Checking

Type checking is concerned with ensuring that the operators and operands of an expression are of compatible types (this includes subprogram calls and assignment statements).

For example:

string s = "hello!";  // C++
double x = 4.3;

s = x;  // type error caught by the compiler:
        // a double can't be assigned to a string

The term compatible type is important, because languages have situations in which different types can be used together as if they were the same type. For example:

double a = 2.2;
int x = 5;

a = x;  // okay: no error because an int can be assigned to a double

In this case, the value of x is converted, or coerced, into an equivalent value for type double. For instance:

cout << 2 + 3.3;   // 2 is coerced to 2.0

Here, the + is floating-point addition (not integer addition).

Static and Dynamic Type Checking

Type checking can be done at compile-time or run-time. Compile-time type checking is called static type checking, and type errors result in compiler errors. Run-time type checking is called dynamic type-checking, and in this case errors result in run-time errors.

For instance, consider this Java code:

abstract class Pet {
  abstract void feed();
}

class Dog extends Pet {
  void feed() {
    println("eats kibble");
  }
}

class Cat extends Pet {
  void feed() {
    println("eats a sardine");
  }
}

class Main {
  static void takeCareOf(Pet p) {
    System.out.println("takeCareOf called");
    Dog d = (Dog) p;  // this cast may fail at run-time
    // ...
  }

  static void mainLoop() {
    takeCareOf(new Dog());  // okay
    takeCareOf(new Cat());  // oops: fails at run-time
  }
}

This code compiles even though it contains a type error. The call takeCareOf(new Cat()) fails at run-time because the cast Dog d = (Dog) p throws a ClassCastException when p is actually a Cat.

In languages with dynamic type binding, some type checking can only be done at run-time, because at compile-time not enough information may be known about types to detect errors. Some languages, such as JavaScript and PHP, only provide dynamic type-checking.
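
Even a statically typed language like Go does a little checking at run-time. As a minimal sketch, a value stored in an interface{} variable only has its concrete type checked when you try to extract it:

func main() {
    var v interface{} = "hello"  // v can hold a value of any type

    s, ok := v.(string)  // run-time check, "comma ok" form: ok is true, s is "hello"
    fmt.Println(s, ok)

    n := v.(int)  // this run-time check fails and panics, since v holds a string, not an int
    fmt.Println(n)
}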

Sometimes the term strongly typed is used, although it is often vague and probably not a very useful term. Generally, a language is strongly typed if it catches all type errors. In this strict sense, most languages are not strongly typed, because there are usually ways to get around the type system and so prevent the compiler from catching a type error. While languages like C and C++ are notorious for allowing programmers to subvert their type systems, even a language such as Ada, which is meant to be strongly typed, has a way to let programmers purposely get around its type-checking rules.

Type Equivalence

When are two types the same? Obviously, they are the same if they have the same name, e.g. the type string is equivalent to the type string, and different if they have different names, e.g. int and float are different types.

There are two main kinds of type equivalence: name equivalence and structural equivalence.

Name equivalence is simple: two types are considered equivalent if they have the same name. Structural equivalence is more sophisticated: two types are considered equivalent if they share the same structure. Many languages use a mix of both name equivalence and structural equivalence.

Ada is an example of a language that uses strict name equivalence, e.g.:

type Indextype is range 1..100;   -- Ada
count : Integer;
index : Indextype;

The types Integer and Indextype are not equivalent, and so you cannot store an Integer in an Indextype variable, and vice-versa. However, at the implementation level, it may well be the case that both types have identical implementations (e.g. 32 bit ints, perhaps).

Go is another language that uses name equivalence:

type Age int
type Num int

func main() {
    var a Age = 3
    var n Num = 4
    var i int = 5

    //a = n // type error
    //n = a // type error
    //a = i // type error
    //i = n // type error

    a = 5 // ok!
    //n = i + 5 // type error

    //fmt.Println(a + n) // type error
    //fmt.Println(i + n) // type error
    fmt.Println(n + 2) // ok!
    //fmt.Println(n + 2.1) // error

    fmt.Println(a)
    fmt.Println(n)
    fmt.Println(i)
}

The types Age and Num are implemented exactly the same as an int. However, they are distinct types, and so Go usually gives type errors if you try to mix them. As the examples show, Go will automatically convert an untyped constant (like 5) to the required type to allow an operation.
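
The type errors above can be removed with explicit conversions. For instance, these statements (reusing the variables from the example above) compile:

a = Age(n)                    // ok: n explicitly converted to an Age
n = Num(i) + 5                // ok: i explicitly converted to a Num
fmt.Println(int(a) + int(n))  // ok: both converted to int before adding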

Many languages let you create types without simple names. For example, in C++ you could write this statement:

vector<string*> x;

The type vector<string*> is not a simple name, and strict name equivalence rules would force it to have one, e.g.:

typedef vector<string*> StringSeq;

using StringSeq = vector<string*>;  // C++11 alternative to typedef

This is pretty inconvenient, and so most languages do not use strict name equivalence for types.

Structural type equivalence declares two types equivalent when they have the same structure. For instance, in Go we could write this:

type Person struct {       // Go
    name string
    age int
}

type Cheese struct {
    name string
    age int
}

Under purely structural equivalence rules, Person and Cheese would be considered equivalent. But with name equivalence rules they would be different.

What rules does Go use? Consider this function:

func display(p Person) {
    fmt.Println("name: " + p.name)
    fmt.Println(" age:", p.age)
}

Clearly we can pass a Person object to display. But can we pass it a Cheese? No: this code produces a compile-time type error:

thing := Cheese{"hat", 10}
display(thing)  // compile-time error: thing's type is Cheese, not Person

This shows that Go does not use purely structural type equivalence rules. C-like languages give the same result as Go.
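
Structure still matters for explicit conversions in Go, though: because Person and Cheese have identical underlying structures, a Cheese value can be explicitly converted to a Person. For instance, this sketch compiles:

thing := Cheese{"hat", 10}
display(Person(thing))  // ok: explicit conversion from a Cheese to a Person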

Structural equivalence does appear in C-like languages. Consider:

typedef int Age;  // C/C++
typedef int Date;

Age a = 4;
Date d = 23;

a = d; // okay

d = a; // okay

The types Age and Date have different names, but they are structurally identical and so can be used interchangeably. Indeed, typedef just creates aliases, i.e. alternative names for existing types. Both Age and Date are really just the type int, and so a typedef is mainly for making code easier for humans to read.
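
Go, by contrast, distinguishes between defining a new type and creating an alias (type aliases need Go 1.9 or later). Here's a small sketch:

type Age int     // a new, distinct type
type Date = int  // a type alias: just another name for int

func main() {
    var a Age = 4
    var d Date = 23
    var i int = 7

    i = d   // ok: Date and int are the same type
    //i = a // type error: Age and int are different types
    fmt.Println(a, d, i)
}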

Arithmetic

Arithmetic operators have three main arities:

  • unary, e.g. in the expression -(1 + 2), - is a unary operator because it takes a single input (note also that - has a binary version that takes two operands, e.g. 4 - 5)
  • binary, e.g. in the expression -(1 + 2), + is a binary operator because it takes two inputs
  • ternary, e.g. in the C/C++ expression (x > 5) ? 1 : 0, ? : is a ternary operator because it takes exactly three inputs.

Arithmetic expressions can also vary in the order in which operators and values are placed:

  • infix, e.g. in the expression 1 + 2, + is an infix operator
  • prefix, e.g. in the expression + 1 2, + is a prefix operator; Lisp uses prefix notation for expressions, e.g. (+ 1 2)
  • postfix, e.g. in the expression 1 2 +, + is a postfix operator; FORTH, and other stack-based languages, use postfix operators

We also need to worry about operator precedence, e.g.:

  • In C-like languages, in the expression 1 + 2 * 4, * is done before +. But in the expression (1 + 2) * 4, + is done before *.

    Note that in Lisp, the order of operations is always explicit, e.g. (+ 1 (* 2 4)) and (* (+ 1 2) 4).

    The Smalltalk language is one of the few languages that doesn't follow the traditional operator precedence rules of arithmetic. In Smalltalk, an expression like 2 + 3 * 5 evaluates to 25, since the operators are evaluated strictly from left to right. This is a consistent use of Smalltalk's message-sending rules, but for many programmers it is a source of errors and annoyance.

  • In an expression like 2 * 3 * 0, a language must decide which * is evaluated first, i.e. is the expression the same as (2 * 3) * 0 or 2 * (3 * 0)? In mathematics there is no difference, but it can matter in computer arithmetic, where expressions might overflow or underflow.

    For example, suppose we are doing 8-bit integer arithmetic where the maximum value is 127. Then consider these two expressions:

    (-100 + 100) + 100  // equals 100, no overflow
    
    -100 + (100 + 100)  // equals 100, overflow
    

    Both expressions have the same value in the mathematical sense, and, indeed, in Go (and many other languages) they evaluate to the same thing. But there's an important difference: the second expression has an overflow, because 100 + 100 produces a value bigger than 127, while the first expression never overflows.
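
Here is a small Go sketch of the same calculations using 8-bit ints (Go wraps around on signed integer overflow rather than reporting an error):

func main() {
    var a, b, c int8 = -100, 100, 100

    fmt.Println((a + b) + c)  // 100: no intermediate result overflows
    fmt.Println(a + (b + c))  // 100: but b + c wraps around to -56 along the way
}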

Operand Evaluation Order

Consider this Go code:

func one() int {
    fmt.Println("one")
    return 1
}

func two() int {
    fmt.Println("two")
    return 2
}

func add(x, y int) int {
    return x + y
}

func main() {
    a := add(one(), two())  // Is one() evaluated before two()?

    fmt.Println(a)
}

The Go spec promises that in the function call add(one(), two()), one() will be evaluated before two(): function calls in an expression are evaluated in lexical left-to-right order. So this program always prints one, then two, then 3.

Function Side Effects

A function is said to cause a side effect if it modifies some state outside of the function, or otherwise interacts with the outside world (e.g. by printing). For example:

func one() int {
    fmt.Println("one")  // side effect
    return 1
}

Printing is a very useful side effect, but often side effects cause confusing behaviour. For example:

var a int = 5

func f(n int) int {
    a++           // side effect
    return n + 1
}

func main() {
    fmt.Println(a + f(2))  // 9
    fmt.Println(a + f(2))  // 10
}

This shows that a simple side-effect in f causes a + f(2) to evaluate differently depending on how many times f has been called. This can make it very hard to reason about the behaviour of expressions.

Traditional mathematical functions do not have side-effects, and so don't suffer from this kind of problem. One of the tenets of functional programming is that functions should not have side-effects, precisely to avoid this sort of weird behaviour. The problem with this, though, is that it also rules out useful side-effects like printing to the screen, reading and writing files, or accessing the web.
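
For example, here is a sketch of how the f above could be written without the side effect: the caller passes the current value of a in and gets its new value back, instead of having f silently change it:

func f(a, n int) (newA, result int) {  // no side effects: all inputs and outputs are explicit
    return a + 1, n + 1
}

func main() {
    a := 5
    a, r := f(a, 2)
    fmt.Println(a + r)  // 9: the same inputs always give the same answer
}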

Referential Transparency

A function is said to be referentially transparent if it always returns the same output when given the same input. For instance:

func twice(x int) int {  // a referentially transparent Go function
    return 2 * x
}

The twice function is referentially transparent because if you pass it a value x, it always returns 2 * x. Its value depends only on the input to the function and nothing else.

Here’s a function that’s not referentially transparent:

var count int

func next() int { // a Go function that is not referentially transparent
    count++
    return count
}

Every time you call next(), you get a different result. Thus next() is not referentially transparent.

Another example of a kind of function that isn't referentially transparent is one that reads data from a file or from the user. For instance, Python 2 has a function called raw_input(prompt) (called input in Python 3) that prints the string prompt and then returns whatever string the user types next.

Type Conversions

Converting between data types in programming is an important operation, and it is useful to distinguish between, at least, two kinds of conversions among numeric types.

A narrowing conversion is when you convert one type to a “smaller” type. For instance, consider this code:

var x float64 = 8
fmt.Println(float32(x))  // narrowing conversion; prints 8

float32(x) is a narrowing conversion because it takes a 64-bit value and stuffs it into a 32-bit value, so 32 bits are lost. In this case all the lost bits are 0s, and so there is no problem, but in general converting from 64 bits to 32 bits loses information and so is unsafe.
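
Here's a small sketch of a narrowing conversion that really does lose information:

var pi float64 = math.Pi  // the math package provides Pi
fmt.Println(pi)           // 3.141592653589793
fmt.Println(float32(pi))  // 3.1415927: the extra precision is gone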

A widening conversion is when you convert from one type to a “bigger” type. For example:

var y float32 = 8
fmt.Println(float64(y))  // widening conversion; prints 8

float64(y) is a widening conversion, because it goes from a 32-bit value to a 64-bit one. No bits are lost, so for floats a widening conversion doesn't change the converted value and is safe.

Type Coercion

Most programming languages have special coercion rules that implicitly convert the types of data in some (but usually not all) circumstances. This is typically done as a convenience for the programmer, since constantly having to write explicit conversions is tedious and results in code that is hard to read.

For instance, many languages will let you add an integer to a floating point number without complaint, e.g. 5 + 3.2 evaluates to 8.2.

Go has an interesting approach to this problem. Consider this code:

fmt.Println(5 + 3.2)   // Go, prints 8.2

5 and 3.2 are constants of different kinds (an untyped integer and an untyped float), and there is no + operator that mixes the two. So Go automatically converts 5 to a floating-point constant and produces a float result.

However, this is not allowed in Go:

var x int = 5        // Go
var y float32 = 3.2
fmt.Println(x + y) // compile-time type error: x and y are different types

This fails because the type coercion rules in Go only apply to untyped constants (like 3.2), and not to variables (like x).
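
If you do want to mix variables of different types, you must convert explicitly, e.g.:

var x int = 5        // Go
var y float32 = 3.2
fmt.Println(float32(x) + y)  // ok: prints 8.2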

Short Circuit Evaluation of Boolean Operators

An important detail in the evaluation of boolean expressions in many languages is that they use so-called short-circuit evaluation rules. For example, consider the expression (x < 0) or (x > 1). From the rules of propositional logic, we know it is true just when one, or both, of x < 0 and x > 1 are true. If we actually want to calculate this (as we do in a programming language), then there is a small trick: if x < 0 is true then we know the entire expression is true. There is no need to evaluate x > 1 because whether it is true or false, the entire expression is still true.

Similarly, for the expression (x >= 0) and (x <= 1), if x >= 0 happens to be false, then there is no need to evaluate x <= 1 because, regardless of its value, the entire expression must be false.

Most modern languages use short-circuit evaluation, as it is a small and simple optimization. It also allows us to write expressions like this:

if i >= 0 and i < len(lst) and lst[i].isdigit():   # Python
  # ...

This if-condition first tests that i is in the proper range. If it's not, then lst[i].isdigit() is not evaluated, thanks to short-circuit evaluation. If Python always did a full evaluation of the expression, then you would get a run-time IndexError whenever i is out of range.
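
Go's && and || operators short-circuit in the same way. Here's a sketch of the same guard, assuming lst is a []rune:

if i >= 0 && i < len(lst) && unicode.IsDigit(lst[i]) {  // needs the unicode package
    // ...
}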