Basic Type Checking
===================

Type checking is concerned with ensuring that the operators and operands of an
expression are of compatible types (this includes things like calling
subprograms and assignment statements). For example::

    string s = "hello!";   // C++
    double x = 4.3;
    s = x;                 // type error caught by compiler:
                           // can't assign a double to a string

The term compatible type is important, because languages have situations in
which different types can be used together as if they were the same type. For
example::

    double a = 2.2;
    int x = 5;
    a = x;   // okay: no error because an int can be assigned to a double

In this case, the value of ``x`` is converted, or coerced, into an equivalent
value of type ``double``. For instance::

    cout << 2 + 3.3;   // 2 is coerced to 2.0

Here, the ``+`` is floating-point addition (not integer addition).

Static and Dynamic Type Checking
--------------------------------

Type checking can be done at compile-time or run-time. Compile-time type
checking is called **static type checking**, and type errors result in
compiler errors. Run-time type checking is called **dynamic type checking**,
and in this case type errors result in run-time errors. For instance, consider
this Java code::

    abstract class Pet {
        abstract void feed();
    }

    class Dog extends Pet {
        void feed() {
            System.out.println("eats kibble");
        }
    }

    class Cat extends Pet {
        void feed() {
            System.out.println("eats a sardine");
        }
    }

    void takeCareOf(Pet p) {
        System.out.println("takeCareOf called");
        Dog d = (Dog) p;   // this cast may fail at run-time
        // ...
    }

    void mainLoop() {
        takeCareOf(new Dog());   // okay
        takeCareOf(new Cat());   // oops
    }

This code compiles even though it contains a type error. The call
``takeCareOf(new Cat())`` fails at run-time because the cast ``(Dog) p``
inside ``takeCareOf`` fails when ``p`` is actually a ``Cat``.

In languages with dynamic type binding, some type checking can only be done at
run-time, because at compile-time not enough information may be known about
the types involved to detect errors. Some languages, such as JavaScript and
PHP, only provide dynamic type checking.

Sometimes the term **strongly typed** is used, although it is often vague and
probably not a very useful term. Generally, a language is strongly typed if it
catches all type errors. In this strict sense, most languages are not strongly
typed, because there are usually ways to get around the type system and so
prevent the compiler from catching a type error. While languages like C and
C++ are notorious for allowing programmers to subvert their type systems, even
a language such as Ada, which is meant to be strongly typed, has a way to let
programmers purposely get around its type-checking rules.

Type Equivalence
----------------

When are two types the same? Obviously, they are the same if they have the
same name, e.g. the type ``string`` is equivalent to the type ``string``, and
different if they have different names, e.g. ``int`` and ``float`` are
different types.

There are two main kinds of type equivalence: **name equivalence** and
**structural equivalence**. Name equivalence is simple: two types are
considered equivalent if they have the same name. Structural equivalence is
more sophisticated: two types are considered equivalent if they share the same
structure. Many languages use a mix of both name equivalence and structural
equivalence.
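One place such a mix shows up in Go_ is assignment between a defined struct
type and an *unnamed* struct type with the same fields: Go_ uses name-based
rules for defined types, but falls back on structural rules when one of the
two types has no name. The sketch below is only an illustration of that rule;
the type ``Point`` and its fields are invented for this example::

    package main

    import "fmt"

    // Point is a defined (named) struct type.
    type Point struct {
        x, y int
    }

    func main() {
        // p has the unnamed type struct{ x, y int }.
        p := struct{ x, y int }{1, 2}

        // Allowed: the two types have identical structure, and one of
        // them has no name, so structural rules apply here.
        var q Point = p
        fmt.Println(q)   // {1 2}
    }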
Ada is an example of a language that uses strict name equivalence, e.g.::

    type Indextype is range 1..100;   -- Ada

    count : Integer;
    index : Indextype;

The types ``Integer`` and ``Indextype`` are *not* equivalent, and so you
cannot store an ``Integer`` in an ``Indextype`` variable, and vice-versa.
However, at the implementation level, it may well be the case that both types
have identical implementations (e.g. 32-bit integers, perhaps).

For example, Go_ uses name equivalence::

    package main

    import "fmt"

    type Age int
    type Num int

    func main() {
        var a Age = 3
        var n Num = 4
        var i int = 5

        //a = n                    // type error
        //n = a                    // type error
        //a = i                    // type error
        //i = n                    // type error
        a = 5                      // ok!

        //n = i + 5                // type error
        //fmt.Println(a + n)       // type error
        //fmt.Println(i + n)       // type error
        fmt.Println(n + 2)         // ok!
        //fmt.Println(n + 2.1)     // error

        fmt.Println(a)
        fmt.Println(n)
        fmt.Println(i)
    }

The types ``Age`` and ``Num`` are *implemented* exactly the same as an
``int``. However, they are different types, and so Go_ usually gives type
errors if you try to mix them together. As the examples show, in some cases
involving constant values Go_ will automatically convert a type to allow an
operation.

Many languages let you create types without simple names. For example, in C++
you could write this statement::

    vector<string> x;

The type ``vector<string>`` is not a simple name, and strict name equivalence
rules would force it to have a name, e.g.::

    typedef vector<string> StringSeq;

    using StringSeq = vector<string>;   // C++11 alternative to typedef

This is pretty inconvenient, and so most languages do not use strict name
equivalence for types.

Structural type equivalence declares two types to be equivalent when they have
the same structure. For instance, in Go_ we could write this::

    type Person struct {   // Go
        name string
        age  int
    }

    type Cheese struct {
        name string
        age  int
    }

Under purely structural equivalence rules, ``Person`` and ``Cheese`` would be
considered equivalent. But with name equivalence rules they would be
different. What rules does Go_ use? Consider this function::

    func display(p Person) {
        fmt.Println("name: " + p.name)
        fmt.Println(" age:", p.age)
    }

Clearly we can pass a ``Person`` object to ``display``. But can we pass it a
``Cheese``? No --- a compiler type error results from this code::

    thing := Cheese{"hat", 10}
    display(thing)   // compiler error: thing's type is not Person

This shows that Go_ does *not* use purely structural type equivalence rules.
C-like languages give the same result as Go_.

Structural equivalence does appear in C-like languages, though. Consider::

    typedef int Age;    // C/C++
    typedef int Date;

    Age a = 4;
    Date d = 23;

    a = d;   // okay
    d = a;   // okay

The types ``Age`` and ``Date`` have different names, but they are structurally
identical and so can be used interchangeably. Indeed, ``typedef`` just creates
aliases, i.e. alternative names for existing types. Both ``Age`` and ``Date``
are really just the type ``int``, and so a ``typedef`` is mainly for making
code easier for humans to read.

Arithmetic
----------

Arithmetic operators have three main arities (a short Go_ sketch illustrating
them follows the list):

- **unary**, e.g. in the expression ``-(1 + 2)``, ``-`` is a unary operator
  because it takes a single input (note also that ``-`` has a binary version
  that takes two operands, e.g. ``4 - 5``)
- **binary**, e.g. in the expression ``-(1 + 2)``, ``+`` is a binary operator
  because it takes two inputs
- **ternary**, e.g. in the C/C++ expression ``(x > 5) ? 1 : 0``, ``? :`` is a
  ternary operator because it takes exactly three inputs
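As a quick Go_ illustration of these arities (the variable names are invented
for this sketch): Go_ has unary and binary operators, but no ``? :`` ternary
operator, so an ``if`` statement plays that role::

    package main

    import "fmt"

    func main() {
        x := 7

        fmt.Println(-x)       // unary minus: one operand
        fmt.Println(4 - 5)    // binary minus: two operands

        // Go has no ?: ternary operator, so the C/C++ expression
        // (x > 5) ? 1 : 0 is written with an if statement instead.
        result := 0
        if x > 5 {
            result = 1
        }
        fmt.Println(result)   // 1
    }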
Arithmetic expressions can also vary in the order in which operators and
values are placed:

- **infix**, e.g. in the expression ``1 + 2``, ``+`` is an infix operator
- **prefix**, e.g. in the expression ``+ 1 2``, ``+`` is a prefix operator;
  Lisp uses prefix notation for expressions
- **postfix**, e.g. in the expression ``1 2 +``, ``+`` is a postfix operator;
  FORTH, and other stack-based languages, use postfix operators

We also need to worry about operator precedence, e.g.:

- In C-like languages, in the expression ``1 + 2 * 4``, ``*`` is done before
  ``+``. But in the expression ``(1 + 2) * 4``, ``+`` is done before ``*``.
  Note that in Lisp the order of operations is always explicit, e.g.
  ``(+ 1 (* 2 4))`` and ``(* (+ 1 2) 4)``.

  The Smalltalk language is one of the few languages that don't follow the
  traditional operator precedence rules of arithmetic. In Smalltalk, an
  expression like ``2 + 3 * 5`` evaluates to 25, since the operators are
  evaluated in the order they occur. This is a consistent use of Smalltalk's
  message-sending rules, but for many programmers it is a source of errors
  and annoyance.

- In an expression like ``2 * 3 * 0``, a language must decide which ``*`` is
  evaluated first, i.e. is the expression the same as ``(2 * 3) * 0`` or
  ``2 * (3 * 0)``? In mathematics there is no difference, but it could matter
  in computer arithmetic, where expressions might overflow or underflow.

  For example, suppose we are doing 8-bit integer arithmetic where the
  maximum value is 127. Then consider these two expressions::

      (-100 + 100) + 100   // equals 100, no overflow
      -100 + (100 + 100)   // equals 100, overflow

  Both expressions have the same value in the mathematical sense, and,
  indeed, in Go_ (and many other languages) they evaluate to the same thing.
  But there's an important difference: the second expression has an
  *overflow*, because ``100 + 100`` results in a value bigger than 127. There
  is no overflow in the first expression.

Operand Evaluation Order
------------------------

Consider this Go_ code::

    package main

    import "fmt"

    func one() int {
        fmt.Println("one")
        return 1
    }

    func two() int {
        fmt.Println("two")
        return 2
    }

    func add(x, y int) int {
        return x + y
    }

    func main() {
        a := add(one(), two())   // Is one() evaluated before two()?
        fmt.Println(a)
    }

`The Go spec <https://golang.org/ref/spec>`_ promises that in the function
call ``add(one(), two())``, ``one()`` will be evaluated before ``two()``.

Function Side Effects
---------------------

A function is said to cause a **side effect** if it modifies some value
outside of the function. For example::

    func one() int {
        fmt.Println("one")   // side effect
        return 1
    }

Printing is a very useful side effect, but often side effects cause confusing
behaviour. For example::

    package main

    import "fmt"

    var a int = 5

    func f(n int) int {
        a++   // side effect
        return n + 1
    }

    func main() {
        fmt.Println(a + f(2))   // 9
        fmt.Println(a + f(2))   // 10
    }

This shows that a simple side effect in ``f`` causes ``a + f(2)`` to evaluate
differently depending on how many times ``f`` has been called. This can make
it very hard to reason about the behaviour of expressions. Traditional
mathematical functions do not have side effects, and so don't suffer from
this kind of problem.

One of the tenets of functional programming is that functions should not have
side effects, precisely to avoid this sort of weird behaviour. The problem
with this, though, is that it also erases the benefit of useful side effects
like printing to the screen, reading and writing files, or accessing the web.
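A common functional-programming alternative is to pass the changing value in
as a parameter and return the updated value, so every change is visible to the
caller. The sketch below is just one way to do this; ``fPure`` is an invented
name for a side-effect-free version of ``f``::

    package main

    import "fmt"

    // fPure takes the current counter value as input and returns the
    // updated counter alongside its usual result, instead of mutating a
    // package-level variable.
    func fPure(a, n int) (newA, result int) {
        return a + 1, n + 1
    }

    func main() {
        a := 5

        a, r := fPure(a, 2)
        fmt.Println(a + r)   // 6 + 3 = 9

        a, r = fPure(a, 2)
        fmt.Println(a + r)   // 7 + 3 = 10
    }

The results are the same as before, but now there is no hidden state: the
value of ``a`` changes only through explicit assignments that the reader can
see.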
Referential Transparency
------------------------

A function is said to be **referentially transparent** if it always returns
the same output when given the same input. For instance::

    func twice(x int) int {   // a referentially transparent Go function
        return 2 * x
    }

The ``twice`` function is referentially transparent because if you pass it
``a``, it always returns ``2 * a``. Its value depends only on the input to
the function and nothing else.

Here's a function that's not referentially transparent::

    var count int

    func next() int {   // a Go function that is not referentially transparent
        count++
        return count
    }

Every time you call ``next()``, you get a different result. Thus ``next()``
is not referentially transparent.

Another kind of function that **isn't** referentially transparent is any
function that reads data from a file or from the user. For instance, Python 2
has a function called ``raw_input(prompt)`` that prints the string ``prompt``
and then returns whatever string the user types next.

Type Conversions
----------------

Converting between data types is an important operation in programming, and
it is useful to distinguish between, at least, two kinds of conversions among
numeric types.

A **narrowing conversion** is when you convert one type to a "smaller" type.
For instance, consider this code::

    var x float64 = 8
    fmt.Println(float32(x))   // narrowing conversion; prints 8

``float32(x)`` is a narrowing conversion because it takes a 64-bit value and
stuffs it into a 32-bit value, so 32 bits are lost. In this case, all the
lost bits are 0s, and so there is no problem, but in general converting from
64 bits to 32 bits loses information and so is unsafe.

A **widening conversion** is when you convert from one type to a "bigger"
type. For example::

    var y float32 = 8
    fmt.Println(float64(y))   // widening conversion; prints 8

``float64(y)`` is a widening conversion, because it goes from a 32-bit value
to a 64-bit one. No bits are lost, and so, for floats, widening conversions
don't change the converted value and so are safe.

Type Coercion
-------------

Most programming languages have special coercion rules that implicitly
convert the types of data in some (but usually not all) circumstances. This
is typically done as a convenience for the programmer, as constantly having
to write explicit conversions is tedious and results in code that is hard to
read. For instance, many languages will let you add an integer to a
floating-point number without complaint, e.g. ``5 + 3.2`` evaluates to
``8.2``.

Go_ has an interesting approach to this problem. Consider this code::

    fmt.Println(5 + 3.2)   // Go, prints 8.2

``5`` and ``3.2`` are different types, and there is no ``+`` operator that
works for them. So Go_ automatically converts 5 to a float and returns a
float result. However, this is not allowed in Go_::

    var x int = 5   // Go
    var y float32 = 3.2
    fmt.Println(x + y)   // compile-time type error: x and y are different types

This fails because the type coercion rules in Go_ only apply to constants
(like ``3.2``), and not to variables (like ``x``).

Short Circuit Evaluation of Boolean Operators
---------------------------------------------

An important detail in the evaluation of boolean expressions in many
languages is that they use so-called **short-circuit evaluation** rules. For
example, consider the expression ``(x < 0) or (x > 1)``. From the rules of
propositional logic, we know it is true just when one, or both, of ``x < 0``
and ``x > 1`` are true.
If we actually want to calculate this (as we do in a programming language),
then there is a small trick: if ``x < 0`` is true, then we know the entire
expression is true. There is no need to evaluate ``x > 1`` because, whether
it is true or false, the entire expression is still true.

Similarly, for the expression ``(x >= 0) and (x <= 1)``, if ``x >= 0``
happens to be false, then there is no need to evaluate ``x <= 1`` because,
regardless of its value, the entire expression must be false.

Most modern languages use short-circuit evaluation, as it is a small and
simple optimization. It also allows us to write expressions like this::

    if i >= 0 and i < len(lst) and lst[i].isdigit():   # Python
        # ...

This if-condition first tests that ``i`` is in the proper range. If it's not,
then ``lst[i].isdigit()`` is *not* evaluated, thanks to short-circuit
evaluation. If Python always did a full evaluation of the expression, then
you would get a run-time error in some cases.
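The same guard pattern works in Go_, whose ``&&`` and ``||`` operators also
short-circuit. The sketch below (the function and variable names are invented
for this example) makes the skipped evaluation visible by printing from the
right-hand operand::

    package main

    import "fmt"

    // isPositive prints whenever it is called, so we can see whether
    // short-circuit evaluation skips it.
    func isPositive(x int) bool {
        fmt.Println("isPositive called")
        return x > 0
    }

    func main() {
        x := -3

        // Because x < 0 is true, || never evaluates its right operand,
        // and "isPositive called" is not printed.
        if x < 0 || isPositive(x) {
            fmt.Println("left operand decided the result")
        }

        // The same idea guards an index, as in the Python example above:
        // lst[i] is only evaluated when i is known to be in range.
        lst := []int{1, 2, 3}
        i := 5
        if i >= 0 && i < len(lst) && lst[i] > 0 {
            fmt.Println("in range and positive")
        } else {
            fmt.Println("index out of range or not positive")
        }
    }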