Basic Type Checking
===================

Type checking is concerned with ensuring that the operators and operands of an
expression are of compatible types (this includes things like calling
subprograms and assignment statements). For example::

    string s = "hello!";   // C++
    double x = 4.3;
    s = x;                 // type error caught by compiler:
                           // can't assign a double to a string

The term compatible type is important, because languages have situations in
which different types can be used together as if they were the same type. For
example::

    double a = 2.2;
    int x = 5;
    a = x;   // okay: no error because an int can be assigned to a double

In this case, the value of ``x`` is converted, or coerced, into an equivalent
value of type ``double``. For instance::

    cout << 2 + 3.3;   // 2 is coerced to 2.0

Here, the ``+`` is floating-point addition (not integer addition).

Static and Dynamic Type Checking
--------------------------------

Type checking can be done at compile-time or run-time. Compile-time type
checking is called **static type checking**, and type errors result in
compiler errors. Run-time type checking is called **dynamic type checking**,
and in this case type errors result in run-time errors. For instance, consider
this Java code::

    abstract class Pet {
        abstract void feed();
    }

    class Dog extends Pet {
        void feed() {
            System.out.println("eats kibble");
        }
    }

    class Cat extends Pet {
        void feed() {
            System.out.println("eats a sardine");
        }
    }

    void takeCareOf(Pet p) {
        System.out.println("takeCareOf called");
        Dog d = (Dog) p;   // this cast may fail at run-time
        // ...
    }

    void mainLoop() {
        takeCareOf(new Dog());   // okay
        takeCareOf(new Cat());   // oops
    }

This code compiles even though it contains a type error. The call
``takeCareOf(new Cat())`` fails at run-time because the cast ``(Dog) p``
inside ``takeCareOf`` fails when ``p`` is actually a ``Cat``.

In languages with dynamic type binding, some type checking can only be done at
run-time, because at compile-time not enough information may be known about
the types involved to detect errors. Some languages, such as JavaScript and
PHP, only provide dynamic type checking.

Sometimes the term **strongly typed** is used, although it is often vague and
probably not a very useful term. Generally, a language is strongly typed if it
catches all type errors. In this strict sense, most languages are not strongly
typed, because there are usually ways to get around the type system and so
prevent the compiler from catching a type error. While languages like C and
C++ are notorious for allowing programmers to subvert their type systems, even
a language such as Ada, which is meant to be strongly typed, has a way to let
programmers purposely get around its type-checking rules.

Type Equivalence
----------------

When are two types the same? Obviously, they are the same if they have the
same name, e.g. the type ``string`` is equivalent to the type ``string``, and
different if they have different names, e.g. ``int`` and ``float`` are
different types.

There are two main kinds of type equivalence: **name equivalence** and
**structural equivalence**. Name equivalence is simple: two types are
considered equivalent if they have the same name. Structural equivalence is
more sophisticated: two types are considered equivalent if they share the same
structure. Many languages use a mix of both name equivalence and structural
equivalence.
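One place such a mix shows up in Go_ is assignment between a defined struct
type and an *unnamed* struct type with the same fields: Go_ uses name-based
rules for defined types, but falls back on structural rules when one of the
two types has no name. The sketch below is only an illustration of that rule;
the type ``Point`` and its fields are invented for this example::

    package main

    import "fmt"

    // Point is a defined (named) struct type.
    type Point struct {
        x, y int
    }

    func main() {
        // p has the unnamed type struct{ x, y int }.
        p := struct{ x, y int }{1, 2}

        // Allowed: the two types have identical structure, and one of
        // them has no name, so structural rules apply here.
        var q Point = p
        fmt.Println(q)   // {1 2}
    }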
Ada is an example of a language that uses strict name equivalence, e.g.::

    type Indextype is range 1..100;   -- Ada

    count : Integer;
    index : Indextype;

The types ``Integer`` and ``Indextype`` are *not* equivalent, and so you
cannot store an ``Integer`` in an ``Indextype`` variable, and vice-versa.
However, at the implementation level, it may well be the case that both types
have identical implementations (e.g. 32-bit integers, perhaps).

For example, Go_ uses name equivalence::

    package main

    import "fmt"

    type Age int
    type Num int

    func main() {
        var a Age = 3
        var n Num = 4
        var i int = 5

        //a = n                    // type error
        //n = a                    // type error
        //a = i                    // type error
        //i = n                    // type error
        a = 5                      // ok!

        //n = i + 5                // type error
        //fmt.Println(a + n)       // type error
        //fmt.Println(i + n)       // type error
        fmt.Println(n + 2)         // ok!
        //fmt.Println(n + 2.1)     // error

        fmt.Println(a)
        fmt.Println(n)
        fmt.Println(i)
    }

The types ``Age`` and ``Num`` are *implemented* exactly the same as an
``int``. However, they are different types, and so Go_ usually gives type
errors if you try to mix them together. As the examples show, in some cases
involving constant values Go_ will automatically convert a type to allow an
operation.

Many languages let you create types without simple names. For example, in C++
you could write this statement::

    vector<string> x;

The type ``vector<string>`` is not a simple name, and strict name equivalence
rules would force it to have a name, e.g.::

    typedef vector<string> StringSeq;

    using StringSeq = vector<string>;   // C++11 alternative to typedef

This is pretty inconvenient, and so most languages do not use strict name
equivalence for types.

Structural type equivalence declares two types to be equivalent when they have
the same structure. For instance, in Go_ we could write this::

    type Person struct {   // Go
        name string
        age  int
    }

    type Cheese struct {
        name string
        age  int
    }

Under purely structural equivalence rules, ``Person`` and ``Cheese`` would be
considered equivalent. But with name equivalence rules they would be
different. What rules does Go_ use? Consider this function::

    func display(p Person) {
        fmt.Println("name: " + p.name)
        fmt.Println(" age:", p.age)
    }

Clearly we can pass a ``Person`` object to ``display``. But can we pass it a
``Cheese``? No --- a compiler type error results from this code::

    thing := Cheese{"hat", 10}
    display(thing)   // compiler error: thing's type is not Person

This shows that Go_ does *not* use purely structural type equivalence rules.
C-like languages give the same result as Go_.

Structural equivalence does appear in C-like languages, though. Consider::

    typedef int Age;    // C/C++
    typedef int Date;

    Age a = 4;
    Date d = 23;

    a = d;   // okay
    d = a;   // okay

The types ``Age`` and ``Date`` have different names, but they are structurally
identical and so can be used interchangeably. Indeed, ``typedef`` just creates
aliases, i.e. alternative names for existing types. Both ``Age`` and ``Date``
are really just the type ``int``, and so a ``typedef`` is mainly for making
code easier for humans to read.

Arithmetic
----------

Arithmetic operators have three main arities (a short Go_ sketch illustrating
them follows the list):

- **unary**, e.g. in the expression ``-(1 + 2)``, ``-`` is a unary operator
  because it takes a single input (note also that ``-`` has a binary version
  that takes two operands, e.g. ``4 - 5``)
- **binary**, e.g. in the expression ``-(1 + 2)``, ``+`` is a binary operator
  because it takes two inputs
- **ternary**, e.g. in the C/C++ expression ``(x > 5) ? 1 : 0``, ``? :`` is a
  ternary operator because it takes exactly three inputs
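As a quick Go_ illustration of these arities (the variable names are invented
for this sketch): Go_ has unary and binary operators, but no ``? :`` ternary
operator, so an ``if`` statement plays that role::

    package main

    import "fmt"

    func main() {
        x := 7

        fmt.Println(-x)       // unary minus: one operand
        fmt.Println(4 - 5)    // binary minus: two operands

        // Go has no ?: ternary operator, so the C/C++ expression
        // (x > 5) ? 1 : 0 is written with an if statement instead.
        result := 0
        if x > 5 {
            result = 1
        }
        fmt.Println(result)   // 1
    }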
Arithmetic expressions can also vary in the order in which operators and
values are placed:

- **infix**, e.g. in the expression ``1 + 2``, ``+`` is an infix operator
- **prefix**, e.g. in the expression ``+ 1 2``, ``+`` is a prefix operator;
  Lisp uses prefix notation for expressions
- **postfix**, e.g. in the expression ``1 2 +``, ``+`` is a postfix operator;
  FORTH, and other stack-based languages, use postfix operators

We also need to worry about operator precedence, e.g.:

- In C-like languages, in the expression ``1 + 2 * 4``, ``*`` is done before
  ``+``. But in the expression ``(1 + 2) * 4``, ``+`` is done before ``*``.
  Note that in Lisp the order of operations is always explicit, e.g.
  ``(+ 1 (* 2 4))`` and ``(* (+ 1 2) 4)``.

  The Smalltalk language is one of the few languages that don't follow the
  traditional operator precedence rules of arithmetic. In Smalltalk, an
  expression like ``2 + 3 * 5`` evaluates to 25, since the operators are
  evaluated in the order they occur. This is a consistent use of Smalltalk's
  message-sending rules, but for many programmers it is a source of errors
  and annoyance.

- In an expression like ``2 * 3 * 0``, a language must decide which ``*`` is
  evaluated first, i.e. is the expression the same as ``(2 * 3) * 0`` or
  ``2 * (3 * 0)``? In mathematics there is no difference, but it could matter
  in computer arithmetic, where expressions might overflow or underflow.

  For example, suppose we are doing 8-bit integer arithmetic where the
  maximum value is 127. Then consider these two expressions::

      (-100 + 100) + 100   // equals 100, no overflow
      -100 + (100 + 100)   // equals 100, overflow

  Both expressions have the same value in the mathematical sense, and,
  indeed, in Go_ (and many other languages) they evaluate to the same thing.
  But there's an important difference: the second expression has an
  *overflow*, because ``100 + 100`` results in a value bigger than 127. There
  is no overflow in the first expression.

Operand Evaluation Order
------------------------

Consider this Go_ code::

    package main

    import "fmt"

    func one() int {
        fmt.Println("one")
        return 1
    }

    func two() int {
        fmt.Println("two")
        return 2
    }

    func add(x, y int) int {
        return x + y
    }

    func main() {
        a := add(one(), two())   // Is one() evaluated before two()?
        fmt.Println(a)
    }

`The Go spec <https://golang.org/ref/spec>`_ promises that in the function
call ``add(one(), two())``, ``one()`` will be evaluated before ``two()``.

Function Side Effects
---------------------

A function is said to cause a **side effect** if it modifies some value
outside of the function. For example::

    func one() int {
        fmt.Println("one")   // side effect
        return 1
    }

Printing is a very useful side effect, but often side effects cause confusing
behaviour. For example::

    package main

    import "fmt"

    var a int = 5

    func f(n int) int {
        a++   // side effect
        return n + 1
    }

    func main() {
        fmt.Println(a + f(2))   // 9
        fmt.Println(a + f(2))   // 10
    }

This shows that a simple side effect in ``f`` causes ``a + f(2)`` to evaluate
differently depending on how many times ``f`` has been called. This can make
it very hard to reason about the behaviour of expressions. Traditional
mathematical functions do not have side effects, and so don't suffer from
this kind of problem.

One of the tenets of functional programming is that functions should not have
side effects, precisely to avoid this sort of weird behaviour. The problem
with this, though, is that it also erases the benefit of useful side effects
like printing to the screen, reading and writing files, or accessing the web.
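A common functional-programming alternative is to pass the changing value in
as a parameter and return the updated value, so every change is visible to the
caller. The sketch below is just one way to do this; ``fPure`` is an invented
name for a side-effect-free version of ``f``::

    package main

    import "fmt"

    // fPure takes the current counter value as input and returns the
    // updated counter alongside its usual result, instead of mutating a
    // package-level variable.
    func fPure(a, n int) (newA, result int) {
        return a + 1, n + 1
    }

    func main() {
        a := 5

        a, r := fPure(a, 2)
        fmt.Println(a + r)   // 6 + 3 = 9

        a, r = fPure(a, 2)
        fmt.Println(a + r)   // 7 + 3 = 10
    }

The results are the same as before, but now there is no hidden state: the
value of ``a`` changes only through explicit assignments that the reader can
see.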
Referential Transparency
------------------------

A function is said to be **referentially transparent** if it always returns
the same output when given the same input. For instance::

    func twice(x int) int {   // a referentially transparent Go function
        return 2 * x
    }

The ``twice`` function is referentially transparent because if you pass it
``a``, it always returns ``2 * a``. Its value depends only on the input to
the function and nothing else.

Here's a function that's not referentially transparent::

    var count int

    func next() int {   // a Go function that is not referentially transparent
        count++
        return count
    }

Every time you call ``next()``, you get a different result. Thus ``next()``
is not referentially transparent.

Another kind of function that **isn't** referentially transparent is any
function that reads data from a file or from the user. For instance, Python 2
has a function called ``raw_input(prompt)`` that prints the string ``prompt``
and then returns whatever string the user types next.

Type Conversions
----------------

Converting between data types is an important operation in programming, and
it is useful to distinguish between, at least, two kinds of conversions among
numeric types.

A **narrowing conversion** is when you convert one type to a "smaller" type.
For instance, consider this code::

    var x float64 = 8
    fmt.Println(float32(x))   // narrowing conversion; prints 8

``float32(x)`` is a narrowing conversion because it takes a 64-bit value and
stuffs it into a 32-bit value, so 32 bits are lost. In this case, all the
lost bits are 0s, and so there is no problem, but in general converting from
64 bits to 32 bits loses information and so is unsafe.

A **widening conversion** is when you convert from one type to a "bigger"
type. For example::

    var y float32 = 8
    fmt.Println(float64(y))   // widening conversion; prints 8

``float64(y)`` is a widening conversion, because it goes from a 32-bit value
to a 64-bit one. No bits are lost, and so, for floats, widening conversions
don't change the converted value and so are safe.

Type Coercion
-------------

Most programming languages have special coercion rules that implicitly
convert the types of data in some (but usually not all) circumstances. This
is typically done as a convenience for the programmer, as constantly having
to write explicit conversions is tedious and results in code that is hard to
read. For instance, many languages will let you add an integer to a
floating-point number without complaint, e.g. ``5 + 3.2`` evaluates to
``8.2``.

Go_ has an interesting approach to this problem. Consider this code::

    fmt.Println(5 + 3.2)   // Go, prints 8.2

``5`` and ``3.2`` are different types, and there is no ``+`` operator that
works for them. So Go_ automatically converts 5 to a float and returns a
float result. However, this is not allowed in Go_::

    var x int = 5   // Go
    var y float32 = 3.2
    fmt.Println(x + y)   // compile-time type error: x and y are different types

This fails because the type coercion rules in Go_ only apply to constants
(like ``3.2``), and not to variables (like ``x``).

Short Circuit Evaluation of Boolean Operators
---------------------------------------------

An important detail in the evaluation of boolean expressions in many
languages is that they use so-called **short-circuit evaluation** rules. For
example, consider the expression ``(x < 0) or (x > 1)``. From the rules of
propositional logic, we know it is true just when one, or both, of ``x < 0``
and ``x > 1`` are true.
If we actually want to calculate this (as we do in a programming language),
then there is a small trick: if ``x < 0`` is true, then we know the entire
expression is true. There is no need to evaluate ``x > 1`` because, whether
it is true or false, the entire expression is still true.

Similarly, for the expression ``(x >= 0) and (x <= 1)``, if ``x >= 0``
happens to be false, then there is no need to evaluate ``x <= 1`` because,
regardless of its value, the entire expression must be false.

Most modern languages use short-circuit evaluation, as it is a small and
simple optimization. It also allows us to write expressions like this::

    if i >= 0 and i < len(lst) and lst[i].isdigit():   # Python
        # ...

This if-condition first tests that ``i`` is in the proper range. If it's not,
then ``lst[i].isdigit()`` is *not* evaluated, thanks to short-circuit
evaluation. If Python always did a full evaluation of the expression, then
you would get a run-time error in some cases.
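The same guard pattern works in Go_, whose ``&&`` and ``||`` operators also
short-circuit. The sketch below (the function and variable names are invented
for this example) makes the skipped evaluation visible by printing from the
right-hand operand::

    package main

    import "fmt"

    // isPositive prints whenever it is called, so we can see whether
    // short-circuit evaluation skips it.
    func isPositive(x int) bool {
        fmt.Println("isPositive called")
        return x > 0
    }

    func main() {
        x := -3

        // Because x < 0 is true, || never evaluates its right operand,
        // and "isPositive called" is not printed.
        if x < 0 || isPositive(x) {
            fmt.Println("left operand decided the result")
        }

        // The same idea guards an index, as in the Python example above:
        // lst[i] is only evaluated when i is known to be in range.
        lst := []int{1, 2, 3}
        i := 5
        if i >= 0 && i < len(lst) && lst[i] > 0 {
            fmt.Println("in range and positive")
        } else {
            fmt.Println("index out of range or not positive")
        }
    }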