Basic Type Checking¶
Type checking is concerned with ensuring that the operators and operands of an expression are of compatible types (this includes things like calling subprograms and assignment statements).
For example:
string s = "hello!"; // C++
double x = 4.3;
s = x; // type error caught by compiler
// can't assign a string to an int
The term compatible type is important, because languages have situations in which different types can be used together if they were the same type. For example:
double a = 2.2;
int x = 5;
a = x; // okay: no error because an int can be assigned to a double
In this case, the value of x
is converted, or coerced, into an equivalent
value for type double
. For instance:
cout << 2 + 3.3; // 2 is coerced to 2.0
Here, the +
is floating-point addition (not integer addition).
Static and Dynamic Type Checking¶
Type checking can be done at compile-time or run-time. Compile-time type checking is called static type checking, and type errors result in compiler errors. Run-time type checking is called dynamic type-checking, and in this case errors result in run-time errors.
For instance, consider this Java code:
abstract class Pet {
abstract void feed();
}
class Dog extends Pet {
void feed() {
println("eats kibble");
}
}
class Cat extends Pet {
void feed() {
println("eats a sardine");
}
}
void takeCareOf(Pet p) {
println("takeCareOf called");
Dog d = (Dog) p; // this cast may fail at run-time
// ...
}
void mainLoop() {
takeCareOf(new Dog()); // okay
takeCareOf(new Cat()); // oops
}
This code compiles even though it contains a type error. The statement
takeCareOf(new Cat())
fails because takeCareOf
expects a Dog
.
In languages with dynamic type binding, some type checking can only be done at compile time because at compile-time not enough information may be known about types to detect errors. Some languages, such as JavaScript and PHP, only provide dynamic type-checking.
Sometimes the term strongly typed is used, although it is often vague and probably not a very useful term. Generally, a language is strongly-typed if it catches all type errors. In this strict sense, most languages are not strongly-type because there are usually ways to get around the type system and so prevent the compiler from catching a type error. While languages like C and C++ area notorious for allowing programmers to subvert their type-system, even a language such as Ada, which is meant to strongly-typed, has a way to allow programmers to purposely get around its type-checking rules.
Type Equivalence¶
When are two types the same? Obviously, they are the same if they have the
same name, e.g. the type string
is equivalent to the type string
, and
different if they have different names, e.g. int
and float
are
different types.
There are two main kinds of type equivalence: name equivalence and structural equivalence.
Name equivalence is simple: two types are considered equivalent if they have the same name. Structural equivalence is more sophisticated: two type are considered equivalent if they share the same structure. Many languages use a mix of both name equivalence and structural equivalence.
Ada is an example of a language that uses strict name equivalence, e.g.:
type Indextype is 1..100; -- Ada
count : Integer;
index : Indextype;
The types Integer
and Indextype
are not equivalent, and so you
cannot store an Integer
in an Indextype
variable, and vice-versa.
However, at the implementation level, it may well be the case that both types
have identical implementations (e.g. 32 bit ints, perhaps).
For example, Go uses name equivalence:
type Age int
type Num int
func main() {
var a Age = 3
var n Num = 4
var i int = 5
//a = n // type error
//n = a // type error
//a = i // type error
//i = n // type error
a = 5 // ok!
//n = i + 5 // type error
//fmt.Println(a + n) // type error
//fmt.Println(i + n) // type error
fmt.Println(n + 2) // ok!
//fmt.Println(n + 2.1) // error
fmt.Println(a)
fmt.Println(n)
fmt.Println(i)
}
The types Age
and Num
are implemented exactly the same as an
int
. However, they are different types, and so Go usually gives type
errors if you try to mix them together. As the examples show, in some cases
with constant values Go will automatically convert a type to allow an
operation.
Many languages let you create types without simple names. For example, int C++ you can could write this statement:
vector<string*> x;
The type vector<string*>
is not a simple name, and strict name equivalent
rules would force it to have a name, e.g.:
typedef vector<string*> StringSeq;
using StringSeq = vector<string*>; // C++11 alternative to typedef
This is pretty inconvenient, and so most languages do not use strict name equivalence for types.
Structural type equivalence declares types are equivalent when the two types have the same structure. For instance, in Go we could write this:
type Person struct { // Go_
name string
age int
}
type Cheese struct {
name string
age int
}
Under purely structural equivalence rules, Person
and Cheese
would be
considered equivalent. But with name equivalence rules they would be
different.
What rules does Go use? Consider this function:
func display(p Person) {
fmt.Println("name: " + p.name)
fmt.Println(" age:", p.age)
}
Clearly we can pass a Person
object to display
. But can we pass it a
Cheese
? No — a compiler type error results in this code:
thing := Cheese{"hat", 10}
display(thing) // compiler error: thing type is not Person
This shows that Go does not use purely structural type equivalence rules. C-like languages give the same result as Go.
Structural equivalence does appear in C-like languages. Consider:
typedef int Age; // C/C++
typedef int Date;
Age a = 4;
Date d = 23;
a = d; // okay
d = a; // okay
The types Age
and Date
have different names, but they are structurally
identical and so can be used interchangeably. Indeed, typedef
just creates
aliases, i.e. alternative names for existing types. Both Age
and Date
are really just the type int
, and so a typedef
is mainly for making
code easier for humans to read.
Arithmetic¶
Arithmetic operators have three main arities:
- unary, e.g. in the expression
-(1 + 2)
,-
is an unary operator because it takes a single input (note also that-
has a binary version that takes two operands, e.g.4 - 5
) - binary, e.g. in the expression
-(1 + 2)
,+
is binary operator because it takes two inputs - ternary, e.g. in the C/C++ expression
(x > 5) ? 1 : 0
,? :
is a ternary operator because it takes exactly three inputs.
Arithmetic expressions can also vary in the order in which operators and value are placed:
- infix, e.g. in the expression
1 + 2
,+
is an infix operator - prefix, e.g. in the expression
+ 1 2
,+
is a prefix expression; Lisp uses prefix notation for expressions - postfix, e.g. in the expression
1 2 +
,+
is a postfix operator; FORTH, and other stack-based languages, use postfix operators
We also need to worry about operator precedence, e.g.:
In C-like languages, in the expression
1 + 2 * 4
*
is done before+
. But in the expression(1 + 2) * 4
,+
is done before*
.Note that in Lisp, the order of operations is always explicit, e.g.
(+ (* 2 4))
and(* (+ 1 2))
.The SmallTalk language is one of the few languages that don’t follow the traditional operator precedence rules of arithmetic. In SmallTalk, an expression like
2 + 3 * 5
evaluates to 25 since the operators are evaluated in the order they occur. This is a consistent use of SmallTalk’s function-calling rules, but for many programmers it is a source of errors and annoyance.In an expression like
2 * 3 * 0
, a language must decide which*
is evaluated first, i.e. is the expression that same as(2 * 3) * 0
or2 * (3 * 0)
. In mathematics, there is no difference, but it could matter in computer arithmetic where expressions might over-flow or underflowFor example, suppose we are doing 8-bit integer arithmetic where the maximum value is 127. Then consider these two expressions:
(-100 + 100) + 100 // equals 100, no overflow -100 + (100 + 100) // equals 100, overflow
Both expressions have the same value in the mathematical sense, and, indeed, in Go (and many other languages) they evaluate to the same thing. But there’s an important difference: the second expression has an overflow because
100 + 100
results in a value bigger than 127. But there is no overflow in the first expression.
Operand Evaluation Order¶
Consider this Go code:
func one() int {
fmt.Println("one")
return 1
}
func two() int {
fmt.Println("two")
return 2
}
func add(x, y int) int {
return x + y
}
func main() {
a := add(one(), two()) // Is one() evaluated before two()?
fmt.Println(a)
}
The Go spec promises that
in the function call add(one(), two())
, one()
will be evaluated before
two()
.
Function Side Effects¶
A function is said to cause a side-effect if it modifies some value outside of the function. For example:
func one() int {
fmt.Println("one") // side effect
return 1
}
Printing is a very useful side effect, but often side effects cause confusing behaviour. For example:
var a int = 5
func f(n int) int {
a++ // side effect
return n + 1
}
func main() {
fmt.Println(a + f(2)) // 9
fmt.Println(a + f(2)) // 10
}
This shows that a simple side-effect in f
causes the a + f(2)
to
evaluate differently depending on how many times f
is called. This can
make it very hard to reason about the behaviour of expressions.
Traditional mathematical functions do not have side-effects, and so don’t suffer from this kind of problem. One of the tenets of functional programming is that functions should not have side-effects precisely to avoid this sort of weird behaviour. The problem with this, though, is that it also erases the benefit of useful side-effects like printing to the screen, reading and writing files, or accessing the web.
Referential Transparency¶
A functions is said to be referentially transparent if the function always returns the same output when given the same input. For instance:
func twice(x int) int { // a referentially transparent Go function
return 2 * x
}
The twice
function is referentially transparent because if you pass it
a
, it always return a + 1
. It’s value depends only on the input to the
function any nothing else.
Here’s a function that’s not referentially transparent:
var count int
func next() int { // a Go function that is not referentially transparent
count++
return count
}
Every time you call next()
, you get a different result. Thus next()
is
not referentially transparent.
Another example of a kinds of function that isn’t referentially
transparent is any function that, for instance, reads data from a file or from
the user. For instance, Python has a function called raw_input(prompt)
that prints the string prompt
and the copies whatever string the user
types next.
Type Conversions¶
Converting between data types in programming is an important operation, and it is useful to distinguish between, at least, two kinds of conversions among numeric types.
A narrowing conversion is when you convert one type to a “smaller” type. For instance, consider this code:
var x float64 = 8
fmt.Println(float32(x)) // narrowing conversion; prints 8
float32(x)
is a narrowing conversion because it is taking a 64-bit value
and stuffing it into a 32-bit value. So 32-bits are lost. In this case, all
the lost bits are 0s, and so there is no problem, but in general converting
from 64-bits to 32-bits loses information and so is unsafe.
A widening conversion is when you convert from one type to a “bigger” type. For example:
var y float32 = 8
fmt.Println(float64(y)) // widening conversion; prints 8
float64(y)
is a widening conversion, because it goes from a 32-bit value to
a 64-bit one. No bits are lost, and so, for floats, widening conversions don’t
change the converted value and so are safe.
Type Coercion¶
Most programming language have special coercion rules that implicitly convert the types of data in some (but usually not all) circumstances. This is typically done as a convenience for the programmer, as constantly having to specify explicit conversions is tedious and results in code that is hard to read.
For instance, many languages will let you add an integer to a floating point
number without complaint, e.g. 5 + 3.2
evaluates to 8.2
.
Go has interesting approach to this problem. Consider this code:
fmt.Println(5 + 3.2) // Go, prints 8.2
5 and 3.2 are different types, and there is no +
operator that works for
them. So Go automatically converts 5 to a float and returns a float result.
However, this not allowed in Go:
var x int = 5 // Go
var y float32 = 3.2
fmt.Println(x + y) // compile-time type error: x and y are different types
This fails because the type coercion rules in Go only apply to constants
(like 3.2), and not to variables (like x
).
Short Circuit Evaluation of Boolean Operators¶
An important detail in the evaluation of boolean expressions in many languages
is that they used so-called short-circuit evaluation rules. For example,
consider the expression (x < 0) or (x > 1)
. From the rules of
propositional logic, we know it is true just when one, or both, of x < 0
and x > 1
are true. If we actually want to calculate this (as we do in a
programming language), then there is small trick: if x < 0
is true then we
know the entire expression is true. There is no need to evaluate x > 1
because whether it is true or false, the entire expression is still true.
Similarly, for the expression (x >= 0) and (x <= 1)
, if x >= 0
happens
to be false, then there is no need to evaluate x <= 1
because, regardless
of its value, the entire expression must be false.
Most modern languages use short-circuit evaluation, as it is a small and simple optimization. It also allows us to write expressions like this:
if i >= 0 and i < len(lst) and isDigit(lst[i]): # Python
# ...
This if-condition first tests that i
is in the proper range. If it’s not,
then isDigit(lst[i])
is not evaluated thanks to short-circuit
evaluation. If Python always did a full evaluation of the expression, then you
would get a run-time error in some cases.