Type Conversions

Robert D. Cameron
Feb. 6, 2002

Type conversions allow expressions of an actual type T_a to be used in the context of an expected type T_e. Of course, programmers can simply write their own type conversion routines as ordinary functions. But languages often provide special support for type conversions of various kinds.

There are several issues involved in language support for type conversions.

Conversion Notation: Implicit, Particular Functions, Casts
Coercions and Typing Rules
Overloaded Conversions
Transitivity of Conversions
Semantics of Conversion: Lossless and Lossy
Unsafe Conversions
Uniformity of Conversion Rules

Conversion Notation

Languages may provide three different types of notation for conversions.

Implicit: In this case, no notation is required to specify the conversion; conversions are automatically supplied by the language processor. This type of conversion is often called coercion.
Particular Functions: Languages may provide built-in functions for particular special cases of conversion. For example, TRUNC and ROUND functions provide two different conversions from REAL to INTEGER.
Casts: A cast is specified by naming the type to which the value is converted. Functional notation may be used, or the language may have a special notation for casts, as in C and Java. Examples: INTEGER(3.78) in Modula-2, (short) 5 in Java.
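The three notations can be sketched side by side in Java. This is only an illustration: the class and method names are hypothetical, and Math.round stands in for the ROUND function mentioned above.

```java
class NotationDemo {
    // Implicit: the int value is widened to double with no notation at all.
    static double implicit(int i) {
        double d = i;
        return d;
    }

    // Particular function: Math.round converts to the nearest long.
    static long rounded(double x) {
        return Math.round(x);
    }

    // Cast: the target type is named; double-to-int truncates toward zero.
    static int truncated(double x) {
        return (int) x;
    }

    public static void main(String[] args) {
        System.out.println(implicit(3));      // prints 3.0
        System.out.println(rounded(3.78));    // prints 4
        System.out.println(truncated(3.78));  // prints 3
        System.out.println((short) 5);        // prints 5
    }
}
```

Note that the function and the cast give different results for the same input, which is one reason a language may offer both.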

Coercions and Typing Rules

Coercions interact with typing rules. When the actual type T_a of an expression does not match the type T_e expected by the context in which it occurs, a type checker would normally report an error. However, if the language defines a coercion from T_a to T_e, that coercion is applied instead of reporting the error.
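As a small Java sketch of this interaction (the class and method names are hypothetical): an int argument is accepted where a double is expected because Java defines an int-to-double coercion, while the reverse direction has no coercion and is rejected by the type checker.

```java
class CoercionDemo {
    static double half(double x) {
        return x / 2;
    }

    static double halfOfInt(int n) {
        // Expected type is double, actual type is int: the compiler
        // inserts the int-to-double coercion instead of reporting an error.
        return half(n);
    }

    public static void main(String[] args) {
        System.out.println(halfOfInt(7)); // prints 3.5
        // int k = half(7.0);  // compile-time error: no implicit double-to-int coercion
    }
}
```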

Casts May Represent Overloaded Conversions

When a language supports conversions using type casts, a single cast notation effectively represents a family of overloaded conversion functions. For example, the Java cast (short) expr specifies six different conversions, one from each of the primitive types byte, char, int, long, float, and double.
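The overloading can be made visible by writing each conversion as its own function. In the sketch below (all names hypothetical), every method body contains the same cast, yet each one denotes a different conversion selected by the operand type.

```java
class ShortCasts {
    static short fromByte(byte v)     { return (short) v; } // widening, value preserved
    static short fromChar(char v)     { return (short) v; } // same width, reinterpreted as signed
    static short fromInt(int v)       { return (short) v; } // keeps the low-order 16 bits
    static short fromLong(long v)     { return (short) v; } // keeps the low-order 16 bits
    static short fromFloat(float v)   { return (short) v; } // float to int, then int to short
    static short fromDouble(double v) { return (short) v; } // double to int, then int to short

    public static void main(String[] args) {
        System.out.println(fromInt(70000));   // prints 4464
        System.out.println(fromLong(65537L)); // prints 1
        System.out.println(fromFloat(3.9f));  // prints 3
        System.out.println(fromDouble(-3.9)); // prints -3
    }
}
```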

Transitivity of Conversions

It may be that a language provides no standard conversion for an actual type T_a in the context of an expected type T_e. But suppose that there is an intermediate type T_i for which there exist conversions from T_a to T_i and from T_i to T_e. Then these two conversions may be used transitively to achieve the T_a to T_e conversion.

Algol 68 is a language with implicit transitive conversions. This is probably an unsafe feature: actual type errors might go undetected if the compiler can find a series of conversions that resolve the type conflict.

Some conversions may be defined transitively. Java's float-to-char conversion is defined as a float-to-int conversion followed by an int-to-char conversion.

Cast notation may be used for explicit transitive conversions. For example, consider (int) (char) Float.POSITIVE_INFINITY versus (int) Float.POSITIVE_INFINITY. The latter conversion narrows positive infinity, the largest value of type float, to the largest 32-bit two's complement integer value, 2147483647. That conversion is also the first step in the float-to-char conversion; the second step discards the high-order 16 bits to get the 16-bit Unicode character '\uffff'. Converting this character to int yields 65535.
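The two cast expressions discussed above can be checked directly. The following is a minimal sketch; the class and method names are hypothetical.

```java
class TransitiveCast {
    // One step: float to int.
    static int direct() {
        return (int) Float.POSITIVE_INFINITY;
    }

    // Two explicit steps: float to char (itself float to int, then int to char),
    // followed by char to int.
    static int viaChar() {
        return (int) (char) Float.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        System.out.println(direct());  // prints 2147483647
        System.out.println(viaChar()); // prints 65535
    }
}
```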

Semantics of Conversions

Given a syntactic form specifying a conversion from one type to another, the semantics of the conversion are rules defining the actual mappings from values of the input type to values of the output type.

Lossless versus Lossy Conversions

Lossless conversions preserve the information contained in a value.

Integer to String
Floating point to Double precision
Character to integer

Lossy conversions may lose information.

Integer to floating point
Integer to character
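One way to test whether a conversion is lossless for a given value is to convert it and convert it back. The Java sketch below applies this round-trip test to some of the conversions listed above. The class and method names are hypothetical, and the test values were chosen as cases where the lossy conversions are known to lose information.

```java
class LossDemo {
    // int -> String -> int: every int value survives.
    static boolean intSurvivesString(int n) {
        return Integer.parseInt(Integer.toString(n)) == n;
    }

    // float -> double -> float: every float value is exactly representable as a double.
    static boolean floatSurvivesDouble(float f) {
        return (float) (double) f == f;
    }

    // int -> float -> int: float has only 24 bits of precision.
    static boolean intSurvivesFloat(int n) {
        return (int) (float) n == n;
    }

    // int -> char -> int: char keeps only the low-order 16 bits.
    static boolean intSurvivesChar(int n) {
        return (int) (char) n == n;
    }

    public static void main(String[] args) {
        System.out.println(intSurvivesString(Integer.MIN_VALUE)); // prints true
        System.out.println(floatSurvivesDouble(0.1f));            // prints true
        System.out.println(intSurvivesFloat(16777217));           // prints false
        System.out.println(intSurvivesChar(70000));               // prints false
    }
}
```

The lossy conversions still preserve many values (for example, 65 survives both), which is why the text says they may lose information rather than that they always do.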

Unsafe Conversions

Unsafe conversions are ones in which conversions are not performed according to the semantics of the types involved, but simply by reinterpreting the bit patterns. For example, Modula-2 defines its type transfer functions (casts in a functional notation) as low-level unsafe conversions.

Unsafe conversions provide efficient ways to reinterpret bit patterns. The disadvantages include the introduction of a loophole into the security system of a strongly-typed language and the likelihood that programs will not be portable across systems that implement the types using different representations.
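Java has no unsafe cast of this kind, but the difference between converting a value and reinterpreting its bits can still be illustrated with the standard library method Float.floatToRawIntBits. This is only an analogy: the method is well defined and portable because Java fixes the float representation, whereas a true unsafe conversion depends on the implementation. The class and method names below are hypothetical.

```java
class BitReinterpretation {
    // Conversion according to the semantics of the types: 1.0 becomes 1.
    static int byValue(float f) {
        return (int) f;
    }

    // Reinterpretation: the same 32 bits are simply read as an int.
    static int byBits(float f) {
        return Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        System.out.println(byValue(1.0f));                   // prints 1
        System.out.println(byBits(1.0f));                    // prints 1065353216
        System.out.println(Integer.toHexString(byBits(1.0f))); // prints 3f800000
    }
}
```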

Unsafe conversions can also occur through the variant record mechanisms of Pascal or Modula-2. This is probably worse than the use of explicit unsafe type transfers, because conversion in this manner may be accidental, i.e., an uncaught programmer error. Ada's variant record mechanism has been carefully designed to disallow this safety problem.

Modula-3 provides for unsafe conversions with its LOOPHOLE construct. However, LOOPHOLE and other unsafe features may only be used in explicitly declared UNSAFE modules. Ordinary, safe modules may not use any unsafe features, nor may they import UNSAFE modules unless those modules have a safe interface. This can provide for a nice division of a complex system into a large number of safe high-level modules supported by a few low-level modules using unsafe techniques for efficiency. If an inexplicable run-time error occurs, responsibility for the error is localized to the UNSAFE modules.