Issues in Data Typing

Robert D. Cameron
Feb. 2002

Orthogonality

Data structuring has two independent dimensions:

  1. the form of the structure (e.g., RECORD, ARRAY or SET), and
  2. the substructures allowed as components.

Orthogonal design:

Nonorthogonality:

Security of Data Structure Construction

Under sequential construction (third generation languages), data structure components (fields of a record, or elements of an array) are filled in one at a time.

What if the programmer forgets to fill in one field or element? For example,

TYPE ABCD = RECORD  a, b : INTEGER; c, d : REAL;

PROCEDURE MakeABCD(VAR NewABCD : ABCD);
BEGIN
  NewABCD.a := 0;
  NewABCD.b := 1;
  NewABCD.c := 2.4
END MakeABCD;

This is a common type of programming error which is not caught by the compiler and may be difficult to debug.

Under parallel construction (e.g., Ada, Modula-3, functional languages), such structures may created by simultaneous assignment to all components. The compiler will catch any such errors, e.g.,

PROCEDURE MakeABCD(VAR NewABCD : ABCD);
BEGIN
  NewABCD := ABCD(a := 0; b := 1; c := 2.4)
END MakeABCD;

Better security.

Name vs. Structural Equivalence of Types: Name Equivalence is More Secure

When do two variables have compatible types?

Name Equivalence:
If they have the same type name.

Structural Equivalence:
If they have the same type structure.

Example:

TYPE PersonId = ARRAY[1..10] OF CHAR;
     VehicleId = ARRAY[1..10] OF CHAR;

PROCEDURE ProcessVehicle(v1 : VehicleId); ...

VAR p1 : PersonId;

ProcessVehicle(p1)

This last call is legal by structural equivalence, but illegal by name equivalence.

Name equivalence provides better security. However, this kind of accidental compatibility is very rare, so the security is only slightly better.

A minor point: name equivalence is also easier to implement in compilers; structural equivalence requires data structure comparisons which may involve the recursive comparison of substructures.

Structural Equivalence is More Flexible

Name equivalence can cause some annoying problems, especially in combining facilities provided by different library modules.

Example (Modula-3 syntax):

INTERFACE Lib1;
  TYPE CharSet = SET OF CHAR;
  PROCEDURE PrintChars(x : CharSet);
END Lib.

INTERFACE Lib2;
  TYPE SetOfChar = SET OF CHAR;
  PROCEDURE CharsOfString(str : Text.T) : SetOfChar;
END Lib2.

MODULE Client;
IMPORT Lib1, Lib2;
BEGIN
  Lib1.PrintChars(Lib2.CharsOfString("abracadabra"))
END Client.

Under name equivalence, the last statement is illegal because the type returned by CharsOfString is Lib2.SetOfChar while the type required by PrintChars has a different name, i.e., Lib1.CharSet. (Even if the name in Lib2 was CharSet, the names are still considered different because they are in different scopes.)

Under structural equivalence (e.g., Modula-3), this example is allowed.

Choices in language design:

Pascal:
mostly name equivalence, but structural equivalence in some cases (e.g., SETs).
Algol 68:
structural equivalence.
Ada:
name equivalence.
Modula-3:
structural equivalence for concrete types, name equivalence for abstract data types.

Answering the Security Issue.

Name equivalence provides slightly better security by partly hiding the type of a data structure (i.e., hiding it for type compatibility).

Even better security can be had by fully hiding the type structure using abstract data types (e.g., Modula's opaque types).

Therefore, types should be declared either visibly if flexibility is needed (structural equivalence), or opaquely if security is needed (name equivalence).