Review: C/C++ Data Types

Basic Data Types

C/C++ provides many low-level data types for representing numbers, e.g.

#include <iostream>
using namespace std;

void data_types() {
    cout << "sizeof(bool) = " << sizeof(bool) << "\n"
         << "sizeof(char) = " << sizeof(char) << "\n"
         << "sizeof(short) = " << sizeof(short) << "\n"
         << "sizeof(int) = " << sizeof(int) << "\n"
         << "sizeof(long) = " << sizeof(long) << "\n"
         << "sizeof(long long) = " << sizeof(long long) << "\n"
         << "\n"
         << "sizeof(float) = " << sizeof(float) << "\n"
         << "sizeof(double) = " << sizeof(double) << "\n"
         << "sizeof(long double) = " << sizeof(long double) << "\n"
         << "\n"
         << "sizeof(unsigned char) = " << sizeof(unsigned char) << "\n"
         << "sizeof(unsigned short) = " << sizeof(unsigned short) << "\n"
         << "sizeof(unsigned int) = " << sizeof(unsigned int) << "\n"
         << "sizeof(unsigned long) = " << sizeof(unsigned long) << "\n"
         << "sizeof(unsigned long long) = "
                                << sizeof(unsigned long long) << "\n";
}

sizeof is an operator (not a function) that returns the size, in bytes, of the object representation of a type

C/C++ does not specify the exact size of, say, an int, and so it can be different on different computers (or with different compiler options set)

this is a distinctive feature of C/C++: an int can be defined to be a size that makes sense for your computer, e.g. perhaps a 32-bit int on a 32-bit computer, and a 64-bit int on a 64-bit computer

in C/C++, the size of a byte is guaranteed to be at least 8 bits, but it might be more, depending on the computer

other basic types have relative guarantees, e.g. a double is guaranteed to be either the same size as a float, or bigger, i.e. sizeof(double) >= sizeof(float)

so it’s possible in some C/C++ implementations that float and double are the same size!

in this course, we’ll assume that we are using laptop/desktop computers with an 8-bit byte, and an int of at least 32 bits
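assumptions like these can be checked by the compiler; here is a minimal sketch using static_assert (bits_in_int is a hypothetical helper name, not something from the course code):

```cpp
#include <climits>   // CHAR_BIT

// Compile-time checks of the assumptions above: if either one is false,
// the program does not compile and the message is shown.
static_assert(CHAR_BIT == 8, "expected an 8-bit byte");
static_assert(sizeof(int) * CHAR_BIT >= 32, "expected an int of at least 32 bits");

// Number of bits in an int on this computer (hypothetical helper).
int bits_in_int() {
    return static_cast<int>(sizeof(int) * CHAR_BIT);
}
```

on a typical laptop/desktop this compiles without complaint; on an unusual platform it would fail at compile time rather than misbehave at run time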

Other Numeric Data Types

C/C++ has many other numeric data types!

for example:

signed char

char16_t  // at least 16 bits, and not smaller than a char
char32_t  // at least 32 bits, and not smaller than a char
wchar_t   // can represent largest supported character set

int8_t    // 8-bit int (exactly 8 bits)
int16_t   // 16-bit int (exactly 16 bits)
int32_t   // 32-bit int (exactly 32 bits)
int64_t   // 64-bit int (exactly 64 bits)

uint8_t    // 8-bit unsigned int (exactly 8 bits)
uint16_t   // 16-bit unsigned int (exactly 16 bits)
uint32_t   // 32-bit unsigned int (exactly 32 bits)
uint64_t   // 64-bit unsigned int (exactly 64 bits)

size_t  // implementation-defined unsigned integer used for
        // indexing container types, such as vector or string

all these different types may seem excessive, but it can be very convenient to have just the right type for your application

e.g. if you are writing a simulator for an 8-bit chip, then uint8_t would probably be an ideal data type for it
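as a rough sketch of that idea, here is how an 8-bit register might be modelled with uint8_t; Register8, add, and as_int are made-up names for illustration, and real simulators will differ:

```cpp
#include <cstdint>   // std::uint8_t

// Hypothetical 8-bit register for a chip simulator.
struct Register8 {
    std::uint8_t value;
};

// Adds n to the register. The result is stored back into 8 bits, so it
// wraps around modulo 256, like 8-bit hardware would.
void add(Register8& r, std::uint8_t n) {
    r.value = static_cast<std::uint8_t>(r.value + n);
}

// Converts the register to an int. Useful for printing, since cout
// usually prints a uint8_t as a character rather than a number.
int as_int(const Register8& r) {
    return r.value;
}
```

e.g. if a register holds 250 and you add 10, the stored result is 4 (260 modulo 256)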

Signed vs Unsigned Integers

void sign_test() {
    int a = 0;
    unsigned int b = 0;
    cout << "a = " << a << "\n"
         << "b = " << b << "\n"
         << "a - 1 = " << a - 1 << "\n"
         << "b - 1 = " << b - 1 << "\n";
}

signed numbers allow for negative values, while unsigned are only non-negative

unsigned data types are useful in, for example, low-level programming, e.g. manipulating individual bits in memory

in this course, we will stick with signed numbers
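one reason to be careful with unsigned types: in sign_test above, a - 1 is -1, but b - 1 cannot be negative, so it wraps around to the largest unsigned int; a minimal sketch (zero_minus_one is a hypothetical name):

```cpp
#include <climits>   // UINT_MAX

// Returns b - 1 for an unsigned int b that is 0.
unsigned int zero_minus_one() {
    unsigned int b = 0;
    return b - 1;    // unsigned arithmetic wraps around: result is UINT_MAX
}
```

the returned value equals UINT_MAX from <climits>, whatever size unsigned int happens to be on your computer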

Min and Max Values

#include <climits>
#include <cfloat>

void size_test() {
    cout << "CHAR_BIT = " << CHAR_BIT << " (# of bits in a byte)\n"
         << "CHAR_MIN = " << CHAR_MIN << "\n"
         << "CHAR_MAX = " << CHAR_MAX << "\n"
         << "INT_MIN = " << INT_MIN << "\n"
         << "INT_MAX = " << INT_MAX << "\n"
         << "\n"
         << "FLT_MIN = " << FLT_MIN << "\n"
         << "FLT_MAX = " << FLT_MAX << "\n"
         << "DBL_MIN = " << DBL_MIN << "\n"
         << "DBL_MAX = " << DBL_MAX << "\n"
         << "FLT_EPSILON = " << FLT_EPSILON << "\n"
         << "DBL_EPSILON = " << DBL_EPSILON << "\n";
}

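notice that floating point types have an epsilon value, which is the difference between 1 and the next larger value the type can represent

epsilon is not the smallest positive value: FLT_MIN and DBL_MIN are the smallest positive normalized values, and they are much smaller than epsilon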
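a small sketch of what epsilon measures for float, assuming the usual IEEE-style float found on laptop/desktop computers (changes_one is a hypothetical helper name):

```cpp
#include <cfloat>   // FLT_EPSILON, FLT_MIN

// Returns true if adding x to 1.0f gives a float different from 1.0f.
bool changes_one(float x) {
    // volatile forces the sum to be stored as a float, so any extra
    // precision the hardware might use is rounded away
    volatile float sum = 1.0f + x;
    return sum != 1.0f;
}
```

changes_one(FLT_EPSILON) is true, while changes_one(FLT_EPSILON / 4) is false because the sum rounds back to 1; note that FLT_EPSILON / 4 and FLT_MIN are still positive, non-zero floats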

Literals and Variables

values like -65, 437.33, true, 'c', and "c" are all examples of literals

-65 is an int literal

437.33 is a double literal

true is a bool literal

'c' is a char literal

"c" is a string literal

variables are named values, e.g.

int n = -65;
double x = 437.33;
bool flag = true;
char c = 'c';
string s = "c";

informally, we refer to both n and -65 as int values

but more precisely, n is an int variable, and -65 is an int literal

C++ lets you declare const variables, i.e. variables whose values can be read but not written:

const int START_YEAR = 2000;

int next_year = START_YEAR + 5;  // ok: START_YEAR is read

cout << START_YEAR;              // ok: START_YEAR is read

START_YEAR++;                    // compile-time error: can't
                                 // change the value of a const

Assignment Statements

assignment statements have this general form:

left = right;

left is typically a variable, and is known as the left-value, or l-value of the assignment

right is some expression that evaluates to a value that can be assigned to left, and is known as the right-value, or r-value of the assignment

note that a literal cannot be an l-value, i.e.:

int n = 5;  // ok

4 = n;      // compiler error: can't assign a value to 4

Structures

a struct is a collection of named values

e.g.

struct Point {
    double x;
    double y;
};

void point_test() {
    Point a;           // not initialized: a.x and a.y hold garbage values
    Point b{4, -5.5};
    Point c;

    c.x = 2;
    c.y = 3;

    cout << a.x << " " << a.y << "\n"
         << b.x << " " << b.y << "\n"
         << c.x << " " << c.y << "\n"
    ;
}

note how b is initialized using {}-notation

this {}-notation (with no = sign) is new in C++11, and is a very convenient way to initialize a struct

individual data values in a struct are accessed using dot notation, i.e. b.x is the x value of b, and b.y is its y value
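a short sketch that combines {}-notation and dot notation; origin and shifted are hypothetical helper functions added only for illustration:

```cpp
struct Point {
    double x;
    double y;
};

// Returns a Point at (0, 0).
Point origin() {
    Point p{};     // empty braces: x and y are both set to 0
    return p;
}

// Returns a copy of p moved over by dx and dy. The Point passed in
// by the caller is not changed, because p is a copy of it.
Point shifted(Point p, double dx, double dy) {
    p.x += dx;
    p.y += dy;
    return p;
}
```

e.g. shifted(Point{4, -5.5}, 1, 0.5) returns a Point with x equal to 5 and y equal to -5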

Pointers

pointers are one of the most powerful and useful (and dangerous!) features of C/C++

diagrams are a good way to think about pointers

double x = 2.6;


       7500  7564  7628  7692
   ---+-----+-----+-----+-----+---
...   |     | 2.6 |     |     |   ...
   ---+-----+-----+-----+-----+---
               x

     computer RAM (main memory)

the compiler and operating system determine the exact location of variables

where a variable is actually stored in memory varies from run to run of a program

double x = 2.6;

double* xp = &x;


       7500  7564        8844
   ---+-----+-----+-   -+-----+---
...   |     | 2.6 | ... |7564 |   ...
   ---+-----+-----+-   -+-----+---
               x          xp
             double      double*
              *xp

pointers let us easily access the values they refer to

so both x and *xp access the same location
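because *xp and x are the same location, writing through the pointer changes x; a minimal sketch (double_it is a hypothetical function name):

```cpp
// Doubles the value that xp points to.
void double_it(double* xp) {
    *xp = *xp * 2;   // read the pointed-to value, then write the result back
}
```

e.g. if x is 2.6, then after calling double_it(&x) the variable x is 5.2, even though the function never mentions x by name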

since we don’t usually know the exact memory addresses, we usually simplify the diagrams

double x = 2.6;

double* xp = &x;

   double
  +-----+
x | 2.6 |
  +-----+
     ^
     |
     |
     |
     xp

Pointers to structs

you can make pointers to any data type in C++, even structs

for example:

struct Point {
    double x;
    double y;
};

Point a{2, 3};

Point* p = &a;

cout << (*p).x
     << (*p).y;

note that you use brackets to access the values in the struct p points to, i.e. (*p).x

*p.x doesn’t work because . has higher precedence than *, and so C++ first tries to evaluate p.x which is a type error

to avoid the annoyance of having to write brackets in this situation, C/C++ provides the -> operator:

cout << p->x    // shorthand for (*p).x
     << p->y;   // shorthand for (*p).y

pointers to structs are quite common in C/C++ programs, so it is worthwhile to learn this notation!
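the -> operator can be used for writing as well as reading; here is a small sketch (reset is a hypothetical function name):

```cpp
struct Point {
    double x;
    double y;
};

// Moves the Point that p points to back to (0, 0).
void reset(Point* p) {
    p->x = 0;      // same as (*p).x = 0
    p->y = 0;      // same as (*p).y = 0
}
```

calling reset(&a) changes a itself, since p holds the address of a rather than a copy of it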

Arrays

C-style arrays are a low-level way of dealing with contiguous chunks of memory

in C++, arrays are typically used as a way to implement other array-like things, like vectors or strings

int a[3] = { 3, -2, 0 };


       7500  7564  7628  7692
   ---+-----+-----+-----+-----+---
...   |  3  | -2  |  0  |     |   ...
   ---+-----+-----+-----+-----+---
         a    a+1   a+2

in most expressions, a is automatically converted to a pointer to the first element of the array

a + 1 points to the second element

a + 2 points to the third element

we can access array elements using * notation:

cout << *(a + 0)  // or just *a
     << *(a + 1)
     << *(a + 2);

or [] notation:

cout << a[0]
     << a[1]
     << a[2];

in general, a[i] is shorthand for *(a + i)
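to illustrate that the two notations are interchangeable, here is a sketch of the same sum written both ways (sum_star and sum_index are hypothetical names); note that the length n has to be passed separately:

```cpp
// Sums the first n elements of arr using * notation.
int sum_star(const int* arr, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += *(arr + i);
    }
    return total;
}

// Sums the first n elements of arr using [] notation.
int sum_index(const int* arr, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += arr[i];   // shorthand for *(arr + i)
    }
    return total;
}
```

for the array a above, both sum_star(a, 3) and sum_index(a, 3) return 1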

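in our example, int is 64 bits (8 bytes), and so a + 1 is a memory location 64 bits over; the addresses in these diagrams count bits to match, although real memory addresses count bytes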

but suppose we instead have an array of characters, where a char is (say) 8 bits:

char b[4] = {'c', 'a', 't', 's'};

       6100  6108  6116  6124
   ---+-----+-----+-----+-----+---
...   |  c  |  a  |  t  |  s  |   ...
   ---+-----+-----+-----+-----+---
         b    b+1   b+2   b+3

here, b + 1 is only 8 bits over

so how does C++ know that a + 1 is 64 bits away from a, but that b + 1 is 8 bits away from b?

the answer is that the element type of the array tells C++ how far to move

in terms of raw addresses, a + 1 is the address of a plus 1 * sizeof(int) bytes

similarly, b + 1 is the address of b plus 1 * sizeof(char) bytes

in general, if arr is of type T[], then arr + i is the address of arr plus i * sizeof(T) bytes
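one way to see this scaling is to measure the distance between p and p + 1 in bytes, by viewing both addresses as char pointers; int_step and char_step are hypothetical helper names, and this is only a sketch:

```cpp
#include <cstddef>   // std::ptrdiff_t

// Number of bytes between p and p + 1 for an int pointer.
std::ptrdiff_t int_step(const int* p) {
    const char* here = reinterpret_cast<const char*>(p);
    const char* next = reinterpret_cast<const char*>(p + 1);
    return next - here;   // char pointers count in single bytes
}

// Number of bytes between p and p + 1 for a char pointer.
std::ptrdiff_t char_step(const char* p) {
    return (p + 1) - p;
}
```

int_step(a) returns sizeof(int), while char_step(b) returns 1, since sizeof(char) is always 1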

one final simplification is that we don’t usually write down addresses for array locations because they change from run to run of the program

int a[3] = { 3, -2, 0 };


   0     1     2
+-----+-----+-----+
|  3  | -2  |  0  |
+-----+-----+-----+
   a    a+1   a+2

this gives us an idea of why C/C++ starts the index value of an array at 0 (instead of 1)

a, or a + 0, is the address of the first element, a + 1 is the address of the second element, and so on

so the index value is the amount you add to a to get the value you want

Issues with arrays

C-style arrays don’t know their own length: they are just chunks of bits

you can easily access memory locations outside an array, e.g. a[-2] is the same as *(a - 2)

this can be the source of many subtle errors, and also can present serious security issues

arrays have a fixed size that can’t change, so a task like storing all the numbers in a file in an array is tricky because you first need to know how many numbers there are

for most practical applications, C-style arrays are probably too error-prone and tricky to use

it’s usually more convenient to use strings and vectors
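for example, a vector makes it easy to store all the numbers from some input without knowing in advance how many there are; this is a minimal sketch, where read_all is a hypothetical function name and the input is any istream (such as an open file or cin):

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Reads whitespace-separated ints from in until the input runs out
// (or something that is not an int is reached).
std::vector<int> read_all(std::istream& in) {
    std::vector<int> nums;
    int n;
    while (in >> n) {
        nums.push_back(n);   // the vector grows as needed
    }
    return nums;
}
```

e.g. reading from an istringstream that holds "3 -2 0 7" gives a vector of size 4, and unlike a C-style array the vector knows its own length via size()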