Review: C/C++ Data Types
Basic Data Types
C/C++ provides many low-level data types for representing numbers, e.g.
#include <iostream>
using namespace std;

// prints the size, in bytes, of each basic numeric type
void data_types() {
    cout << "sizeof(bool) = " << sizeof(bool) << "\n"
         << "sizeof(char) = " << sizeof(char) << "\n"
         << "sizeof(short) = " << sizeof(short) << "\n"
         << "sizeof(int) = " << sizeof(int) << "\n"
         << "sizeof(long) = " << sizeof(long) << "\n"
         << "sizeof(long long) = " << sizeof(long long) << "\n"
         << "\n"
         << "sizeof(float) = " << sizeof(float) << "\n"
         << "sizeof(double) = " << sizeof(double) << "\n"
         << "sizeof(long double) = " << sizeof(long double) << "\n"
         << "\n"
         << "sizeof(unsigned char) = " << sizeof(unsigned char) << "\n"
         << "sizeof(unsigned short) = " << sizeof(unsigned short) << "\n"
         << "sizeof(unsigned int) = " << sizeof(unsigned int) << "\n"
         << "sizeof(unsigned long) = " << sizeof(unsigned long) << "\n"
         << "sizeof(unsigned long long) = "
         << sizeof(unsigned long long) << "\n";
}
the sizeof operator (it is an operator, not a function) returns the size, in bytes, of the object representation of the type
C/C++ does not specify the exact size of, say, an int, and so it can be different on different computers (or with different compiler options set)
this is a distinctive feature of C/C++: an int can be defined to be a size that makes sense for your computer, e.g. perhaps a 32-bit int on a 32-bit computer, and a 64-bit int on a 64-bit computer (although in practice most compilers for 64-bit desktop computers keep int at 32 bits)
in C/C++, the size of a byte is guaranteed to be at least 8 bits, but it might be more, depending on the computer
other basic types have relative guarantees, e.g. a double is guaranteed to be either the same size as a float, or bigger, i.e. sizeof(double) >= sizeof(float)
so it’s possible in some C/C++ programs that float and double are the same size!
in this course, we’ll assume that we are using laptop/desktop computers with an 8-bit byte, and an int of at least 32 bits
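the course assumptions above can be checked by the compiler itself. The following is a minimal sketch, assuming a C++11 compiler; the helper name int_bits is hypothetical and not part of the original notes:

```cpp
#include <climits>  // CHAR_BIT

// Compile-time checks of the assumptions made in these notes.
// If either one fails, the program does not compile.
static_assert(CHAR_BIT == 8, "expected an 8-bit byte");
static_assert(sizeof(int) * CHAR_BIT >= 32, "expected an int of at least 32 bits");

// Relative guarantees that hold on every platform.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
static_assert(sizeof(short) <= sizeof(int), "short is no bigger than int");
static_assert(sizeof(int) <= sizeof(long), "int is no bigger than long");

// Hypothetical helper: the number of bits in an int on this platform.
int int_bits() {
    return static_cast<int>(sizeof(int) * CHAR_BIT);
}
```

on a typical laptop/desktop compiler int_bits() returns 32, but only "at least 32" is assumed here.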
Other Numeric Data Types
C/C++ has many other numeric data types!
for example:
signed char
char16_t // at least 16 bits, and not smaller than a char
char32_t // at least 32 bits, and not smaller than a char
wchar_t // can represent largest supported character set
int8_t // 8-bit int (exactly 8 bits)
int16_t // 16-bit int (exactly 16 bits)
int32_t // 32-bit int (exactly 32 bits)
int64_t // 64-bit int (exactly 64 bits)
uint8_t // 8-bit unsigned int (exactly 8 bits)
uint16_t // 16-bit unsigned int (exactly 16 bits)
uint32_t // 32-bit unsigned int (exactly 32 bits)
uint64_t // 64-bit unsigned int (exactly 64 bits)
size_t // implementation-defined unsigned integer used for
// indexing container types, such as vector or string
all these different types may seem excessive, but it can be very convenient to have just the right type for your application
e.g. if you are writing a simulator for an 8-bit chip, then uint8_t would probably be an ideal data type for it
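to make the 8-bit chip idea concrete, here is a small sketch of a register that wraps around the way 8-bit hardware would. The names Register8, add, and as_int are hypothetical, and the sketch assumes a platform that provides uint8_t in <cstdint> (typical laptops/desktops do):

```cpp
#include <cstdint>

// Hypothetical 8-bit register for a toy chip simulator.
struct Register8 {
    std::uint8_t value = 0;
};

// Adds n to the register. The sum is computed as an int and then
// converted back to 8 bits, so the result wraps around modulo 256.
void add(Register8& r, int n) {
    r.value = static_cast<std::uint8_t>(r.value + n);
}

// Returns the register contents as an int. This is convenient for
// printing, since streaming a uint8_t usually prints a character.
int as_int(const Register8& r) {
    return r.value;
}
```

for example, adding 10 to a register holding 250 leaves 4 in the register, since 260 does not fit in 8 bits.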
Signed vs Unsigned Integers
void sign_test() {
    int a = 0;
    unsigned int b = 0;
    cout << "a = " << a << "\n"
         << "b = " << b << "\n"
         << "a - 1 = " << a - 1 << "\n"
         << "b - 1 = " << b - 1 << "\n";
}
signed numbers allow for negative values, while unsigned are only non-negative
unsigned data types are useful in, for example, low-level programming, e.g. manipulating individual bits in memory
in this course, we will stick with signed numbers
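the surprising line in sign_test is b - 1: an unsigned value cannot go below 0, so it wraps around to the largest unsigned int. A minimal sketch of that behaviour follows; the function names are hypothetical:

```cpp
#include <limits>

// Subtracting 1 from an unsigned 0 wraps around to the largest
// unsigned int value. For unsigned types this is well-defined.
unsigned int unsigned_zero_minus_one() {
    unsigned int b = 0;
    return b - 1;
}

// When a signed and an unsigned int are compared, the signed value is
// converted to unsigned first, so -1 becomes a very large number.
// The cast below spells out the conversion that a < u would do silently.
bool minus_one_less_than(unsigned int u) {
    int a = -1;
    return static_cast<unsigned int>(a) < u;
}
```

so unsigned_zero_minus_one() equals std::numeric_limits<unsigned int>::max(), and minus_one_less_than(1) is false. Surprises like these are one reason to stick with signed numbers.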
Min and Max Values
#include <climits>   // CHAR_BIT, CHAR_MIN, INT_MAX, ...
#include <cfloat>    // FLT_MIN, DBL_MAX, DBL_EPSILON, ...
#include <iostream>
using namespace std;

void size_test() {
    cout << "CHAR_BIT = " << CHAR_BIT << " (# of bits in a byte)\n"
         << "CHAR_MIN = " << CHAR_MIN << "\n"
         << "CHAR_MAX = " << CHAR_MAX << "\n"
         << "INT_MIN = " << INT_MIN << "\n"
         << "INT_MAX = " << INT_MAX << "\n"
         << "\n"
         << "FLT_MIN = " << FLT_MIN << "\n"
         << "FLT_MAX = " << FLT_MAX << "\n"
         << "DBL_MIN = " << DBL_MIN << "\n"
         << "DBL_MAX = " << DBL_MAX << "\n"
         << "FLT_EPSILON = " << FLT_EPSILON << "\n"
         << "DBL_EPSILON = " << DBL_EPSILON << "\n";
}
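notice that floating point types have an epsilon value, which is the difference between 1.0 and the next larger value the type can represent, i.e. a measure of the type’s precision near 1.0
epsilon is not the smallest positive value: FLT_MIN and DBL_MIN are the smallest positive normalized values, and they are far smaller than epsilon, so numbers smaller than epsilon are not 0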
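the meaning of epsilon can be checked directly. This sketch assumes IEEE 754 doubles with the default round-to-nearest mode, which is what typical laptop/desktop compilers use; the function names are hypothetical:

```cpp
#include <cfloat>

// DBL_EPSILON is the gap between 1.0 and the next larger double,
// so adding it to 1.0 gives a different value.
bool one_plus_epsilon_differs() {
    double y = 1.0 + DBL_EPSILON;
    return y != 1.0;
}

// Half of that gap is too small to change 1.0: the sum is rounded
// back to 1.0.
bool one_plus_half_epsilon_is_one() {
    double y = 1.0 + DBL_EPSILON / 2;
    return y == 1.0;
}

// Values far smaller than epsilon are still representable and non-zero.
// DBL_MIN, the smallest positive normalized double, is one of them.
bool tiny_values_are_nonzero() {
    double tiny = DBL_EPSILON / 1e10;
    return tiny > 0.0 && DBL_MIN > 0.0 && DBL_MIN < DBL_EPSILON;
}
```

under those assumptions all three functions return true.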
Literals and Variables
values like -65, 437.33, true, 'c', and "c" are all examples of literals
-65 is an int literal
437.33 is a double literal
true is a bool literal
'c' is a char literal
"c" is a string literal (a C-style string, i.e. an array of const char that ends with the '\0' character)
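the compiler can confirm the type of each literal. The following is a sketch using decltype and std::is_same from <type_traits> (C++11); the helper string_literal_size is hypothetical:

```cpp
#include <cstddef>
#include <type_traits>

// decltype(expr) names the type of an expression without evaluating it.
static_assert(std::is_same<decltype(-65), int>::value, "-65 is an int");
static_assert(std::is_same<decltype(437.33), double>::value, "437.33 is a double");
static_assert(std::is_same<decltype(true), bool>::value, "true is a bool");
static_assert(std::is_same<decltype('c'), char>::value, "'c' is a char");

// Suffixes select other literal types.
static_assert(std::is_same<decltype(437.33f), float>::value, "f suffix gives float");
static_assert(std::is_same<decltype(65u), unsigned int>::value, "u suffix gives unsigned int");
static_assert(std::is_same<decltype(65LL), long long>::value, "LL suffix gives long long");

// Hypothetical helper: the string literal "c" is an array of two chars,
// the letter 'c' followed by the terminating '\0'.
std::size_t string_literal_size() {
    return sizeof("c");
}
```

string_literal_size() returns 2, which is why "c" and 'c' are not interchangeable.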
variables are named values, e.g.
int n = -65;
double x = 437.33;
bool flag = true;
char c = 'c';
string s = "c";
informally, we refer to both n and -65 as int values
but more precisely, n is an int variable, and -65 is an int literal
C++ lets you declare const variables, i.e. variables whose values can be read but not written:
const int START_YEAR = 2000;
int next_year = START_YEAR + 5; // ok: START_YEAR is read
cout << START_YEAR; // ok: START_YEAR is read
START_YEAR++; // compile-time error: can't
// change the value of a const
Assignment Statements
assignment statements have this general form:
left = right;
left is typically a variable, and is known as the left-value, or l-value of the assignment
right is some expression that evaluates to a value that can be assigned to left, and is known as the right-value, or r-value of the assignment
note that a literal cannot be an l-value, i.e.:
int n = 5; // ok
4 = n; // compiler error: can't assign a value to 4
Structures
a struct is a collection of named values, e.g.
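struct Point {
    double x;
    double y;
};

void point_test() {
    Point a;            // a.x and a.y are not initialized: printing them below
                        // shows arbitrary values (formally undefined behaviour)
    Point b{4, -5.5};   // b.x is 4, b.y is -5.5
    Point c;
    c.x = 2;
    c.y = 3;
    cout << a.x << " " << a.y << "\n"
         << b.x << " " << b.y << "\n"
         << c.x << " " << c.y << "\n";
}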
note how b is initialized using {}-notation
this form of {}-notation (with no = sign) is new in C++11, and is a very convenient way to initialize a struct
individual data values in a struct are accessed using dot notation, i.e. b.x is the x value of b, and b.y is its y value
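the difference between an uninitialized struct and a {}-initialized one is easy to miss. This sketch repeats the Point struct so that it is self-contained; the function names origin, make_point, and sum_of_coordinates are hypothetical:

```cpp
struct Point {
    double x;
    double y;
};

// "Point p;" would leave x and y uninitialized.
// "Point p{};" sets both x and y to 0.
Point origin() {
    Point p{};
    return p;
}

// Builds a Point from two coordinates using {}-notation.
Point make_point(double x, double y) {
    return Point{x, y};
}

// Dot notation reads the individual values of the struct.
double sum_of_coordinates(const Point& p) {
    return p.x + p.y;
}
```

for example, sum_of_coordinates(make_point(4, -5.5)) is -1.5, and origin() always has x and y equal to 0.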
Pointers
pointers are one of the most powerful and useful (and dangerous!) features of C/C++
diagrams are a good way to think about pointers
double x = 2.6;

      7500  7564  7628  7692
   ---+-----+-----+-----+-----+---
  ... |     | 2.6 |     |     | ...
   ---+-----+-----+-----+-----+---
               x

       computer RAM (main memory)
the compiler and operating system determine the exact location of variables
where a variable is actually stored in memory varies from run to run of a program
double x = 2.6;
double* xp = &x;

      7500  7564        8844
   ---+-----+-----+-   -+-----+---
  ... |     | 2.6 | ... |7564 | ...
   ---+-----+-----+-   -+-----+---
               x          xp
            double      double*
              *xp
pointers let us easily access the values they refer to
so both x and *xp access the same location
since we don’t usually know the exact memory addresses, we usually simplify the diagrams
double x = 2.6;
double* xp = &x;

     double
    +-----+
  x | 2.6 |
    +-----+
       ^
       |
       |
       |
       xp
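because x and *xp are the same location, writing through the pointer changes the variable. A minimal sketch, with hypothetical function names:

```cpp
// Writes a new value through the pointer. The variable that p points
// to (in the caller) is the one that changes.
void set_through_pointer(double* p, double value) {
    *p = value;
}

// Returns true when both pointers hold the same address.
bool same_location(const double* p, const double* q) {
    return p == q;
}
```

with double x = 2.6; and double* xp = &x; as in the diagram, calling set_through_pointer(xp, 7.5) makes x equal to 7.5, and same_location(xp, &x) is true.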
Pointers to structs
you can make pointers to any data type in C++, even structs
for example:
struct Point {
    double x;
    double y;
};

Point a{2, 3};
Point* p = &a;
cout << (*p).x
     << (*p).y;
note that you use brackets to access the values in the struct p points to, i.e. (*p).x
*p.x doesn’t work because . has higher precedence than *, and so C++ first tries to evaluate p.x, which is a type error (p is a pointer, not a struct)
to avoid the annoyance of having to write brackets in this situation, C/C++ provides the -> operator:
cout << p->x // shorthand for (*p).x
<< p->y; // shorthand for (*p).y
pointers to structs are quite common in C/C++ programs, so it is worthwhile to learn this notation!
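a typical use of a pointer to a struct is a function that modifies the struct it is given. This sketch repeats the Point struct so that it is self-contained; the function translate is hypothetical:

```cpp
struct Point {
    double x;
    double y;
};

// Moves the point that p points to. The changes are visible to the
// caller, because p refers to the caller's Point.
void translate(Point* p, double dx, double dy) {
    p->x += dx;     // same as (*p).x += dx;
    (*p).y += dy;   // same as p->y += dy;
}
```

for example, after Point a{2, 3}; and translate(&a, 1, -1); the point a has x equal to 3 and y equal to 2.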
Arrays
C-style arrays are a low-level way of dealing with contiguous chunks of memory
in C++, arrays are typically used as a way to implement other array-like things, like vectors or strings
int a[3] = { 3, -2, 0 };

      7500  7564  7628  7692
   ---+-----+-----+-----+-----+---
  ... |  3  | -2  |  0  |     | ...
   ---+-----+-----+-----+-----+---
         a    a+1   a+2
essentially, a acts as a pointer to the first element of the array (in most expressions, an array is converted to such a pointer)
a + 1 points to the second element
a + 2 points to the third element
we can access array elements using * notation:
cout << *(a + 0) // or just *a
     << *(a + 1)
     << *(a + 2);
or [] notation:
cout << a[0]
     << a[1]
     << a[2];
in general, a[i] is shorthand for *(a + i)
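in our example, int is 64 bits, and so a + 1 is a memory location 64 bits over (the addresses in these diagrams count bits to keep the arithmetic visible; real addresses count bytes)
but suppose we instead have an array of characters, where a char is (say) 8 bits: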
char b[4] = {'c', 'a', 't', 's'};

      6100  6108  6116  6124
   ---+-----+-----+-----+-----+---
  ... |  c  |  a  |  t  |  s  | ...
   ---+-----+-----+-----+-----+---
         b    b+1   b+2   b+3
here, b + 1 is only 8 bits over
so how does C++ know that a + 1 is 64 bits away from a, but that b + 1 is 8 bits away from b?
the answer is that the type of the array variable tells C++ what to do
in terms of raw byte addresses, a + 1 is more accurately thought of as a + 1 * sizeof(int)
b + 1 is more accurately thought of as b + 1 * sizeof(char)
in general, if arr is of type T[], then arr + i is the address arr + i * sizeof(T), where sizeof(T) is measured in bytes
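the scaling by sizeof(T) can be observed by viewing the addresses as pointers to char, which always step one byte at a time. A minimal sketch, with hypothetical function names:

```cpp
#include <cstddef>

// Distance in bytes between a and a + 1 for an array of int.
// Viewed as char pointers, the two addresses differ by sizeof(int).
std::ptrdiff_t int_step_in_bytes() {
    int a[3] = {3, -2, 0};
    const char* first = reinterpret_cast<const char*>(a);
    const char* second = reinterpret_cast<const char*>(a + 1);
    return second - first;
}

// For an array of char the step is exactly one byte.
std::ptrdiff_t char_step_in_bytes() {
    char b[4] = {'c', 'a', 't', 's'};
    return (b + 1) - b;
}

// a[i] and *(a + i) name the same element at the same address.
bool index_matches_pointer_form() {
    int a[3] = {3, -2, 0};
    return a[1] == *(a + 1) && &a[2] == a + 2;
}
```

int_step_in_bytes() returns sizeof(int), whatever that is on your computer, while char_step_in_bytes() always returns 1.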
one final simplification is that we don’t usually write down addresses for array locations because they change from run to run of the program
int a[3] = { 3, -2, 0 };

     0     1     2
  +-----+-----+-----+
  |  3  | -2  |  0  |
  +-----+-----+-----+
     a    a+1   a+2
this gives us an idea of why C/C++ starts the index value of an array at 0 (instead of 1)
a, or a + 0, is the address of the first element, a + 1 is the address of the second element, and so on
so the index value is the amount you add to a to get the element you want
Issues with arrays
C-style arrays don’t know their own length; they are just chunks of bits
you can easily access memory locations outside an array, e.g. a[-2] is the same as *(a - 2), and C++ does not check that this location is inside the array (accessing it is undefined behaviour)
this can be the source of many subtle errors, and also can present serious security issues
arrays have a fixed size that can’t change, so a task like storing all the numbers in a file in an array is tricky because you first need to know how many numbers there are
for most practical applications, C-style arrays are probably too error-prone and tricky to use
it’s usually more convenient to use strings and vectors
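as a sketch of how a vector avoids these problems, the function below reads all the numbers from an input stream without knowing in advance how many there are. The names read_all and index_is_valid are hypothetical; a file stream (std::ifstream) could be passed in the same way as the string stream used in the example after the code:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Reads every integer from the stream. The vector grows as needed,
// so the count does not have to be known ahead of time.
std::vector<int> read_all(std::istream& in) {
    std::vector<int> nums;
    int n;
    while (in >> n) {
        nums.push_back(n);
    }
    return nums;
}

// Unlike a[i] on a C-style array, v.at(i) checks the index and throws
// std::out_of_range when i is not a valid position.
bool index_is_valid(const std::vector<int>& v, std::size_t i) {
    try {
        static_cast<void>(v.at(i));
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

for example, given std::istringstream in("3 -2 0 17");, read_all(in) returns a vector whose size() is 4, and index_is_valid reports that index 4 is out of range.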