managing memory turns out to be one of the trickiest and most important parts of C++ programming
a C++ program divides its memory into three main components
static memory has a fixed size (that is known at compile-time) that never changes; global variables are stored here
stack memory changes in size, and is where local variables and function parameters are stored; as the name suggests, stack memory is implemented as a stack, where new is pushed/popped only on the top of the stack; stack memory is handled automatically by C++, e.g. variables are pushed onto the stack when they are defined, and popped from the stack when they go out of scope
free store (heap) memory is all the other memory, and it is accessed in C++ using new and delete/delete[]; it allows programmers to access chunks of memory while a program is running, and that memory is allowed to exist between function calls; by default, free store memory must be manually managed by the programmer, which turns out to be a surprisingly difficult thing to do correctly
in C++, free store memory is accessed via pointers
a pointer is a data type that, essentially, stores the address of some memory
double x = 5.2; // x is a local variable, thus it’s stored on the stack
here, x names a chunk of memory that stores the value 5.2
5.2 is stored at some location in the computer’s memory, and that location has an address
double* x_addr = &x;
the & operator returns the address of a variable
double* is the data type “pointer to a double”, i.e. x_addr is only allowed to store the address of a memory location that store’s a double
when we access the free store, we must do it through pointers
double* y_addr = new double(6.4);
new always returns a pointer
use the * to de-reference a pointer, i.e. to access the value it points to
cout << *y_addr << "\n";
*y_addr += 2;
cout << *y_addr << "\n";
in C++, having a pointer to a value lets us read/write that value in the usual way using the *
whenever you new some memory, you must also be sure to delete it when you’re done
delete y_addr;
if you forget to do this, your program is said to have a memory leak
undeleted memory cannot be re-used
for small, toy programs that’s fine: when the program ends any undeleted memory is given back to the operating system
for long-running programs (such as an operating system, a web server, a web browser, a text editor, etc.), this can be a major problem that eventually causes them to crash
experience shows that finding and fixing memory leaks is very difficult in general
many other programming languages use an automatic memory deleter know as a garbage collector to deal with this
but garbage collectors can have a run-time performance impact that is too high for many of the applications that C++ is used for
instead, C++ provides destructors, and also smart pointers, to give programmers control over memory in a way that it is just about as easy having a garbage collector
Problem: Given a path name to a folder (on Linux), print all its files (not folders!) ordered from largest to smallest. Print the name of each file along with its size (in bytes).
a good strategy for solving programming problems like this to first write down a high-level description of the steps you think will solve the problem
this is quite high-level — it looks nothing like C++ code!
but it is a description of a program that, if we could do each step, should solve our problem
the first step is easy, but the other steps are not so simple
indeed, it is not at all clear how, in C++, one is supposed to get files, and their sizes, given just the name of a folder
so lets list the sub-problems we’ll need to solve
we will have to
lets figure out how to do these basic things first
note that we are assuming a Linux file system
the functions needed for other file systems might be different!
in Linux, the stat command can be used to get lots of information about a file, e.g.:
$ stat error.h
File: ‘error.h’
Size: 556 Blocks: 8 IO Block: 4096 regular file
Device: 802h/2050d Inode: 75240436 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ tjd) Gid: ( 1000/ tjd)
Access: 2015-01-10 14:32:17.145603274 -0800
Modify: 2014-12-03 14:40:32.385175073 -0800
Change: 2014-12-03 14:40:32.385175073 -0800
Birth: -
among other things, you can see that the size of the file error.h is 556 bytes
it turns out there is a similar stat function available in C++ that returns a stat struct defined like this:
//
// from http://linux.die.net/man/2/stat
//
struct stat {
dev_t st_dev; // ID of device containing file
ino_t st_ino; // inode number
mode_t st_mode; // protection
nlink_t st_nlink; // number of hard links
uid_t st_uid; // user ID of owner
gid_t st_gid; // group ID of owner
dev_t st_rdev; // device ID (if special file)
off_t st_size; // total size, in bytes
blksize_t st_blksize; // blocksize for file system I/O
blkcnt_t st_blocks; // number of 512B blocks allocated
time_t st_atime; // time of last access
time_t st_mtime; // time of last modification
time_t st_ctime; // time of last status change
};
this struct contains all the basic info associated with a file in Linux
the size, in bytes, is st_size
the data type off_t is an integral type (perhaps unsigned long)
the stat struct is important because it is what the stat function gives us:
#include <sys/stat.h>
void print_file_info(const string& s) {
struct stat info; // note that we must put struct first!
// the stat function expects a C-style string, so we can't pass it s
// instead, we call s.c_str() to get a C-style string
//
// the second parameter to the stat function expects a pointer to a
// stat object
//
// if the call to stat returns -1, then no file of that name was found
//
if (stat(s.c_str(), &info) == -1) // note the & at the start of info!
{
cmpt::error("file does not exist");
}
cout << s << " is " << info.st_size << " bytes\n";
}
you can use this function to print the size of any file
this even works if s is the name of a directory
notice how the return value of stat is used to signal success/failure
and the stat struct is modified through C-style pass-by-reference
it turns out if we have a stat structure for a file/folder, we can test if it is folder by passing the value st_mode to the function S_ISDIR:
#include <sys/stat.h>
void print_file_info(const string& s) {
struct stat info; // note that we must put struct first
// the stat function expects a C-style string, so we can't pass it s
// instead, we call s.c_str() to get a C-style string
//
// the second parameter to the stat function expects a pointer to a
// stat object
//
// if the call to stat returns -1, then no file of that name was found
//
if (stat(s.c_str(), &info) == -1) // note that it's &info
{
cmpt::error("file does not exist");
}
cout << s << " is " << info.st_size << " bytes, ";
if (S_ISDIR(info.st_mode)) {
cout << "it's a folder\n";
} else {
cout << "it's a file\n";
}
}
the final sub-problem we want to solve is how to get a list of all the files in a given folder
to do this, we’ll use dirent.h
while not an official standard library file, dirent.h is pretty commonly used to get directory info in C/C++
dirent.h has some structures and functions we need to know
DIR is a struct that is used to store information about the entries in a directory
strictly speaking, we know nothing about it’s actual contents — we only know that it is named DIR
the opendir function opens a directory:
DIR* opendir(const char *);
opendir takes a path name (as a C-style string) and returns a pointer to a newly created DIR object
if the folder can’t be opened (e.g. because there is no folder of that name), a null pointer is returned
when you are done, use closedir to close a directory:
int closedir(DIR*);
closedir takes a pointer to the DIR object to close
dirent is a struct that contains information about the individual files/folders in a DIR object
a dirent struct has, at least, these two entries:
ino_t d_ino // file serial number
char d_name[] // name of entry (C-style string)
we will only be concerned with d_name here
to get the next dirent object from a DIR, use
struct dirent* readdir(DIR*);
readdir takes a pointer to a DIR object as input, and returns a pointer to the next dirent object
readdir returns a null pointer when there are no more dirent objects
now we combine all these structure and functions into a function that returns all the names in a directory:
vector<string> dirlist(const string& path) {
DIR* cwd = opendir(path.c_str());
if (cwd == nullptr) cmpt::error("couldn't open directory");
vector<string> result;
dirent* entry;
while ((entry = readdir(cwd))) {
result.push_back(entry->d_name);
}
closedir(cwd);
return result;
}
now that we know how to solve the sub-problems, let solve the original problem
since we will need to store the name and size of a file, lets create a struct to do that:
struct File {
string name;
off_t size; // off_t is an integer type from the stat object
};
we use the data type off_t because that’s the type of the file size, and we don’t want to have to do any type conversions (which might cause errors/confusion)
now the idea is to read all the files, and their sizes, in a vector<File> that we can sort in whatever order is convenient
vector<File> get_files(const string& folder_name) {
// open the given folder
DIR* cwd = opendir(folder_name.c_str());
if (cwd == nullptr) cmpt::error("couldn't open directory");
// store all the file names in DIR in result
vector<File> result;
dirent* entry;
while ((entry = readdir(cwd))) {
struct stat info;
if (stat(entry->d_name, &info) == -1) {
cmpt::error("file does not exist");
}
if (S_ISDIR(info.st_mode)) {
// it's a folder, ignore it
} else {
result.push_back(File{entry->d_name, info.st_size});
}
} // while
closedir(cwd);
return result;
} // get_files
now we can get a vector of all the files in a given folder like this:
vector<File> files = get_files("."); // "." means current directory
for(const File& f : files) {
cout << f.name << ", " << f.size << " bytes\n";
}
"." is the standard Linux notation for “current directory”
sorting data into order is an interesting and important topic
sorting has applications in many different areas of programming
many different sorting algorithms have been created
some are fast, some are slow, some use extra memory, some don’t
first, let’s see how to sort a vector<File> using the standard sort function that comes with C++
then we’ll implement our own simple sorting routine
the C++ sort function can be used to sort any vector or array (and pretty much any other sequential data structure defined in the right way)
it is very efficient in both time and memory usage
for sort to work, it has to know how to compare the objects being sorted
there is no < operator defined for our File objects, so what we can do is pass in a boolean comparison function that sort will use to do the comparisons
#include <algorithm>
// ...
// comparison function
bool less_than_size(const File& a, const File& b) {
return a.size < b.size;
}
void test() {
vector<File> files = get_files(".");
sort(files.begin(), files.end(), less_than_size);
reverse(files.begin(), files.end());
for(const File& f : files) {
cout << f.name << ", " << f.size << " bytes\n";
}
}
note that we use the reverse function to reverse the order of the files so they are from most number of bytes to least number of bytes (as required by the original problem)
here’s how we could sort the files alphabetically by name:
bool less_than_name(const File& a, const File& b) {
return a.name < b.name;
}
bool less_than_size(const File& a, const File& b) {
return a.size < b.size;
}
void test3() {
vector<File> files = get_files(".");
sort(files.begin(), files.end(), less_than_size);
reverse(files.begin(), files.end());
for(const File& f : files) {
cout << f.name << ", " << f.size << " bytes\n";
}
cout << "-------\n";
sort(files.begin(), files.end(), less_than_name);
for(const File& f : files) {
cout << f.name << ", " << f.size << " bytes\n";
}
}
in practice, it almost always makes sense to use the C++ sort function when you need to sort data
it is fast, efficient, flexible, and almost certainly error-free
but it is instructive to see how to write our own simple sorting function
initially, we won’t be overly concerned with performance and instead aim just for correctness
later in the course we will see some more efficient sorting algorithms
bubblesort is a simple sorting algorithm with a cute name
however, it is terribly slow in practice — its slowness is probably the most interesting thing about it!
so never use bubblesort for sorting large amounts of data!
but it’s instructive to walk through its development
the idea of bubblesort is to repeatedly scan through the list of items you want to sort, and swap adjacent items that are out of order
you stop after you scan through the vector without having swapped any items
lets trace an example by hand
we start with this list of numbers that we want to put into ascending order
8 3 6 1 5
8 3 6 1 5
^ ^
3 8 6 1 5
^ ^
3 6 8 1 5
^ ^
3 6 1 8 5
^ ^
3 6 1 5 8
^ ^
3 6 1 5 8
^ ^
3 6 1 5 8
^ ^
3 1 6 5 8
^ ^
3 1 5 6 8
^ ^
3 1 5 6 8
3 1 5 6 8
^ ^
1 3 5 6 8
^ ^
1 3 5 6 8
^ ^
1 3 5 6 8
^ ^
1 3 5 6 8
^ ^
1 3 5 6 8
^ ^
1 3 5 6 8
^ ^
1 3 5 6 8
^ ^
1 3 5 6 8
notice that no swaps are done during the final scan
that’s how bubblesort knows that the numbers must be in ascending order
now lets write our own bubblesort to sort a vector<File> object from smallest size to largest size
void bubblesort(vector<File>& files) {
bool sorted = false;
while (!sorted) {
sorted = true;
for(int i = 1; i < files.size(); ++i) { // note i starts at 1
if (files[i - 1].size > files[i].size) {
sorted = false;
// swap values in files[i - 1] and files[i]
File temp = files[i - 1];
files[i - 1] = files[i];
files[i] = temp;
}
}
}
}
look at the bool variable sorted
it controls the outer while-loop
it must start out false so that the while runs at least once
then, inside the loop, it is immediately set to true
then, in the for-loop, if one, or more, swaps, are made, we set sorted to false
this means that at least one more scan will be done after this one
using a bool variable like this to signal and event is a common kind of trick in programming
we sometimes call such variables “flags”
we could have simplified the main bubblesort code a little bit by putting the swapping code in its own function:
void swap(File& a, File& b) {
File temp = a;
a = b;
b = temp;
}
void bubblesort(vector<File>& files) {
bool sorted = false;
while (!sorted) {
sorted = true;
for(int i = 1; i < files.size(); ++i) { // note i starts at 1
if (files[i - 1].size > files[i].size) {
sorted = false;
swap(files[i - 1], files[i]);
}
}
}
}
this doesn’t change the functionality or performance (much, at least)
but it does make the code easier to read and thus easier to understand