C++ uses stream objects for reading and writing files. There are many different ways to read and write files, and so the standard C++ library provides a large and complex set of stream objects and functions. In these notes we’ll restrict ourselves to basic input and output for text files.
Note
C++ also lets you use C-style input and output if you prefer. Since this is a C++ course, we will not cover those C functions in this course.
I/O is short for “input/output”, and most modern computers model it something like this:
Here we’ll look just at the input/output libraries aspect of this model.
C++ handles I/O using objects known as streams. Input uses istream objects, and output uses ostream objects.
An istream object is used to read a sequence of characters from an input device, e.g. the console, a file, the Internet, a keyboard, a mouse, etc., and then converts those characters into some other type of data. For example, an istream attached to a mouse might return mouse positions as (x, y) pairs of doubles.
An ostream converts C++ objects to character sequences for some output device, such as the console, a file, the Internet, a graphical window, etc.
We’ve been using istream and ostream objects since the start of the course: cout is a pre-defined ostream, and cin is a pre-defined istream. The << and >> operators are the standard I/O stream operators, and are used with all kinds of streams.
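Because << and >> are defined for every ostream and istream, a function written in terms of ostream& or istream& works with the console, a file, or an in-memory string without change. Here is a minimal sketch of that idea. The names print_greeting and read_sum are made up for illustration and are not part of the course code.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: writes a greeting to any output stream.
void print_greeting(ostream& out, const string& name) {
    out << "Hello, " << name << "!\n";
}

// Hypothetical helper: reads two whitespace-separated ints from any
// input stream and returns their sum (0 if either extraction fails).
int read_sum(istream& in) {
    int a = 0;
    int b = 0;
    if (in >> a >> b) return a + b;
    return 0;
}
```

Calling print_greeting(cout, "world") writes to the console, while passing an ofstream or an ostringstream (a standard in-memory stream from <sstream>) sends the same characters to a file or a string instead.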
An important detail of both istream and ostream objects is that they may use buffers. A buffer is a region of memory where I/O data is temporarily stored. Buffers are typically used to improve performance. For example, writing data to a hard disk is a relatively slow operation because the disk must physically spin. Instead of doing lots of small writes and spinning the disk multiple times, it’s usually faster to store all the data to be written in a buffer and, when it’s filled, write the entire buffer to disk at once.
Often we don’t care about buffers: in many programs they are a way to make our programs run faster without requiring the programmer to understand the details of how they work. But sometimes they do matter, and so you need to be aware of them. For example, when a program writes data to a file it’s often writing it to a buffer in memory, which means data could be lost if the program crashes before the buffer is flushed (i.e. written to disk).
Buffers can be an issue even when you are using cout and cin. For example, consider these two similar statements:
cout << "Hello, world!\n";
cout << "Hello, world!" << endl;
The \n in the first cout statement causes the cursor to go to the next line. However, it may be that the text is being written to a buffer, and that buffer might not actually cause anything to print to the screen until later in the run of the program when it fills up with enough characters.
The statement with endl is different. The endl causes the cursor to go to the next line and it also flushes the buffer, i.e. endl immediately forces the text to be written on the screen (even if the output buffer is not filled).
You can force the cout buffer to be written to the screen using flush like this:
cout << "Hello, world!\n" << flush;
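Flushing works the same way for file streams, and there the effect is easier to observe than on the screen. The sketch below writes a short message to a file, flushes, and then reads the file back through a second stream while the first is still open. The helper names read_all and write_and_flush, and whatever file name you pass in, are assumptions made for this example.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: returns the current contents of a file by
// reading it through a separate ifstream.
string read_all(const string& path) {
    ifstream in(path.c_str());
    string contents;
    char c;
    while (in.get(c)) contents += c;
    return contents;
}

// Hypothetical helper: writes a short message, flushes it, and reports
// what a second reader sees while the output stream is still open.
string write_and_flush(const string& path) {
    ofstream out(path.c_str());
    out << "Hello, world!\n";   // likely still sitting in the buffer
    out << flush;               // force the buffer out to the file
    return read_all(path);      // the text is now visible to other readers
}
```

Without the flush line, read_all would probably return an empty string here, because a message this short usually does not fill the buffer. The exact buffer size depends on the library implementation.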
One of the most important I/O entities is the file. A file is essentially a named sequence of bytes stored on a device such as a hard disk.
It’s often useful to think of files in different ways depending upon how you are using them. For example, if you’re writing a file-copy function, then it’s best to think of the file as being a sequence of bytes. What those bytes represent doesn’t matter to such a low-level function.
In contrast, if you’re writing a program that edits graphics then for some functions it is useful to think of the graphics files as containing things like colors and pixels. This is the way that users of a graphics editing program probably think about image files.
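To make the byte-level view concrete, here is a rough sketch of a file-copy function that never looks at what the bytes mean. The name copy_file and its return convention are made up for this example. It opens both files with ios::binary so that no newline translation happens on systems that would otherwise do it.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: copies src to dst one byte at a time and returns
// the number of bytes copied, or -1 if either file cannot be opened.
long copy_file(const string& src, const string& dst) {
    ifstream in(src.c_str(), ios::binary);   // binary mode: bytes are not translated
    ofstream out(dst.c_str(), ios::binary);
    if (!in || !out) return -1;

    long count = 0;
    char byte;
    while (in.get(byte)) {   // stops when no more bytes can be read
        out.put(byte);
        ++count;
    }
    return count;
}
```

The same function copies a text file, an image, or a sound file equally well, which is exactly the point of treating a file as a plain sequence of bytes.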
Files are generally divided into two types: text files and binary files. Text files contain ASCII/Unicode text that could be edited or viewed in any text editor, while binary files are anything that is not text, e.g. graphics files, sound files, most database files, etc.
Binary files are popular because they can be very compact, and also efficient for a program to read. But they are usually specific to a program, or have standard formats that must be vigilantly followed.
Text files are popular because text is practically a universal standard: any computer program can read a text file in a straightforward way. Plus, humans can read and write text files using any plain text editor. However, text files can be slower and more difficult for programs to process than binary files because complex text files must be parsed. For example, HTML files are text files, but to display them you need to write code that can deal with stuff like this:
<div class="section" id="welcome-to-cmpt-125-spring-2012">
<h1>Welcome to CMPT 125 Spring 2012!<a class="headerlink"
href="#welcome-to-cmpt-125-spring-2012" title="Permalink to this
headline">¶</a></h1> <p>This course is a continuation of CMPT 130
(also CMPT 120 at Surrey when 120 is taught in C), and introduces
new programming techniques and ideas. It uses the C++ language in
the Linux environment.</p>
Converting this to a nice-looking graphical web page is no small task! Even writing a program that merely determines that this is HTML is non-trivial.
The basic procedure for reading a text file in C++ is this:
1. Open the file.
2. Read (process) the data in the file.
3. Close the file.
All three steps are necessary for every file you read.
Here’s how we can open a text file:
#include <fstream> // ifstream is in here
// ...
cout << "Please enter an input file name: ";
string name;
cin >> name;
ifstream ist(name.c_str()); // ifstream expects a C-style string
if (!ist) // test if file was successfully opened
error("can't open file");
This code creates an ifstream object called ist. Before C++11, its constructor required the name of the file to be a C-style string, and so we use the c_str() function to convert name (from C++11 on, you can also pass a string directly). We check that the file was successfully opened with an if-statement. You should always do this: unsuccessful file opening is a common problem that could happen in any program that reads a file.
Now we can process the file. Here’s some code that estimates the number of words in a file:
int count = 0;
string word;
while (ist >> word) {
++count;
}
cout << "# of words: " << count << endl;
This works the same way as reading from cin, except we use ist instead of cin. The expression ist >> word extracts a word from the file, and assigns it to word. The expression itself evaluates to true if a word was successfully extracted, and false if not, which normally means the end of the file (EOF) was reached. Note that EOF is not a character stored in the file: it is a condition the stream reports when it tries to read past the last byte.
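Putting the open-and-check step together with the counting loop gives a small function you can reuse. This is only a sketch: the name count_words and the choice to return -1 on failure (instead of calling error) are assumptions made for this example.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: returns the number of whitespace-separated words
// in the named file, or -1 if the file cannot be opened.
int count_words(const string& path) {
    ifstream ist(path.c_str());
    if (!ist) return -1;

    int count = 0;
    string word;
    while (ist >> word) {   // false once no further word can be extracted
        ++count;
    }
    return count;
}
```

The count is an estimate because >> splits only on whitespace, so a token such as "world!" or "<p>Welcome" counts as one word.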
Writing to a file is similar:
cout << "Please enter an output file name: ";
string name;
cin >> name;
ofstream ost(name.c_str()); // ofstream expects a C-style string
if (!ost) // test if file was successfully opened
error("can't open file");
ost << "<html>\n"
<< " <p>Welcome to CMPT 125!</p>\n"
<< "</html>\n";
Again, this is the same as how we have been using cout.
Note that in neither fragment of code have we explicitly closed the file. We don’t need to, because the ifstream and ofstream destructors automatically close their files when the ist and ost objects go out of scope.
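You can see the destructor at work by putting the ofstream in its own block and reading the file back once the block has ended. The helper name round_trip below is made up for this sketch.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: writes one line to a file, lets the ofstream go
// out of scope, then reads the line back with a fresh ifstream.
string round_trip(const string& path, const string& text) {
    {
        ofstream ost(path.c_str());
        ost << text << "\n";
    }   // ost's destructor runs here: buffer flushed, file closed

    ifstream ist(path.c_str());
    string line;
    getline(ist, line);   // getline drops the trailing '\n'
    return line;
}
```

If you need the file closed earlier than the end of the enclosing scope, both ifstream and ofstream also provide a close() member function that you can call explicitly.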
Another common way to read a text file is to extract one character at a time from it:
// ... ist is an opened ifstream attached to a file ...
int char_count = 0;
int line_count = 0;
while (!ist.eof()) {
char c = ist.get(); // get 1 char from the file
++char_count;
if (c == '\n') ++line_count;
}
cout << "# of characters: " << char_count << endl;
cout << " # of lines: " << line_count << endl;
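The expression ist.eof() returns true once an attempt to read has run into the end of the file, and false otherwise. It’s a more explicit way of controlling a loop that reads a file, i.e. the while-loop header says “only execute the body if the end of the file has not yet been hit”. Keep in mind that ist.eof() does not become true until a read has actually failed, so the loop body runs one more time after the last real character has been read.
The ist.get() function returns one character from the file, and so this loop reads the file a character at a time.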
Warning
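The end-of-file marker EOF counts as a character in this program: the final call to ist.get() returns EOF instead of a real character, but char_count is still incremented, so the result is one more than the number of characters actually stored in the file. Depending on your application, you may or may not want to include EOF in the character count. Most programs, from word processors to command-line tools like wc that count characters, words, and lines in a file, want only the characters actually in the file, in which case you need to subtract one from char_count or restructure the loop.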
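If you do not want the end-of-file marker included in the count, one option is to use the form of get that takes a char by reference, i.e. ist.get(c), as the loop condition. It evaluates to false as soon as no character could be read, so the body never sees EOF. The function name count_chars and its parameters are assumptions made for this sketch.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: counts the characters and '\n' characters in a
// file without counting the end-of-file marker. Returns false if the
// file cannot be opened.
bool count_chars(const string& path, int& char_count, int& line_count) {
    ifstream ist(path.c_str());
    if (!ist) return false;

    char_count = 0;
    line_count = 0;
    char c;
    while (ist.get(c)) {   // body runs only when a character was read
        ++char_count;
        if (c == '\n') ++line_count;
    }
    return true;
}
```

For a file containing the two lines "ab" and "cd", each ended by \n, this reports 6 characters and 2 lines, whereas the eof()-controlled loop reports 7 characters.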
Another common way to process a file is to read it a line at a time. For instance, this code prints the lines of the input with line numbers:
// ... ist is an opened ifstream attached to a file ...
int num = 1;
string line;
getline(ist, line); // read one line of text from ist
while (!ist.eof()) {
cout << num << ": " << line << endl;
++num;
getline(ist, line); // read one line of text from ist
}
The getline function is used to read one line from a file. Keep in mind that a line is really just the text between two \n characters, and so there is no guarantee about the length of a line: a line could have 0 characters, or a million (or more!).
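Correctly reading all the lines from a file is a little bit tricky because you have to handle the EOF the right way. The getline function is called at the very bottom of the loop to ensure that ist.eof() is tested right afterwards. This guarantees that cout is written to only when ist.eof() is false. One caveat: if the last line of the file does not end with \n, getline reads that line and hits the end of the file in the same call, so ist.eof() is already true and the loop exits without printing that line.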
If this does not seem tricky to you, then try writing this code using a single call to getline. Make sure it works correctly and does not print a blank line at the end!