Name Collisions

Suppose Xavier and Yang are working together on a large programming project. While they do talk to each other, they don’t discuss every line of code they write.

One day Xavier writes code like this:

// kbd_events.cpp

// Returns true if s is a whitespace character, or a two-char code
// for a whitespace character.
bool is_whitespace(const string& s) {
    return s == " " || s == "\n" || s == "\t" || s == "\r"
        || s == "\\n" || s == "\\t" || s == "\\r";
}

And Yang writes this code:

// render.cpp

// Returns true if every character in s is a whitespace character.
bool is_whitespace(const string& s) {
    for (char c : s) {
        if (!(c == ' ' || c == '\n' || c == '\t' || c == '\r')) {
            return false;
        }
    }
    return true;
}

There’s a problem: they have written two different functions with exactly the same name and parameter list. When it comes time to link their files into a single program, the linker will find two definitions of is_whitespace(const string&). This is a name collision, and it causes linking to fail.

One way to fix this problem is to rename one, or both, of the functions, e.g. Yang’s is_whitespace could be renamed to this:

bool is_all_whitespace(const string& s) {
    // ...
}

That works, but now coordinating names becomes one of many details that the programmers must keep in mind. How exactly should they do this? Should each get the other’s permission every time they create a new function or class? What if the other person can’t reply quickly, or is on vacation? Should they have to wait a week before deciding what to call a function?

Using Namespaces

A solution to the problem of naming that doesn’t require so much communication is for Xavier and Yang to agree ahead of time to use some kind of special naming convention. For example, they might agree that all Xavier’s names should begin with event_, and all Yang’s names should begin with render_, e.g.:

// Xavier's names all start with event_

// Returns true if s is a whitespace character, or a two-char code
// for a whitespace character.
bool event_is_whitespace(const string& s) {
    // ...
}

// Yang's names all start with render_

// Returns true if every character in s is a whitespace character.
bool render_is_whitespace(const string& s) {
    // ...
}

Now there is no collision: the two different functions have different names.

In C++, a namespace is essentially a built-in version of this idea. A C++ namespace is a collection of names (for variables, classes, functions, etc.) that helps prevent name collisions by providing a common prefix name.

Xavier and Yang can solve their naming problem by using their own namespaces. Xavier writes code inside the event namespace:

// kbd_events.cpp

namespace event {

    // Returns true iff s is a whitespace character, or a two-char code
    // for a whitespace character.
    bool is_whitespace(const string& s) {
        return s == " " || s == "\n" || s == "\t" || s == "\r"
            || s == "\\n" || s == "\\t" || s == "\\r";
    }
} // namespace event

Yang writes code inside the render namespace:

// render.cpp

namespace render {

    // Returns true if every character in s is a whitespace character.
    bool is_whitespace(const string& s) {
        for (char c : s) {
            if (!(c == ' ' || c == '\n' || c == '\t' || c == '\r')) {
                return false;
            }
        }
        return true;
    }

    // ...

} // namespace render

Each function can be called by qualifying its name with its namespace, e.g.:

if (event::is_whitespace(s)) {   // calls is_whitespace in event namespace
   // ...
}

if (render::is_whitespace(s)) {   // calls is_whitespace in render namespace
   // ...
}

A nice feature of this solution is that it lets Xavier and Yang use whatever names they like, and the namespace provides a way to uniquely refer to them.
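To see the two functions living side by side, here is a minimal single-file sketch. It assumes the corrected function bodies (s compared as a string in event, character literals in render) and spells out std::string rather than relying on a using statement. The helper check_both is hypothetical, added only to exercise inputs on which the two functions disagree.

```cpp
#include <cassert>
#include <string>

namespace event {
    // Returns true if s is a whitespace character, or a two-char code
    // for a whitespace character.
    bool is_whitespace(const std::string& s) {
        return s == " " || s == "\n" || s == "\t" || s == "\r"
            || s == "\\n" || s == "\\t" || s == "\\r";
    }
} // namespace event

namespace render {
    // Returns true if every character in s is a whitespace character.
    bool is_whitespace(const std::string& s) {
        for (char c : s) {
            if (!(c == ' ' || c == '\n' || c == '\t' || c == '\r')) {
                return false;
            }
        }
        return true;
    }
} // namespace render

// Hypothetical helper (not part of the project code): checks inputs
// where the two functions give different answers.
void check_both() {
    assert(event::is_whitespace("\\t"));    // two-char code: backslash, then t
    assert(!render::is_whitespace("\\t"));  // a backslash is not whitespace
    assert(!event::is_whitespace("  "));    // two spaces are not a single character
    assert(render::is_whitespace("  "));    // every character is a space
}
```

Both definitions have the same name and parameter list, yet they compile and link together because their full names, event::is_whitespace and render::is_whitespace, are different.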

using Directives

While namespaces solve the problem of name collisions, they come at the cost of longer and more cluttered source code. So C++ provides the using keyword to give access to names in a namespace without needing to write :: each time. For example:

using namespace event;            // provide unqualified access to all names in
                                  // the event namespace

if (is_whitespace(s)) {           // calls is_whitespace in event namespace
   // ...
}

if (render::is_whitespace(s)) {   // calls is_whitespace in render namespace
   // ...
}

Calling just is_whitespace calls the version in event. You can still call the one in render by writing render::is_whitespace.

There could be a lot of names in the event namespace, and the statement using namespace event gives you access to all of them. Sometimes it is better to bring in only particular names. This is done with a using declaration, which names the item directly and does not include the namespace keyword, e.g.:

using event::is_whitespace;             // provide unqualified access to just the
                                        // is_whitespace name from event

if (is_whitespace(s)) {                 // calls is_whitespace in event namespace
   // ...
}

if (render::is_whitespace(s)) {         // calls is_whitespace in render namespace
   // ...
}
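The two forms of using can be compared in a small sketch. The function who is a hypothetical stand-in, not part of Xavier’s or Yang’s code; it just reports which namespace’s version was called. The using lines are placed inside function bodies here (an assumption made for this sketch) so that each one affects only that function.

```cpp
#include <string>

namespace event {
    // Hypothetical function: reports which namespace it belongs to.
    std::string who() { return "event"; }
}

namespace render {
    // Same name and parameter list, different namespace.
    std::string who() { return "render"; }
}

std::string via_directive() {
    using namespace event;   // using directive: every name in event is visible here
    return who();            // resolves to event::who
}

std::string via_declaration() {
    using render::who;       // using declaration: only this one name is brought in
    return who();            // resolves to render::who
}
```

Limiting a using line to a single function keeps its effect local, which makes it easier for a reader to tell which version of a name is being called.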

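Note that you cannot usefully bring in the same function name, with the same parameter list, from two different namespaces. The using declarations themselves are accepted, but an unqualified call can no longer be resolved, e.g.:

using event::is_whitespace;
using render::is_whitespace;

is_whitespace(s);   // compiler error: ambiguous call, both versions match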

A rule of thumb that some programmers like to follow is to never use using statements. The reason is that always including the namespace name ultimately makes the code clearer and easier to understand. Someone reading the code never has to wonder which function is being used.

Our textbook also mentions unnamed namespaces and a subtlety using unqualified using statements, but we will not cover those here.

A Note on Variable Naming: Hungarian Notation

The idea of giving variables names that describe their type is sometimes called Hungarian notation. It was used in the 1980s (and beyond) at Microsoft as a coding standard. Essentially, Hungarian notation is a standardized way of including type-information in a variable’s name, e.g.:

bBusy           // b means boolean
chInitial       // ch means char
dwBuff_size     // dw means double word
dbPi            // db means double
rgStudents      // rg means array (range)
fnClose         // fn means function name

The nice thing about this notation is that you know a variable’s type just by looking at its name.

However, it has a few problems. The names can be confusing and hard to remember. If a variable or function changes its type, then the programmer must remember to change the name as well. Some variables can hold values of more than one type, e.g. in object-oriented programming an object of type Window is also of type Box if the Window class inherits from the Box class.
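The stale-name problem can be sketched in a few lines. The history described in the comments is hypothetical: assume dwBuff_size was first declared as a 32-bit double word and was later widened without being renamed.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical history: this variable began as a 32-bit "double word"
// (std::uint32_t dwBuff_size), hence the dw prefix. Its type was later
// widened to 64 bits, but the name was never updated.
std::uint64_t dwBuff_size = 4096;

// The prefix now misleads the reader: the variable is no longer 32 bits.
static_assert(!std::is_same<decltype(dwBuff_size), std::uint32_t>::value,
              "dw prefix no longer matches the real type");
```

The compiler checks the declared type, not the prefix, so nothing forces the name and the type to stay in agreement.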

Because of these sorts of problems, Hungarian notation is no longer a popular way of naming variables, and is generally discouraged.

So how should you choose names? In general, choosing good names for variables, functions, methods, classes, etc. can be tricky. It takes experience to make good choices.

Here are a few rules of thumb to keep in mind for variables (similar considerations apply for functions, classes, files, etc.):

  • Variable names should be self-descriptive, e.g. a programmer who didn’t write the program should be able to make a reasonable guess about the purpose of a variable from looking at the name alone.

  • Variable name length can be tricky:

    • Longer names are good when a variable needs a name that is very self-descriptive.

    • Shorter names are good when you must use the same variable many times in some code. Longer names can make the code harder to format and read. Compare:

      n*n + 5*n - 1
      
      num_opened_files * num_opened_files + 5 * num_opened_files - 1
      

      num_opened_files is more descriptive than n, but it is so long that it obscures the structure of the expression.

  • Use standard variable names. For example, i and j are often used as loop indices, and x and y are standard names for 2-dimensional coordinates.

  • Use a consistent style for separating the parts of a name. For example, you could use _ characters, e.g. base_work_rate. Or instead you could use capitalization, e.g. baseWorkRate.

    Different programmers have different preferences. It’s usually best to be consistent: pick one style and stick with it.

  • Some programming languages give special meanings to certain kinds of names. For example, in the Go programming language, a capitalized name means the name is public, and can be accessed by code outside the current package. In the Haskell language, capitalized names refer to types, while lowercase names refer to variables/functions.

    C++ has a few rules for variable names, e.g.:

    • A C++ variable name can’t contain certain symbols, such as ' ', ',', '.', etc.
    • A C++ variable name cannot start with a digit.
    • A C++ variable name can’t be the same as a keyword, e.g. you can’t use if or while as a variable name.
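Several of these rules of thumb can be seen together in a short sketch. The functions file_cost and total_work_rate are hypothetical, invented only for illustration. Names that break the C++ rules are left in comments, since they would not compile.

```cpp
#include <cstddef>
#include <vector>

// A descriptive parameter name documents the interface, while a short
// local alias keeps the structure of the formula visible.
long file_cost(long num_opened_files) {
    const long n = num_opened_files;
    return n*n + 5*n - 1;
}

// i is the conventional name for a loop index; the other names use
// one consistent style (underscores) throughout.
double total_work_rate(const std::vector<double>& work_rates) {
    double total = 0.0;
    for (std::size_t i = 0; i < work_rates.size(); ++i) {
        total += work_rates[i];
    }
    return total;
}

// Each of these would be rejected by the compiler:
// int 2nd_place = 2;    // starts with a digit
// int base.rate = 3;    // contains '.'
// int while = 4;        // same as a keyword
int second_place = 2;    // a legal alternative to the first one
```

Whether a short alias like n is worth it depends on how often the name is used; for a single use, the longer name is usually the clearer choice.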