Calculator Notes¶
The Goal¶
we want to refresh our memories of basic C++
and also see a few modern C++ features you may not be aware of
lets do this by writing a calculator, that works like this:
--> 7 + 4 * 5
27
--> 10 / (7 - 2)
2
this is a fairly challenging problem because expressions can be very long, contain brackets, have unusually placed spaces, e.g.:
1 + 2 + 3 - 4 / (1 - (2+2))
The Plan¶
we’ll follow a three-step plan for making the calculator
write a tokenizer that reads a string and divides it into tokens, e.g.:
(1 + 2) * 3
consists of 7 tokens: <(>, <1>, <+>, <2>, <)>, <*>, <3>
the tokenizer takes a string as input and returns a vector of tokens as output
the tokenizer will handle spaces between tokens
write a postfix evaluator, e.g. a function that evaluates expressions written in post-fix notation like this:
> 2 3 + 5 > 4 5 * 20 > 2 3 + 4 * 20
postfix notation puts operators after the numbers
in the usual infix notation, like
1 + 2
, the operator appears between the numberspostfix moves the operator to the end, e.g.
1 2 +
the surprising thing about postfix is that it is very easy to evaluate
it never needs brackets, and there is never any ambiguity in the order of operations
write an infix-to-postfix function that converts an infix expression to postfix format using the so-called shunting yard algorithm
The Tokenizer (aka Scanner, aka Lexer)¶
the tokenizer takes a string as input and returns a sequence of tokens:
vector<Token> scan(const string& s)
vector
is the standard C++ vector class
Token
is a struct
we write ourselves:
struct Token {
Token_type type;
int value; // only used when type == Token_type::NUMBER
};
Token_type
is an enumerated type defined like this:
// ": char" causes each value of this enum to be represented as a char.
enum class Token_type : char {
LEFT_PAREN = '(',
RIGHT_PAREN = ')',
PLUS = '+',
BINARY_MINUS = 'm',
UNARY_MINUS = 'u',
TIMES = '*',
DIVIDE = '/',
NUMBER = 'n' // n for "number"
};
an enum class
is a finite set of values that represent some new type
each value of Token_type
represents one of the tokens that can appear in
our expression
The Two Minuses¶
notice that there are two minuses: Token_type::BINARY_MINUS
and
Token_type::UNARY_MINUS
in 4 - 3
, the -
is a binary minus
in -(3 + 2)
, the -
is a unary minus
they use the same symbol, but they are different operations that must be handled differently
enum classes verses C-style enums¶
Token_type
is an enum class
enum class
is a C++11 feature that doesn’t appear in earlier versions of
C++ (or C)
earlier versions had only enum
, what we’ll call C-style enumerations,
e.g.:
enum cardsuit { // C-style enum without "class"
CLUBS,
DIAMONDS,
HEARTS,
SPADES
};
enum cardsuit trump;
it’s the same ideas as an enum class
, but the values are always
implemented as int
s, and can be treated like int
s
the values in a C-style enum
are also global names, so they “pollute” the
global name space, and so, for example, two different C-style enum
s
can’t have the same values
an enum class
fixes the problem, but you need to write
Token_type::PLUS
instead of just PLUS
Printing an enum class¶
we’ll sometimes want to print a Token_type
value to the screen, and a nice
way to do that in C++ is like this:
// Overload operator<< so that we can easily print Token_type values.
ostream& operator<<(ostream& os, const Token_type& tt) {
os << "'" << char(tt) << "'";
return os;
}
operator<<
refers to the <<
operator, and now we can write code like
this:
Token_type tt = Token_type::PLUS;
cout << "tt = " << tt << "\n"; // tt = '+'
notice the type header for the operator<<
function
ostream&
is the return-type, i.e. a reference to an ostream
object
the first input os
is also a reference to an ostream
in the body, we use <<
to put a Token_type
on os
, and then return
os
this is the standard format when you overload operator<<
for printing
A Couple of Token_Type Helper Functions¶
later we’ll need to be able to tell which tokens are operators and which are not, and so this:
bool is_op(Token_type tt) {
switch (tt) {
case Token_type::TIMES: return true;
case Token_type::DIVIDE: return true;
case Token_type::PLUS: return true;
case Token_type::BINARY_MINUS: return true;
case Token_type::UNARY_MINUS: return true;
default: return false;
} // switch
}
int precedence(Token_type tt) {
switch (tt) {
case Token_type::UNARY_MINUS: return 4;
case Token_type::TIMES: return 3;
case Token_type::DIVIDE: return 3;
case Token_type::PLUS: return 2;
case Token_type::BINARY_MINUS: return 2;
default: cmpt::error("precedence default case reached");
} // switch
return -1; // can never be reached!
}
both of these functions use a switch
statement on a Token_type
value
we could have instead used an if-statement, but a switch
is easy to read
and understand here
in the precedence
function, notice that the default
case calls
cmpt::error
that’s a design decision: we’ve decided that if you pass a non-operator
Token_type
to precedence
then it will crash with an error
we didn’t need to do that; we could instead have returned some special value, like -1
handling errors is an important topic, and we will come back to it again and again throughout the course
Token Objects¶
we need one more struct
to represent an individual token:
struct Token {
Token_type type;
int value; // only used when type == Token_type::NUMBER
};
a Token
stores its type and, in the case of numbers, its value
value
is only meaningful when the type
is Token_type::NUMBER
for convenience, we overload operator<<
to print Token
s:
ostream& operator<<(ostream& os, const Token& t) {
if (t.type == Token_type::NUMBER) {
os << "<" << t.value << ">";
} else {
os << "<" << char(t.type) << ">";
}
return os;
}
next, we use typedef
to make a synonym for a vector<Token>
:
// Sequence is a type synonym for vector<Token>, i.e. Sequence is another name
// for the type vector<Token>.
typedef vector<Token> Sequence;
// Print a vector of Tokens in a nice format.
ostream& operator<<(ostream& os, const Sequence& tokens) {
int n = tokens.size();
if (n == 0) {
os << "{}";
} else if (n == 1) {
os << "{" << tokens[0] << "}";
} else {
os << "{" << tokens[0];
for(int i = 1; i < tokens.size(); ++i) {
os << ", " << tokens[i];
}
os << "}";
}
return os;
}
Back to the Tokenizer¶
so far we’ve assumed the scan
function has this header:
Sequence scan(const string& s)
but there is a problem: it’s possible that the scanner could fail, e.g. it could encounter an unknown token such as “x” in “x + 2” (our calculator doesn’t handle variables)
what should the returned vector<Token>
be in the case when s
results
in failure?
what we’ll do is use a simple Scan_result
object that will tell us if
their’s a failure:
// The result of a scan is either a Sequence object, or an error.
// If the value is a Sequence, then okay() returns true; if it's an error,
// then okay() returns false.
struct Scan_result {
Sequence value;
string error_msg;
bool okay() { return error_msg.empty(); }
}; // struct Scan_result
ostream& operator<<(ostream& os, const Scan_result& sr) {
os << "Scan_result{" << sr.value << ", " << quote(sr.error_msg) << "}";
return os;
}
Scan_result
is a struct
that holds both a Sequence
and an
error_msg
if the scanner fails, then it sets the error_msg
to be some helpful string
okay()
is a method, i.e. a function in a struct
(or class
), and
using it we can easily test if the returned Scan_result
represents a
success or failure
also, for convenience, we also overload operator<<
to print
Scan_result
objects in a useful format
The Scanner Body¶
now our scanning function has this header:
Scan_result scan(const string& s) {
// ...
}
let’s write the body
the basic strategy will be to loop through s
one character at a time
whitespace characters — like space, tab, and newline — are ignored
single-character tokens — like +
, *
, (
— are immediately
converted to their corresponding Token
objects
a -
will always be treated as a Token_type::BINARY_MINUS
; later we
will change some of them to Token_type::UNARY_MINUS
if necessary
numbers, for this program, will be restricted to non-negative integers consisting of 1 or more digits
when the scanner encounters a number, it gets the entire number by looping through the following characters until it finds either a non-digit or the end of the string
The Scanner Body¶
// Returns true if, and only if, c is a whitespace character.
bool is_whitespace(char c) {
return c == ' ' || c == '\n' || c == '\r' || c == '\t';
}
// Returns true if, and only if, c is a decimal digit.
bool is_digit(char c) {
return '0' <= c && c <= '9';
}
// Convert s into a vector of tokens. Calls cmpt::error on unknown characters.
// Throws std::out_of_range if an out-of-range int is encountered.
// A '-' is always counted as a Token_type::BINARY_MINUS,
Scan_result scan(const string& s) {
Sequence result;
for(int i = 0; i < s.size(); i++) {
if (is_whitespace(s[i])) {
// skip all whitespace
i++;
while (i < s.size() && is_whitespace(s[i])) i++;
// invariant: i >= s.size() || !is_whitespace(s[i])
// If we're not at the end of the string, then decrement i because
// we had to look one character ahead to see that it wasn't
// whitespace.
if (i < s.size()) i--;
} else if (is_digit(s[i])) {
string num = string(1, s[i]);
i++;
while (i < s.size() && is_digit(s[i])) {
num += string(1, s[i]);
i++;
}
// invariant: i >= s.size() || !is_digit(s[i])
// If we're not at the end of the string, then decrement i because
// we had to look one character ahead to see that it wasn't a
// digit.
if (i < s.size()) i--;
result.push_back(Token{Token_type::NUMBER, stoi(num)});
} else if (s[i] == '(') {
result.push_back(Token{Token_type::LEFT_PAREN, 0});
} else if (s[i] == ')') {
result.push_back(Token{Token_type::RIGHT_PAREN, 0});
} else if (s[i] == '+') {
result.push_back(Token{Token_type::PLUS, 0});
} else if (s[i] == '-') {
result.push_back(Token{Token_type::BINARY_MINUS, 0});
} else if (s[i] == '*') {
result.push_back(Token{Token_type::TIMES, 0});
} else if (s[i] == '/') {
result.push_back(Token{Token_type::DIVIDE, 0});
} else {
string msg = "scanner encountered unknown character '" + string(1, s[i]) + "'";
return Scan_result{Sequence{}, msg};
}
} //for
return Scan_result{result, ""};
} // scan
Comments on the Scanner¶
while the scanner was not that hard to create by hand in the way we did it, there are other approaches to writing scanners
for example, you could use regular expressions, or special software tools that automatically generate scanners (e.g. Flex)
The Minus Fixer¶
the scanner treats all -
s as Token_type::BINARY_MINUS
but, for an infix expression, we want to treat -
as unary if:
- it is the first token of an expression, e.g.
-5
,-(1 + 2)
- the previous token is an operator or a
(
, e.g.1 + -3, (-2 * 6)
all other instances of -
are assumed to be binary
void minus_fix(Sequence& seq) {
if (seq[0].type == Token_type::BINARY_MINUS) {
seq[0].type = Token_type::UNARY_MINUS;
}
for(int i = 1; i < seq.size(); ++i) {
Token_type prev = seq[i - 1].type;
if (seq[i].type == Token_type::BINARY_MINUS) {
if (prev == Token_type::LEFT_PAREN || is_op(prev)) {
seq[i].type = Token_type::UNARY_MINUS;
}
}
} // for
}
notice that seq
is passed by reference so it can, if necessary, be
modified by minus_fix
The Postfix Evaluator¶
postfix expressions put operators after the number
for example, 2 * 3
is an infix expression because the *
is in-between
the 2 and the 3
2 3 *
is an equivalent postfix expression because the *
is after
(post) the numbers
postfix never needs parentheses, e.g.
2 + 3 * 4
is2 3 4 * +
in postfix(2 + 3) * 4
is2 3 + 4 *
in postfix
evaluating a postfix expression turns out to be relatively easy
first, we need to use a stack …
Stacks¶
a stack is used to store the numbers as they are seen
for us, a stack is just a vector<int>
where numbers are added and removed
only at the right end
adding a number to a stack is a push operation
removing a number from a stack is called a pop operation
for example after these operations:
push 2 // add a 2 to the top of the stack
push 1 // add a 1 to the top of the stack
push 8 // ...
pop // remove the top element of the stack
pop // remove the top element of the stack
push 3 // ...
push 4 // ...
the stack looks like this:
4 <-- top of the stack
3
2 <-- bottom of the stack
The Postfix Evaluation Algorithm¶
here’s pseudocode for the basic postfix evaluation algorithm
pseudocode is a description of an algorithm using structured English that is similar a program
stack <-- {} // stack is initially empty
for each token t in the token sequence:
if t is a number then
stack.push t
else if t is a binary operator (like '+' or '/'):
a = stack.pop
b = stack.pop
result = evaluate(a, t, b)
push result
else if t is unary minus
result = stack.pop
push -result
once you get the hang of it, it’s simple enough to do by hand
for example:
2 3 * 4 5 * +
using the algorithm, we first push 2 and push 3
the *
pops the top 2 elements, 2 and 3, pushes 6 (their product)
then we push 4 and push 5; at this point the stack looks like this:
4 <-- top of the stack
5
6 <-- bottom of the stack
the *
now pops the 4 and 5, and pushes 20 (their product)
the +
then pops the 20 and 6, and pushes 26 (their sum)
now there’s one item on the stack: 26
this is the value of the original postfix expression
Implementing a Postfix Evaluator¶
now lets make a function that evaluates postfix expressions
just as with the scanner, it’s possible that postfix_eval
could fail and
return an error message
Int_result postfix_eval(const Sequence& tokens)
the input is a sequence of tokens, and the output is an Int_result
which
is defined like this:
// helper function for popping elements from a stack made from a
// vector<T>, where T is any type;
// to push onto a stack just use the vector's push_back method
template<class T>
T pop(vector<T>& stack) {
T result = stack.back();
stack.pop_back();
return result;
}
struct Int_result {
int value;
string error_msg;
bool okay() { return error_msg.empty(); }
};
string quote(const string& s) {
return "\"" + s + "\"";
}
ostream& operator<<(ostream& os, const Int_result& ir) {
os << "Int_result{" << ir.value << ", " << quote(ir.error_msg) << "}";
return os;
}
note that quote
is a helper function for putting quote-marks around a
string — it simplifies the operator<<
function
the pop
function uses templates, i.e. it allows the type T
of objects
contained in the vector to be any type at all
the line template<class T>
tells C++ that T
is any type, and that it
will be supplied by the code that calls pop
this means we could call pop
with a vector<int>
or a vector<token>
or a vector containing any type of object
we’ll use pop
both in the postfix evaluator, and the infix-to-postfix
translator that comes later
Implementing a Postfix Evaluator¶
// Assumes tokens form a valid postfix expression.
Int_result postfix_eval(const Sequence& tokens) {
vector<int> stack;
bool ok = true;
// Scan through every token. Push numbers onto the stack, and for
// operators pop off the necessary number of items from the stack,
// evaluate them, and push the result.
for (Token tok : tokens) {
if (tok.type == Token_type::NUMBER) {
stack.push_back(tok.value);
} else if (tok.type == Token_type::UNARY_MINUS) {
// Unary operators take one input. Thus, if the stack has no
// numbers on it the operator won't have enough data. So in that
// case we immediately set an error flag and jump out of the loop.
if (stack.size() < 1) {
ok = false;
break; // jump out of loop
}
int a = pop(stack);
stack.push_back(-a);
} else {
// Binary operators take two inputs. Thus, if the stack has less
// than 2 numbers on it the operator won't have enough data. So in
// that case we immediately set an error flag and jump out of the
// loop.
if (stack.size() < 2) {
ok = false;
break; // jump out of loop
}
int a = pop(stack);
int b = pop(stack);
switch (tok.type) {
case Token_type::PLUS: stack.push_back(a + b); break;
case Token_type::BINARY_MINUS: stack.push_back(b - a); break;
case Token_type::TIMES: stack.push_back(a * b); break;
case Token_type::DIVIDE:
if (a == 0.0) {
return Int_result{0, "division by 0"};
}
stack.push_back(b / a);
break;
default:
// An unknown token type is a *really* bad error that
// means there's a mistake in the program's design.
cmpt::error("postfix_eval error: unknown token type '"
+ string(1, char(tok.type)) + "'");
} // switch
} // else
} // for
if (ok && stack.size() > 0) {
return Int_result{stack[0], ""};
} else {
return Int_result{0, "not enough numbers to pop"};
}
} // postfix_eval
Implementing a Postfix Evaluator¶
next let’s write a version of the postfix evaluator that takes a string as input
first we’ll need to run the string through the scanner:
Int_result postfix_eval(const string& expr) {
Scan_result tokens = scan(expr);
if (tokens.okay()) {
Int_result result = postfix_eval(tokens.value);
return result;
} else {
return Int_result{0, tokens.error_msg};
}
}
since result
is only used once, an alternate way to write this same
function is this:
Int_result postfix_eval(const string& expr) {
Scan_result tokens = scan(expr);
if (tokens.okay()) {
return postfix_eval(tokens.value);
} else {
return Int_result{0, tokens.error_msg};
}
}
one other interesting thing we could do here is replace Scan_result
with
auto
:
Int_result postfix_eval(const string& expr) {
auto tokens = scan(expr);
if (tokens.okay()) {
return postfix_eval(tokens.value);
} else {
return Int_result{0, tokens.error_msg};
}
}
auto
can see that the return type of scan(expr)
is Scan_result
,
and so tokens
is automatically declared to be of type Scan_result
it’s not much of savings here, but for long, complicated types auto
can be
convenient
A Postfix Command-line Interpreter¶
now that we can evaluate postfix strings, let’s write a simple interpreter to test it:
// Replace non-printing characters like '\n' with the 2-character string
// "\\n". Also, can replace spaces with '.'s (or some other char) to make them
// easier to see.
string raw(const string& s, const string& dot_char = ".") {
string result;
for (char c : s) {
switch (c) {
case '\n': result += "\\n"; break; // break is needed to stop
case '\t': result += "\\t"; break; // the flow of control
case '\r': result += "\\r"; break; // from "falling through"
case ' ' : result += dot_char; break; // to the next case
default: result += c;
} // switch
} // for
return result;
}
void repl_postfix() {
for(;;) { // loop forever
cout << "postfix> ";
// getline reads the entire line of input, including spaces;
// cin >> input would just read the characters before the first space
// (e.g. the first word)
string input;
getline(cin, input);
cout << "input=" << quote(raw(input)) << "\n";
Int_result result = postfix_eval(input);
if (result.okay()) {
cout << "val = " << result.value << "\n";
} else {
cout << "Error: " << result.error_msg << "\n";
}
} // for
}
we wrote this to help with debugging, so you should use it to test out a few expressions to convince yourself the postfix evaluator is correct
notice the raw
function: it makes whitespace characters visible, which
can be useful information in debugging
postfix> 3 4 - 4 +
input="3.4.-.4.+"
val = 3
postfix> 2 3 4 5 * * *
input="2.3.4.5.*.*.*"
val = 120
postfix> 1 2+3*
input="1.2+3*"
val = 9
postfix> 1 2 + 3
input="...1...2.....+..3..."
val = 3
postfix> 2 +
input="2.+"
Error: not enough numbers to pop
postfix> 2 3 x
input="2.3.x"
Error: scanner encountered unknown character 'x'
Infix to Postfix Conversion (shunting yard algorithm)¶
now that we have the eval_postfix
working, the next step is to convert
infix expressions into postfix expressions
we’ll use a well-known solution to this problem called the shunting yard algorithm
we won’t go too far into analyzing why the algorithm works
instead, we’ll treat it as given, and concentrate on implementing it nicely in C++
like the postfix evaluation algorithm the shunting yard algorithm uses a stack
Converting Infix to Postfix¶
Scan_result infix_to_postfix(const Sequence& input) {
Sequence output;
Sequence stack;
for(const Token& tok : input) {
switch (tok.type) {
case Token_type::NUMBER:
// numbers are always immediately pushed to output
output.push_back(tok);
break;
case Token_type::PLUS:
case Token_type::UNARY_MINUS:
case Token_type::BINARY_MINUS:
case Token_type::TIMES:
case Token_type::DIVIDE:
// pop higher-precedence operators off the stack
while (!stack.empty()
&& (is_op(stack.back().type)
&& precedence(stack.back().type) >= precedence(tok.type)))
{
output.push_back(pop(stack));
} // while
stack.push_back(tok);
break;
case Token_type::LEFT_PAREN:
// left parentheses are always immediately pushed onto the stack
stack.push_back(tok);
break;
case Token_type::RIGHT_PAREN:
// pop operators until the first left parenthesis is reached
// (if no parenthesis is found, then there's a mis-matched
// parenthesis error)
while (!stack.empty()
&& (stack.back().type != Token_type::LEFT_PAREN))
{
output.push_back(pop(stack));
} // while
if (stack.empty()) {
return Scan_result{Sequence{}, "mis-matched parenthesis"};
} else if (stack.back().type == Token_type::LEFT_PAREN) {
// discard the left parenthesis on the top
pop(stack);
} else {
cmpt::error("logic error in infix_to_postfix RIGHT_PAREN case!");
}
break;
default:
cout << "tok.type: '" << char(tok.type) << "'\n";
cmpt::error("reached default in infix_to_postfix!");
} // switch
} // for
while (!stack.empty()) {
Token t = pop(stack);
if (t.type == Token_type::LEFT_PAREN || t.type == Token_type::RIGHT_PAREN) {
return Scan_result{Sequence{}, "mis-matched parenthesis"};
}
output.push_back(t);
} // while
return Scan_result{output, ""};
} // infix_to_postfix
notice the for-loop:
for(const Token& tok : input) {
// ...
}
this uses a for-each loop to processe the input one token at a time
the variable tok
is a reference to the actual Token
in input
it’s also const
so that no code in the loop is allowed to modify it (code
can only read it)
Converting Infix to Postfix¶
now lets write some an a function to evaluate a sequence of tokens in infix format:
Int_result infix_eval(Sequence input) {
minus_fix(input);
Scan_result postfix = infix_to_postfix(input);
if (postfix.okay()) {
return postfix_eval(postfix.value);
} else {
return Int_result{0, postfix.error_msg};
}
}
notice the call to minus_fix
at the start
it makes sure that all the unary minuses have the right type (see the
minus_fix
function for the rules when this applies)
here’s a version that takes a string as input:
Int_result infix_eval(const string& input) {
Scan_result tokens = scan(input);
if (tokens.okay()) {
return infix_eval(tokens.value);
} else {
return Int_result{0, tokens.error_msg};
}
}
An Infix Interpreter¶
now we can write an infix interpreter
it’s essentially the same as the postfix interpreter we wrote above, but we
call infix_eval
instead
void repl_infix() {
for(;;) { // loop forever
cout << "infix> ";
string input;
// getline reads the entire line of input, including spaces;
// cin >> input would just read the characters before the first space
// (e.g. the first word)
getline(cin, input);
cout << "input=" << quote(raw(input)) << "\n";
Int_result result = infix_eval(input);
if (result.okay()) {
cout << "val = " << result.value << "\n";
} else {
cout << "Error: " << result.error_msg << "\n";
}
} // for
}
Automating the Testing¶
the purpose of the interpreter is to test that the infix expressions are correctly evaluated
you have to type the expressions by hand, which is useful
but it is too tedious to do a lot of testing by hand
so lets write some code that will do some simple automated testing
int test_count = 0;
void test(const string& expr, int expected_result) {
++test_count;
auto result = infix_eval(expr);
if (result.okay() && result.value == expected_result) {
// do nothing --- test passed
cout << test_count << ": " << quote(expr) << " passed\n";
} else {
cout << "!! Test failed: " << quote(expr)
<< "\n!! result=" << result
<< ", expected=" << expected_result << "\n";
cmpt::error("test failed");
}
}
void infix_eval_test() {
cout << "Testing infix_eval ...\n";
test_count = 0;
test("0", 0);
test("10", 10);
test(" 1 +2", 3);
test("4*2", 8);
test(" 10/ 5", 2);
test("1+2", 3);
test("(1)+2", 3);
test("1+(2)", 3);
test("(1+2)", 3);
test("286", 286);
test("(286)", 286);
test("((286))", 286);
test("(((286)))", 286);
test("1+2+3 + (4 + 5) + 7", 22);
test("2 + 3 * 4", 14);
test("(2 + 3) * 4", 20);
test("2 * 3 + 4", 10);
test("(10+20) / 15 ", 2);
test("6 * 5 - (1 * 2 + 3 * 4 )", 16);
test("32 / 2 ", 16);
test("32 / 2 / 2", 8);
test("(8 - 6 / 2) / (8-6/2)", 1);
test("(1 - 1 + 4) * 2", 8);
test("(155 + 2 + 3 - 155 - 2 - 3 + 4) * 2", 8);
test("32 / 2 * 2", 32);
test("1-2 ", -1);
test("-(1 + 2)", -3);
test("-5", -5);
test("1 + -3", -2);
test("-1 + 3", 2);
test("-(1 + 2)", -3);
test("3 - -2", 5);
test("-2 * 6", -12);
test("-(-2 * -6)", -12);
cout << "\n... all infix_eval tests passed!\n";
}
Automating the Testing¶
choosing test cases can be tricky
what should they be?
how many is enough?
here we want to make sure that our tests cases use all the tokens, and have different amounts of whitespace
also, we know from the implementation that how -
is handled is more
complex than the other operators, so we include some extra cases for that
we’ll discuss more about testing theory a little later in the course
Comments on the Calculator¶
we’ve implemented an integer calculator using postfix evaluation and the shunting yard algorithm
that’s certainly not the only way to implement such a calculator!
another popular approach is a technique called recursive descent, where a collection of related recursive functions handle the parsing
one of the issues with our calculator is that the error messages are not always so good
for instance, sometimes you can get an error message from the scanner, or from the conversion algorithm, or from the postfix evaluator
it can be quite confusing!
it turns out that getting good error messages with this kind of implementation is tricky
in contrast, it’s usually easier to get good error messages using recursive descent parsers
Complete Calculator Source Code¶
// calculator-notes.cpp
//
// This program implements a basic integer calculator. It consists of three
// main parts:
//
// - A lexical scanner that converts a string, like "(1 + 2)*3", into a
// sequence of tokens such as {{"(", OPEN_PAREN, 0}, {"1", NUM, 1}, {"+",
// PLUS, 0}, {"2", NUM, 2}, {")", CLOSE_PAREN, 0}, {"*", TIMES, 0}, {"3",
// NUM, 3}}.
//
// - A postfix evaluator that evaluates a sequence of tokens in postfix
// format.
//
// - A transformer that converts a sequence of tokens in infix format into an
// equivalent sequence of tokens in postfix format. This uses the Shunting
// Yard algorithm, a non-recursive "precedence climbing" parsing algorithm
// designed specifically for parsing arithmetic expressions.
//
// When implementing this calculator, it seems best to write the scanner
// first, followed by the postfix evaluator, and then, finally, infix-to-
// postfix transformer.
//
// The main purpose of this program is to demonstrate some basic features of
// C++. As a calculator, it's just a toy. But it could be the starting point
// for a more robust and fully-featured calculator.
//
#include "cmpt_error.h"
#include <iostream>
#include <string>
#include <vector>
using namespace std;
//////////////////////////////////////////////////////////////////////////
//
// Helper functions
//
//////////////////////////////////////////////////////////////////////////
// Returns true if, and only if, c is a whitespace character.
bool is_whitespace(char c) {
return c == ' ' || c == '\n' || c == '\r' || c == '\t';
}
// Returns true if, and only if, c is a decimal digit.
bool is_digit(char c) {
return '0' <= c && c <= '9';
}
// Returns a copy of s in double-quotes.
string quote(const string& s) {
return "\"" + s + "\"";
}
//////////////////////////////////////////////////////////////////////////
//
// Token_type enumeration
//
//////////////////////////////////////////////////////////////////////////
// ": char" causes each value of this enum to be represented as a char.
enum class Token_type : char {
LEFT_PAREN = '(',
RIGHT_PAREN = ')',
PLUS = '+',
BINARY_MINUS = 'm',
UNARY_MINUS = 'u',
TIMES = '*',
DIVIDE = '/',
NUMBER = 'n' // n for "number"
};
// Overload operator<< so that we can easily print Token_type values.
ostream& operator<<(ostream& os, const Token_type& tt) {
os << "'" << char(tt) << "'";
return os;
}
// Returns true if tt is an operator, and false otherwise.
bool is_op(Token_type tt) {
switch (tt) {
case Token_type::TIMES: return true;
case Token_type::DIVIDE: return true;
case Token_type::PLUS: return true;
case Token_type::BINARY_MINUS: return true;
case Token_type::UNARY_MINUS: return true;
default: return false;
} // switch
}
// Returns the precedence of an operator (needed for parsing).
// * has higher precedence than +, so the expression
// "1 + 2 * 3" is evaluated as "1 + (2 * 3)"
int precedence(Token_type tt) {
switch (tt) {
case Token_type::UNARY_MINUS: return 4;
case Token_type::TIMES: return 3;
case Token_type::DIVIDE: return 3;
case Token_type::PLUS: return 2;
case Token_type::BINARY_MINUS: return 2;
default: cmpt::error("precedence default case reached");
} // switch
return -1; // can never be reached!
}
//////////////////////////////////////////////////////////////////////////
//
// Tokens
//
//////////////////////////////////////////////////////////////////////////
// A Token consists of a type, and, for numbers, the value of the number.
struct Token {
Token_type type;
int value; // only used when type == Token_type::NUMBER
};
// Overload operator<< so that we can easily print a single Token.
ostream& operator<<(ostream& os, const Token& t) {
if (t.type == Token_type::NUMBER) {
os << "<" << t.value << ">";
} else {
os << "<" << char(t.type) << ">";
}
return os;
}
// Sequence is a type synonym for vector<Token>, i.e. Sequence is another name
// for the type vector<Token>.
typedef vector<Token> Sequence;
// Print a vector of Tokens in a nice format.
ostream& operator<<(ostream& os, const Sequence& tokens) {
int n = tokens.size();
if (n == 0) {
os << "{}";
} else if (n == 1) {
os << "{" << tokens[0] << "}";
} else {
os << "{" << tokens[0];
for(int i = 1; i < tokens.size(); ++i) {
os << ", " << tokens[i];
}
os << "}";
}
return os;
}
//////////////////////////////////////////////////////////////////////////
//
// Scanning functions
//
//////////////////////////////////////////////////////////////////////////
// Replace non-printing characters like '\n' with the 2-character string
// "\\n". Also, can replace spaces with '.'s (or some other char) to make them
// easier to see.
string raw(const string& s, const string& dot_char = ".") {
string result;
for (char c : s) {
switch (c) {
case '\n': result += "\\n"; break; // break is needed to stop
case '\t': result += "\\t"; break; // the flow of control
case '\r': result += "\\r"; break; // from "falling through"
case ' ' : result += dot_char; break; // to the next case
default: result += c;
} // switch
} // for
return result;
}
// The result of a scan is either a Sequence object, or an error. If the value
// is a Sequence, then okay() returns true; if it's an error, then okay()
// returns false.
struct Scan_result {
Sequence value;
string error_msg;
bool okay() { return error_msg.empty(); }
}; // struct Scan_result
ostream& operator<<(ostream& os, const Scan_result& sr) {
os << "Scan_result{" << sr.value << ", " << quote(sr.error_msg) << "}";
return os;
}
// Convert s into a vector of tokens. Calls cmpt::error on unknown characters.
// Throws std::out_of_range if an out-of-range int is encountered.
// A '-' is always treat as a Token_type::BINARY_MINUS (unary minuses will
// be fixed later in the process)
Scan_result scan(const string& s) {
Sequence result;
for(int i = 0; i < s.size(); i++) {
if (is_whitespace(s[i])) {
// skip all whitespace
i++;
while (i < s.size() && is_whitespace(s[i])) i++;
// invariant: i >= s.size() || !is_whitespace(s[i])
// If we're not at the end of the string, then decrement i because
// we had to look one character ahead to see that it wasn't
// whitespace.
if (i < s.size()) i--;
} else if (is_digit(s[i])) {
string num = string(1, s[i]);
i++;
while (i < s.size() && is_digit(s[i])) {
num += string(1, s[i]);
i++;
}
// invariant: i >= s.size() || !is_digit(s[i])
// If we're not at the end of the string, then decrement i because
// we had to look one character ahead to see that it wasn't a
// digit.
if (i < s.size()) i--;
result.push_back(Token{Token_type::NUMBER, stoi(num)});
} else if (s[i] == '(') {
result.push_back(Token{Token_type::LEFT_PAREN, 0});
} else if (s[i] == ')') {
result.push_back(Token{Token_type::RIGHT_PAREN, 0});
} else if (s[i] == '+') {
result.push_back(Token{Token_type::PLUS, 0});
} else if (s[i] == '-') {
result.push_back(Token{Token_type::BINARY_MINUS, 0});
} else if (s[i] == '*') {
result.push_back(Token{Token_type::TIMES, 0});
} else if (s[i] == '/') {
result.push_back(Token{Token_type::DIVIDE, 0});
} else {
string msg = "scanner encountered unknown character '" + string(1, s[i]) + "'";
return Scan_result{Sequence{}, msg};
}
} //for
return Scan_result{result, ""};
} // scan
// Distinguishing between unary - (as in -5) and binary - (as in 1 - 2).
// Examples of unary -:
//
// -5
// 1 + -3
// -1 + 3
// -(1 + 2)
// 3 - -2
// (-2 * 6)
// -1--2
//
// A - is unary if:
//
// * it is the first token of an expression, e.g. -5, -(1 + 2)
// * the previous token is an operator or a (, e.g. 1 + -3, (-2 * 6)
//
// All other instances of - are assumed to be binary.
//
void minus_fix(Sequence& seq) {
// a - at the start of a sequence is always considered unary
if (seq[0].type == Token_type::BINARY_MINUS) {
seq[0].type = Token_type::UNARY_MINUS;
}
for(int i = 1; i < seq.size(); ++i) {
Token_type prev = seq[i - 1].type;
if (seq[i].type == Token_type::BINARY_MINUS) {
if (prev == Token_type::LEFT_PAREN || is_op(prev)) {
seq[i].type = Token_type::UNARY_MINUS;
}
}
} // for
}
//////////////////////////////////////////////////////////////////////////
//
// Postfix evaluator
//
//////////////////////////////////////////////////////////////////////////
// Removes an item from the end of a vector and returns it. It is a template
// function, which means it works a vector<T>, where T is any type.
template<class T>
T pop(vector<T>& stack) {
T result = stack.back();
stack.pop_back();
return result;
}
// The result of a scan is either a Sequence object, or an error. If the value
// is a Sequence, then okay() returns true; if it's an error, then okay()
// returns false.
struct Int_result {
int value;
string error_msg;
bool okay() { return error_msg.empty(); }
}; // struct Int_result
ostream& operator<<(ostream& os, const Int_result& ir) {
os << "Int_result{" << ir.value << ", " << quote(ir.error_msg) << "}";
return os;
}
// Assumes tokens form a valid postfix expression.
Int_result postfix_eval(const Sequence& tokens) {
vector<int> stack;
bool ok = true; // error flag that is set to false if there are not
// enough numbers on the stack to evaluate an operator
// Scan through every token. Push numbers onto the stack. For operators,
// pop off the necessary number of items from the stack, evaluate them,
// and push the result.
for (Token tok : tokens) {
if (tok.type == Token_type::NUMBER) {
stack.push_back(tok.value);
} else if (tok.type == Token_type::UNARY_MINUS) {
// Unary operators take one input. Thus, if the stack has no
// numbers on it we immediately set an error flag and jump out of
// the loop.
if (stack.size() < 1) {
ok = false;
break; // jump out of loop
}
int a = pop(stack);
stack.push_back(-a);
} else {
// Binary operators take two inputs. Thus, if the stack has less
// than 2 numbers on it we immediately set an error flag and jump
// out of the loop.
if (stack.size() < 2) {
ok = false;
break; // jump out of loop
}
int a = pop(stack);
int b = pop(stack);
switch (tok.type) {
case Token_type::PLUS: stack.push_back(a + b); break;
case Token_type::BINARY_MINUS: stack.push_back(b - a); break;
case Token_type::TIMES: stack.push_back(a * b); break;
case Token_type::DIVIDE:
if (a == 0.0) {
return Int_result{0, "division by 0"};
}
stack.push_back(b / a);
break;
default:
// An unknown token type is a *really* bad error that
// means there's a mistake in the program's design.
cmpt::error("postfix_eval error: unknown token type '"
+ string(1, char(tok.type)) + "'");
} // switch
} // else
} // for
if (ok && stack.size() > 0) {
return Int_result{stack[0], ""};
} else {
return Int_result{0, "not enough numbers to pop"};
}
} // postfix_eval
Int_result postfix_eval(const string& expr) {
Scan_result tokens = scan(expr);
if (tokens.okay()) {
Int_result result = postfix_eval(tokens.value);
return result;
} else {
return Int_result{0, tokens.error_msg};
}
}
//////////////////////////////////////////////////////////////////////////
//
// Infix evaluator
//
//////////////////////////////////////////////////////////////////////////
//
// Use the shunting-yard algorithm to convert infix to postfix.
//
// Assumes tokens vector forms a infix expression.
// See: https://en.wikipedia.org/wiki/Shunting-yard_algorithm
//
// Some invalid expressions, like "(1 + 2", are caught in this function, while
// others, like "1(+)2" or "1 + + 2", are caught only in the postfix
// evaluation. That's pretty confusing for the user! Consistent, helpful error
// messages would be a good improvement to this program.
Scan_result infix_to_postfix(const Sequence& input) {
Sequence output;
Sequence stack;
for(const Token& tok : input) {
switch (tok.type) {
case Token_type::NUMBER:
// numbers are always immediately pushed to output
output.push_back(tok);
break;
case Token_type::PLUS:
case Token_type::UNARY_MINUS:
case Token_type::BINARY_MINUS:
case Token_type::TIMES:
case Token_type::DIVIDE:
// pop higher-precedence operators off the stack
while (!stack.empty()
&& (is_op(stack.back().type)
&& precedence(stack.back().type) >= precedence(tok.type)))
{
output.push_back(pop(stack));
} // while
stack.push_back(tok);
break;
case Token_type::LEFT_PAREN:
// left parentheses are always immediately pushed onto the stack
stack.push_back(tok);
break;
case Token_type::RIGHT_PAREN:
// pop operators until the first left parenthesis is reached
// (if no parenthesis is found, then there's a mis-matched
// parenthesis error)
while (!stack.empty()
&& (stack.back().type != Token_type::LEFT_PAREN))
{
output.push_back(pop(stack));
} // while
if (stack.empty()) {
return Scan_result{Sequence{}, "mis-matched parenthesis"};
} else if (stack.back().type == Token_type::LEFT_PAREN) {
// discard the left parenthesis on the top
pop(stack);
} else {
// If the program ever gets to this line then there is a
// serious problem with it's design.
cmpt::error("logic error in infix_to_postfix RIGHT_PAREN case!");
}
break;
default:
cout << "tok.type: '" << char(tok.type) << "'\n";
cmpt::error("reached default in infix_to_postfix!");
} // switch
} // for
// The output is all processed, but there might be tokens on the stack. So
// move them all to output.
while (!stack.empty()) {
Token t = pop(stack);
if (t.type == Token_type::LEFT_PAREN || t.type == Token_type::RIGHT_PAREN) {
return Scan_result{Sequence{}, "mis-matched parenthesis"};
}
output.push_back(t);
} // while
return Scan_result{output, ""};
} // infix_to_postfix
Int_result infix_eval(Sequence input) {
minus_fix(input);
Scan_result postfix = infix_to_postfix(input);
if (postfix.okay()) {
return postfix_eval(postfix.value);
} else {
return Int_result{0, postfix.error_msg};
}
}
Int_result infix_eval(const string& input) {
Scan_result tokens = scan(input);
if (tokens.okay()) {
return infix_eval(tokens.value);
} else {
return Int_result{0, tokens.error_msg};
}
}
//////////////////////////////////////////////////////////////////////////
//
// Testing functions
//
//////////////////////////////////////////////////////////////////////////
void repl_postfix() {
for(;;) { // loop forever
cout << "postfix> ";
// getline reads the entire line of input, including spaces;
// cin >> input would just read the characters before the first space
// (e.g. the first word)
string input;
getline(cin, input);
cout << "input=" << quote(raw(input)) << "\n";
auto result = postfix_eval(input);
if (result.okay()) {
cout << "val = " << result.value << "\n";
} else {
cout << "Error: " << result.error_msg << "\n";
}
} // for
}
void repl_infix() {
for(;;) { // loop forever
cout << "\ninfix> ";
string input;
// getline reads the entire line of input, including spaces;
// cin >> input would just read the characters before the first space
// (e.g. the first word)
getline(cin, input);
cout << "input=" << quote(raw(input)) << "\n";
Int_result result = infix_eval(input);
if (result.okay()) {
cout << "val = " << result.value << "\n";
} else {
cout << "Error: " << result.error_msg << "\n";
}
} // for
}
//////////////////////////////////////////////////////////////////////////
//
// Testing
//
//////////////////////////////////////////////////////////////////////////
int test_count = 0;
void test(const string& expr, int expected_result) {
++test_count;
Int_result result = infix_eval(expr);
if (result.okay() && result.value == expected_result) {
// do nothing --- test passed
cout << test_count << ": " << quote(expr) << " passed\n";
} else {
cout << "!! Test failed: " << quote(expr)
<< "\n!! result=" << result
<< ", expected=" << expected_result << "\n";
cmpt::error("test failed");
}
}
void infix_eval_test() {
cout << "Testing infix_eval ...\n";
test_count = 0;
test("0", 0);
test("10", 10);
test(" 1 +2", 3);
test("4*2", 8);
test(" 10/ 5", 2);
test("1+2", 3);
test("(1)+2", 3);
test("1+(2)", 3);
test("(1+2)", 3);
test("286", 286);
test("(286)", 286);
test("((286))", 286);
test("(((286)))", 286);
test("1+2+3 + (4 + 5) + 7", 22);
test("2 + 3 * 4", 14);
test("(2 + 3) * 4", 20);
test("2 * 3 + 4", 10);
test("(10+20) / 15 ", 2);
test("6 * 5 - (1 * 2 + 3 * 4 )", 16);
test("32 / 2 ", 16);
test("32 / 2 / 2", 8);
test("(8 - 6 / 2) / (8-6/2)", 1);
test("(1 - 1 + 4) * 2", 8);
test("(155 + 2 + 3 - 155 - 2 - 3 + 4) * 2", 8);
test("32 / 2 * 2", 32);
test("1-2 ", -1);
test("-(1 + 2)", -3);
test("-5", -5);
test("1 + -3", -2);
test("-1 + 3", 2);
test("-(1 + 2)", -3);
test("3 - -2", 5);
test("-2 * 6", -12);
test("-(-2 * -6)", -12);
cout << "\n... all infix_eval tests passed!\n";
}
//////////////////////////////////////////////////////////////////////////
//
// Main program
//
//////////////////////////////////////////////////////////////////////////
int main() {
infix_eval_test();
// repl_postfix();
repl_infix();
} // main