Software Design¶
Based on:
- Chp 7 of Sommerville “Software Engineering” (9th edition)
- “A Philosophy of Software Design” by John Ousterhout
Software Design¶
designing software is somewhat of a black art
the key problem is this: how should you decompose big problems into smaller ones?
there is no magic recipe!
at best, there is advice, examples, and principles to follow
the main thing seems to be: practice!
- the more you try designing, the better you will get
- through practice, you learn what works and what doesn’t
about the only thing good software designers seem to have in common is that they have lots of practice writing and designing programs
here we will discuss some of the important and general principles of software design
A Note on Modules¶
when discussing software design, we will often refer to modules
in our usage, a module is some unit of interest in a program, such as:
- a function
- a class
- a collection of related functions and variables, e.g. a package
Cohesion¶
cohesion refers to the degree to which the elements in a software module (function, class, package, etc.) belong together
a module has high cohesion if the functions/methods/data within it go together well, and low cohesion if they don’t
a module with low cohesion can be a sign that it should be split up into multiple smaller modules
highly cohesive modules tend to be better because they tend to be less complex
- the functions/variables of a highly cohesive module go together sensibly
kinds of cohesion:
coincidental cohesion: parts of a module are arbitrarily put together
- this is clearly bad!
- e.g. the “util” module of a system might end up being a catch-all where functions/variables that don’t have any other obvious home are placed
- which may be better than having them scattered throughout the system within modules that they don’t seem to be related to
logical cohesion: functions/variables grouped together because they are about the same thing
- e.g. text processing functions might all be placed together in the same module
temporal cohesion: functions/variables grouped together by the time they run in a program, e.g. a module for “beginning stuff”, a module for “middle stuff”, and a module for “end stuff”
- in some testing frameworks, there is the notion of setup and tear-down
  - before you do the testing, you need to set things up
    - e.g. you might need to initialize a special folder where results will be placed
  - after testing is finished, you need to remove the things you set up, along with any “mess” made by the testing itself
    - e.g. testing might have created dummy files that should be deleted
  - should all setup functions be grouped into a single module? should all tear-down functions be grouped together?
- in C++, code inside a constructor is executed when an object is created, and code inside a destructor is executed when the object is de-allocated
  - however, C++ does not group all constructors into one module and all destructors into another module
  - instead, the constructors/destructors are grouped together in the same class, as in the sketch below
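for example, here is a minimal sketch of a C++ test fixture that groups setup and tear-down together in one class (the class and folder names are hypothetical), rather than splitting them into temporally cohesive “setup” and “tear-down” modules:

#include <cstdio>
#include <string>

// hypothetical test fixture: the setup and tear-down code live together,
// grouped by the data they manage rather than by when they run
class TestFixture {
    std::string results_dir;
public:
    TestFixture() : results_dir("test_results") {
        // setup: e.g. create a folder where test results will be placed
        std::printf("creating %s\n", results_dir.c_str());
    }
    ~TestFixture() {
        // tear-down: remove the folder and any mess made by the tests
        std::printf("removing %s\n", results_dir.c_str());
    }
};

int main() {
    TestFixture fixture;   // setup runs here
    // ... run the tests ...
}                          // tear-down runs automatically here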
communicational/informational cohesion: functions grouped together because they operate on the same data
- e.g. the C++ dynamic vector<T> class groups together certain basic functions that work on vectors
  - the vector<T> constructor/destructor are grouped together because the constructor allocates the memory, and the destructor de-allocates it
- e.g. put all functions that read/write/process JSON into a single module
sequential cohesion: functions/variables grouped together because they follow a sequence of operations, typically where the output of one operation is the input to another
e.g. in a Linux terminal you often “pipe” commands together in sequence:
$ sort names | uniq > sort_names
the output of sort is passed as the input to uniq, which is then written to the file sort_names
- this shows sequential cohesion, so if you did this operation a lot it might make sense to make it into its own function
e.g. in a parser you might have sequential code like this:
s = read_file("infile.txt")
raw_words = split(s)
words = trim(raw_words)
tokens = tokenize(words)
this tokenizes a text file; each line uses the output of the previous line as its input; you could write it in one line like this:

tokenize(trim(split(read_file("infile.txt"))))

this is shorter, and the sequencing is perhaps more clear, but note that this expression must be read from right to left, e.g. tokenize is the first operation mentioned, but the last operation done

the BASH scripting language provides a | operator that lets you write operations in the order they run, e.g. something like this:

read_file("infile.txt") | split | trim | tokenize

since these operations are all connected one after the other, they are considered to be sequentially cohesive
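to make the idea concrete, here is a simplified C++ sketch that bundles the whole sequence into one function; the helper implementations are made-up stand-ins, not a real tokenizer:

#include <cctype>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

std::string read_file(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream ss;
    ss << in.rdbuf();                        // slurp the whole file
    return ss.str();
}

std::vector<std::string> split(const std::string& s) {
    std::istringstream in(s);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);      // split on whitespace
    return words;
}

std::vector<std::string> trim(std::vector<std::string> words) {
    for (auto& w : words) {                  // drop surrounding punctuation
        while (!w.empty() && std::ispunct((unsigned char)w.back()))  w.pop_back();
        while (!w.empty() && std::ispunct((unsigned char)w.front())) w.erase(0, 1);
    }
    return words;
}

std::vector<std::string> tokenize(const std::vector<std::string>& words) {
    return words;                            // stand-in: a real tokenizer would classify words
}

// the output of each stage is the input to the next: sequential cohesion
std::vector<std::string> tokenize_file(const std::string& path) {
    return tokenize(trim(split(read_file(path))));
}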
functional cohesion: functions/variables grouped together because they contribute to a single, clearly-defined task
- e.g. a module that houses a JSON parser (that reads textual JSON and converts it into a syntax tree) would be a good example of a functionally cohesive module
Coupling¶
coupling refers to how closely related two modules are
generally, loose coupling, where modules are not closely related, is considered to be a good thing
tight coupling, or high coupling, is generally considered to be bad, since two modules that are tightly coupled might not be able to work without each other, thus making them less flexible
- a change to one module in a tightly coupled system could have a significant impact on other modules, and this increases system complexity
- if you want to re-use a single module in a different system, then you may be forced to use any modules tightly coupled to it as well
if two modules are very tightly coupled, that can be a sign that they would be better off combined into a single module
cohesion and coupling are often considered to be contrasts of each other, e.g. two loosely coupled modules are likely to be, on their own, highly cohesive
good modular systems have modules that are highly cohesive and loosely coupled
kinds of coupling:
- common coupling is when two modules share, say, global variables (see the sketch after this list)
- this can be problematic if both modules can write to the global variable, as then they must synchronize their access to it
- less problematic is if the modules only have read access to the globals, e.g. global constants are okay because they never change and so there is no need to synchronize access to them
- external coupling is when two modules share, say, the same data format or communication protocol
- subclass coupling is when two classes (in an object-oriented system) are related by inheritance
  - e.g. if an SFU_student class inherits from a Student class, then there is subclass coupling between them
    - if Student changes, then this might impact SFU_student
    - if you want to re-use SFU_student in another system, you have to take the Student class with it
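here is the promised minimal sketch of common coupling in C++ (the variable and function names are made up for illustration):

#include <iostream>

int request_count = 0;   // global variable shared by both "modules"

// module A writes the global
void handle_request() { request_count++; }

// module B reads the global
void print_stats() { std::cout << request_count << " requests\n"; }

// the two modules are coupled through request_count: in a multi-threaded
// program they would also have to synchronize their access to it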
another way of thinking about coupling is by thinking about a module’s dependencies
for instance, a simple rule of thumb for estimating a module’s dependencies is to count how many #includes/imports it has
- also, whether those imports are part of the standard library (and so will be easily available), or from outside the standard library (and so may be less easily available)
for example, consider this C++ function:
void say_hi(const string& name) {   // C++
    cout << "Hi " << name << "!\n";
}
this function (functions are modules!) depends upon string and iostream

you couldn’t easily re-use this to, say, write the same message into a file, or into a network socket

this version doesn’t depend upon iostream:

string say_hi(const string& name) {
    return "Hi " + name + "!\n";
}

- one could also get rid of the dependency on string by using C-style strings; but C-style strings are notoriously complex and error-prone, and so that might be going too far

another practical detail here is that a newline (\n) is included at the end of the string, and so there is no way to put anything immediately after this on the same line

so you could argue that this function is not as cohesive as it could be: not only does it say hello, it also decides to move the cursor to the next line

but moving the cursor to the next line isn’t a decision that can be reasonably made by a function that says hello, because that function doesn’t know anything about what’s coming next

better would be to separate out the decision to go to the next line, and just have the function return its string:

string say_hi(const string& name) {
    return "Hi " + name + "!";   // no \n after the !
}

now this function just says hello: it is more cohesive, doing only one thing (saying hello)
Contracts¶
a problem with some software designs is that they are too vague or obscure
precision is important!
one way to write precise software is to have clear contracts of behaviour between different modules
for example, pre/post conditions for a module can be a good way to specify its expected behaviour
- pre-condition: what must be true before the function is called
- post-condition: what the function promises will be true after it is called, assuming the pre-condition was true
together, a pre-condition and post-condition form a contract between the function and the rest of the program
contracts can help you narrow down the potential locations of a bug
- a pre-condition failure tells you that there is likely a bug outside the function before it’s called
- a post-condition failure tells you there is likely a bug somewhere inside the function
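for example, here is a minimal sketch of checking a contract at runtime with C++ assertions; my_sqrt is a made-up example function:

#include <algorithm>
#include <cassert>
#include <cmath>

// Pre:  x >= 0
// Post: result >= 0, and result * result is approximately x
double my_sqrt(double x) {
    assert(x >= 0);   // pre-condition fails: likely a bug in the caller
    double result = std::sqrt(x);
    // post-condition fails: likely a bug inside this function
    assert(result >= 0 &&
           std::fabs(result * result - x) <= 1e-9 * std::max(1.0, x));
    return result;
}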
here’s an example of a subtly wrong contract that uses pre and post conditions:
// Pre:
// lst has no duplicates
// Post:
// lst[0] < lst[1] < ... < lst[n-1]
void sort(list& lst)
do you see the problem?

there is nothing in the contract that forbids modifying lst in a way that changes its values

so, according to the contract, this would be a valid implementation of sort:
// Pre:
//    lst has no duplicates
// Post:
//    lst[0] < lst[1] < ... < lst[n-1]
void sort(list& lst) {
    for(int i = 0; i < lst.size(); i++) lst[i] = i;
}
clearly, this is not what is meant by sorting, but the contract lets this implementation slip through
a correct version of the contract must somehow indicate that solutions like that are not allowed:
// Pre:
// lst has no duplicates
// Post:
// permutes the elements of lst so that
// lst[0] < lst[1] < ... < lst[n-1]
void sort(list& lst)
here, the term “permutes” is precise, and means what it means in mathematics

i.e. if lst is the list before sort is called, and lst’ is the list afterwards, then:

forall x in lst: x in lst’, and forall x in lst’: x in lst

the term “permutes” implies that the only difference between the input list and the output list is the order of the elements

so sort is allowed to modify lst if it needs to, as long as it properly “cleans up” its changes by the time it returns

- note that lst was passed by reference, allowing for this possibility
Contracts¶
pre-conditions and post-conditions are not the only part of useful module contracts
other important “contract” information often includes:
- performance information
- for example, list operations such as insert_front(x) and insert_back(x) should say how efficient they are
  - for singly-linked lists, inserting at the front is O(1), and inserting at the back is O(n)
  - for dynamic arrays, the performance is often the opposite: inserting at the front is O(n), and inserting at the back is O(1)
- quicksort is a famous algorithm where the average-case performance is good, but the worst-case performance is bad
  - in practice, the worst-case input to quicksort seems to be rare, and so, in practice, quicksort tends to perform very efficiently
  - but there is always the chance quicksort could get unlucky and get an input that causes it to go slow
  - for some applications, even a small risk of a super-slow sort could be too dangerous, and so other algorithms might be needed (e.g. introsort, mergesort)
- memory-usage information
- e.g. both quicksort and mergesort run, on average, in O(n log n) time, but mergesort requires O(n) extra memory to efficiently do the merging; in contrast, quicksort usually requires much less extra memory since it sorts entirely in-place (quicksort needs some extra memory to store its recursive calls)
- notes about any special cases, known bugs, or unusual behaviour
- e.g. in sorting, it’s useful to know if a sort is stable
- a sort is stable if it guarantees to keep items with the same key in the same relative order they appeared in the input
- this is a very useful property in practice!
- for example, you could sort a list of students in a class alphabetically by name and then by letter grade
- if you do this with a stable sort, then after you sort by letter grade the names of the students with the same letter grade will be in alphabetical order
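here is a minimal sketch of that trick using std::stable_sort from the C++ standard library; the Student struct is made up for illustration:

#include <algorithm>
#include <string>
#include <vector>

struct Student {
    std::string name;
    char grade;   // 'A', 'B', 'C', ...
};

void sort_by_grade_then_name(std::vector<Student>& students) {
    // first, sort alphabetically by name
    std::sort(students.begin(), students.end(),
              [](const Student& a, const Student& b) { return a.name < b.name; });
    // then stable-sort by grade: students with the same grade keep the
    // alphabetical order established by the first sort
    std::stable_sort(students.begin(), students.end(),
                     [](const Student& a, const Student& b) { return a.grade < b.grade; });
}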
Deep Classes¶
in his book A Philosophy of Software, John Ousterhout discusses what he calls deep classes
- we will take “class” to mean the same thing as “module”
a deep class is a class with a lot of code/functionality, but with a small interface:
A Deep Class
  ---     <--- small interface to lots of code
 |   |
 |   |
 |   |
 |   |
 |   |
 |   |
  ---
in contrast, a shallow class has a big interface with very little code:
A Shallow Class
 ---------------------------     <--- big interface to a small amount of code
|                           |
 ---------------------------
in general, deep classes are preferable to shallow classes
rule of thumb: de-compose a system into a set of cohesive, loosely coupled, deep classes
Deep Classes: Unix I/O¶
Ousterhout’s favourite example of a deep class is the Unix file I/O interface, which consists of the following functions:
int open(const char* path, int flags, mode_t permissions);
ssize_t read(int fd, void* buffer, size_t count);
ssize_t write(int fd, const void* buffer, size_t count);
off_t lseek(int fd, off_t offset, int referencePosition);
int close(int fd);
a modern implementation of Unix can require tens of thousands of lines of code to fully implement file I/O (!), yet this basic 5-function interface is all that most programmers need to use it
- to be fair, the interactions of the modes and flags are actually quite complex, and careful formal specifications of this interface are surprisingly large
- but from the point of view of the average Unix programmer, these 5 functions are all they need to know for most of their work
the implementations of Unix file I/O have changed greatly over the years, but this original basic interface has not changed
- programs that rely on this interface thus don’t need to change, and yet can get the benefits of improved I/O implementations
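as a rough sketch of how these five functions are typically used, here is a simplified file copy (with most error handling omitted):

#include <fcntl.h>
#include <unistd.h>

// copy src to dst using only the basic Unix I/O interface
int copy_file(const char* src, const char* dst) {
    int in = open(src, O_RDONLY, 0);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) return -1;
    char buf[4096];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0)   // sequential reads...
        write(out, buf, (size_t)n);               // ...and sequential writes
    close(in);
    close(out);
    return 0;
}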
Deep Classes: Garbage Collection¶
another nice example of a deep class is the garbage collection found in programming languages like Java, Go, Python, etc.
a garbage collector is a program that automatically deletes memory that is no longer in use
in most programming languages, garbage collection is totally automatic, with no intervention on the part of the programmer
thus, the garbage collector essentially has no interface
so there is no way to use it incorrectly!
- although you certainly could write code in a way that puts more stress on the garbage collector as compared to other ways
many beginning programmers who start with a garbage-collected language are totally unaware of it
Deep Classes¶
in general, deep classes are a good thing that you ought to strive for in your designs
but in practice, achieving them can be quite difficult!
it’s not always obvious that a simple interface is possible, or what it should be
- one problem with a simple interface is that complicated things must then be built up from a combination of smaller “primitive” things
- but sometimes it might not be clear what primitives are needed to ensure you can actually do all the complicated things
- and even if you can do all the complicated things, you might not be able to do them as efficiently as you need them to be done
  - e.g. you can empty a stack with n elements by calling pop() n times, but that is likely less efficient than a special-purpose clear() function that, say, sets an internal pointer to 0
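here is a minimal sketch of the difference, using a simplified fixed-capacity stack made up for illustration:

#include <cstddef>

template <typename T>
class Stack {
    T data[1000];           // fixed capacity, for simplicity
    std::size_t n = 0;
public:
    void push(const T& x) { data[n++] = x; }
    T pop()               { return data[--n]; }
    bool empty() const    { return n == 0; }

    // O(n) way: pop the elements one at a time
    void clear_slow() { while (!empty()) pop(); }

    // O(1) way: a special-purpose clear() just resets the size
    void clear() { n = 0; }
};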
one useful rule of thumb when designing interfaces: make the common case simple
- the Unix I/O interface above does this: the read and write functions assume sequential reading/writing, which is the most common way of processing files (but lseek is there in case you need random access)
Information Hiding¶
well-designed deep classes should encapsulate design decisions
the particular details of how the class works should generally not be part of the interface, i.e. these details should be hidden inside the class
thus, in a well-designed class it should be possible to change the implementation without changing the interface
- and as long as the interface doesn’t change, code that calls that interface doesn’t need to change
for example, suppose you have a class that parses JSON files
- there are many different ways to write a parser
- lots of little details impact the performance and memory-usage of a parser, and getting those details right is a lot of work
- the precise details of the parsing method don’t usually matter to programmers — they just want correct parse trees as soon as possible
- so all the details of parsing could be hidden in a class with a simple interface that, say, takes a JSON file as input, and returns the corresponding parse tree
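a hypothetical sketch of such an interface in C++ (the class names are made up, and a real parser would flesh out the details):

#include <memory>
#include <string>

struct JsonNode {
    // value, children, etc. omitted for brevity
};

class JsonParser {
public:
    // the whole interface: JSON text in, parse tree out; how the parsing
    // is actually done stays hidden inside the class
    std::unique_ptr<JsonNode> parse(const std::string& json_text) {
        // ... all the tricky parsing details would live in here ...
        return std::make_unique<JsonNode>();
    }
private:
    // buffers, lookahead tables, etc. would be private members here
};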
information hiding is one of the key ideas in making deep classes
Information Hiding: Information Leakage¶
information leakage is when the same information is used in two different places in your program
- e.g. two different classes both understand a particular file type
- this is usually a problem!!
- should probably combine the two classes into a single class
- in layered architectures, pass-through functions can be sources of information leakage
- another example is a standard API function that has some undocumented behaviour
  - programmers using an API may come to rely on undocumented behaviour
  - but the implementers of the API might change the undocumented behaviour in later versions
  - programs that relied on the old undocumented behaviour now have bugs!
  - you could treat this reliance on undocumented behaviour as a kind of information leakage
    - the undocumented behaviour is typically a consequence of the particular implementation of the API
    - i.e. implementation details have leaked out to programmers, causing them to rely on those details
Information Hiding: Temporal Decomposition¶
according to Ousterhout, a common software design error is to structure a program according to its temporal aspects
i.e. the problem being solved is decomposed in a way that follows the time in which operations are done
this is usually bad design!
for example, suppose your program reads a file, processes the information in it, and then re-writes the file
you might consider writing three classes to handle it: a reader class, a processing class, and a rewriting class
but that is likely a mistake: there is information leakage!
- the reading and rewriting steps both require knowledge of the file format
better would be to combine this into a single object that can manage this file format info in one place
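a hypothetical sketch of the combined design (the class and method names are made up for illustration):

#include <string>

// one class owns the file format, so knowledge of the format does not
// leak into separate reader and rewriter classes
class RecordFile {
    std::string path;
    std::string contents;   // stand-in for the parsed records
public:
    explicit RecordFile(const std::string& p) : path(p) {
        // read and parse the file here (format details stay private)
    }
    void process() {
        // transform the records here
    }
    void save() const {
        // write the file back out, re-using the same private format details
    }
};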
Dealing with Errors¶
error-handling is one of the most important aspects of any real-world system
yet it is a topic that is rarely covered in detail
- it’s unglamorous
- tedious!
- errors are often not thought to be the common case
but any good software design must handle errors that could occur
there is debate about how best to handle errors
- some languages use exceptions, e.g. C++, Java, JavaScript
- others use error flags, e.g. C, Go
both of these methods have their pros and cons, and what’s best can depend upon the particular situation
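here is a minimal sketch of the two styles in C++; parse_int is a made-up example, and std::optional plays the role of a Go-style error flag:

#include <optional>
#include <stdexcept>
#include <string>

// exception style: errors propagate automatically, but the error paths
// are invisible at the call site
int parse_int_throwing(const std::string& s) {
    return std::stoi(s);   // throws std::invalid_argument on bad input
}

// error-flag style (as in C or Go): the caller must check explicitly
std::optional<int> parse_int_flag(const std::string& s) {
    try {
        return std::stoi(s);
    } catch (const std::exception&) {
        return std::nullopt;   // the "flag": an empty optional
    }
}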
but however you do it, error handling and reporting adds a lot of complexity to your programs
Dealing with Errors: GUI Example¶
another interesting approach is suggested by Ousterhout: define errors out of existence
for example, imagine a graphical user interface that asks the user to enter a number from 1 to 10
- one way to do that would be to have the user type their number in a text box
- but the user might not type a valid number, and so the program needs to validate the user’s input
  - requires extra code and checking!
  - users can get annoyed by making mistakes
    - even if the mistakes are entirely their fault, it can be frustrating to use an interface that allows them to make such errors
- another way to get the number would be to provide a drop-down menu that contains just the numbers from 1 to 10 (or give a list of buttons labelled from 1 to 10)
- in this second approach, there is no way for the user to enter invalid data: data entry errors have been defined out of existence
  - well, of course, the user could still accidentally choose the wrong number, so it doesn’t solve all problems — but there are fewer errors to deal with than in the text-box approach
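the same idea can show up in ordinary code: instead of accepting an int and validating it, accept a type whose only possible values are valid ones (a hypothetical sketch):

// Rating has exactly ten values, just like the drop-down menu
enum class Rating { R1 = 1, R2, R3, R4, R5, R6, R7, R8, R9, R10 };

void record_rating(Rating r) {
    // no validation needed: every possible Rating value is legal, so
    // "invalid rating" errors have been defined out of existence
    (void)r;   // (unused in this sketch)
}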
Dealing with Errors: Deleting an Open File in Unix and Windows¶
consider the problem of deleting an open file
- e.g. perhaps you are editing a file in a text editor, and then, while still editing it, throw it into the trash
should deleting an open file be an error?
Windows (originally, at least) said yes, it’s an error: you cannot delete an open file
- which makes sense: an open file is a file that is being used, and so it could cause problems if it was suddenly deleted
but practically, what often happened is that a file would not be correctly closed for some reason (e.g. the editing program crashed and didn’t close the file), and then you can’t delete the file
- so you may be forced to search through programs to see which one is keeping the file open
- or maybe you just start killing processes in the hopes of killing the one that is keeping the file open
- or maybe you just reboot your computer!
Unix took a very different approach to deleting an open file:
- if you delete an open file, then the file is not deleted immediately
- instead, the file is marked for deletion, and the file name is removed from the directory
- the delete command returns without an error
- no exceptions or errors are raised for the process that is currently using the file
- by removing the file from the directory, no other process can open the deleted file, but a new file with the same name can be created (e.g. by the process currently using it!)
- once the file is closed by all the processes using it, then it is actually deleted
the Unix approach turns out to be much better in practice
- which might not seem obvious from this description!
- the Unix approach is more elaborate, opening up the possibility that some strange interaction of processes and file deletions could cause strange problems
- but experience shows it seems to be okay, and is generally considered to be an elegant way to handle the problem of deleting an open file
Dealing with Errors: Out of Range Slicing¶
Ousterhout: the Java substring functions throw an exception if you ask for a slice that is out of the string’s range, e.g.

- "hamburger".substring(4, 8) returns "urge"
- "hamburger".substring(4, 18) throws IndexOutOfBoundsException

in practice, this means that you need a try/catch block to catch this error, or to precede each call to substring with code that checks that the indices are valid
but another approach is to re-define the substring function to return all string characters that overlap the range, so that out-of-range indices no longer cause an error; for example, that’s how Python does slicing:
>>> 'hamburger'[4:8] # Python
'urge'
>>> 'hamburger'[4:18]
'urger'
>>> 'hamburger'[14:18]
''
this does change the meaning of the substring function, but it does so in a way that defines away a common error
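a minimal sketch of the same idea in C++ (safe_substring is a made-up name; it clamps out-of-range indices the way Python slicing does):

#include <algorithm>
#include <cstddef>
#include <string>

std::string safe_substring(const std::string& s, std::size_t begin, std::size_t end) {
    begin = std::min(begin, s.size());   // clamp both indices into range
    end   = std::min(end, s.size());
    if (begin >= end) return "";         // empty overlap, not an error
    return s.substr(begin, end - begin);
}

// safe_substring("hamburger", 4, 18) == "urger"   (no exception thrown)
// safe_substring("hamburger", 14, 18) == ""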