But What is a Neural Network? Deep learning, chapter 1¶
Below are questions for the video But what is a Neural Network? | Deep learning, chapter 1. NN is short for “neural network”.
Please note that for some questions, the answers given here are not necessarily the only correct answers.
What problem is the NN discussed throughout the video trying to solve? Answer: Recognizing hand-written digits.
According to the video, what kind of NN is best for:
- image recognition? Answer: convolutional NN
- speech recognition? Answer: long short-term memory NN
What is the name of the kind of neural network discussed in the video? Answer: Multi-layer perceptron.
In simple mathematical terms, what does a neuron do? Answer: It holds a number. Later in the video, this is generalized to say that a neuron is a function.
What is the number in a neuron called? Answer: The neuron’s activation.
What is the range of possible activations for a neuron? Answer: 0 (least activated) to 1 (most activated)
In the NN example in the video:
What are the activations of the neurons in the first layer?
Answer: The activations of the first layer neurons are set to be the grayscale values of the image pixels.
How many neurons are in the last layer of the NN? Why that many? What do the activations in neurons of the last layer mean?
Answer: The last layer consists of 10 neurons, one for each of the possible digits 0 to 9. The activations of these last-layer neurons indicate which digit the network “thinks” the image contains, i.e. whichever last-layer neuron has the highest activation is the network’s answer.
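That “pick the highest activation” rule can be sketched in a few lines of Python. The output values below are made up for illustration; they are not from the video:

```python
# Hypothetical last-layer activations, one per digit 0-9 (values are made up).
output = [0.02, 0.01, 0.05, 0.90, 0.03, 0.01, 0.02, 0.04, 0.10, 0.02]

# The network's answer is the index (digit) of the highest activation.
prediction = max(range(len(output)), key=lambda i: output[i])
print(prediction)  # 3
```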
What are the layers between the input layer and last layer called? How many such layers are there? Why? How many neurons are in these layers? Why?
Answer: The layers between the input and last layers are called hidden layers, and in this NN there are 2 hidden layers with 16 neurons each. The choice of 2 and 16 is somewhat arbitrary, and different choices might work just as well (or worse, or better). People often experiment with different sizes and numbers of hidden layers.
Describe how neurons in one layer connect to other neurons. Answer: Each neuron in a layer is connected to every neuron in the next layer (and only the next layer).
What are the possible values for weights on the edges between neurons? Answer: They are real numbers that could be positive, negative, or 0.
What is the sigmoid function? Answer: \(f(x) = \frac{1}{1 + e^{-x}}\)
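As a quick sanity check, the formula above can be written as a small Python function (a sketch, not from the video):

```python
import math

def sigmoid(x: float) -> float:
    """Squish any real number into the range 0 to 1."""
    return 1 / (1 + math.exp(-x))

# Large negative inputs land near 0, large positive inputs near 1.
print(sigmoid(-10))  # close to 0
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1
```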
Why is the sigmoid function used at all? What is its purpose? Answer: The input to a neuron is the weighted sum of the activations of other neurons, and that weighted sum could be any real number. The sigmoid function is used to squish this number into the range 0 to 1, which makes it a valid activation.
What is the purpose of a bias in a neuron? Answer: It helps control when the neuron is activated, e.g. a high bias means the neuron needs a high input to be activated. The bias tells you how big the weighted sum must be before the neuron is active.
In a NN, what is learning? Answer: Learning in a NN is the process of setting the weights and biases to values that make the network work the way you want it to work (i.e. in this case to recognize hand-written digits)
Explain each part of the following formula from the video:
\[a^{(1)} = \sigma (Wa^{(0)} + b)\]
Answer:
- \(a^{(0)}\) is a column vector that stores all the activations of the neurons in the input layer, and \(a^{(1)}\) is a column vector of all the activations in the second layer (i.e. the first hidden layer)
- \(W\) is a matrix of edge weights; the first row of \(W\) contains all the weights on the edges from \(a^{(0)}\) neurons to the first neuron of \(a^{(1)}\); the second row of \(W\) contains all the edge weights to the second neuron of \(a^{(1)}\); and so on.
- \(Wa^{(0)}\) calculates the weighted sums of inputs to all the neurons in layer \(a^{(1)}\)
- \(b\) is a column vector of the biases for all the neurons in layer \(a^{(1)}\)
- \(\sigma\) is the sigmoid function, and is used to squish the value of \(Wa^{(0)}+ b\) into the range 0 to 1; the intention is that \(\sigma\) is applied element-wise, i.e. to each of the numbers in the vector that \(Wa^{(0)} + b\) evaluates to
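The formula can be checked with a small NumPy sketch. The sizes here are toy values (4 input neurons, 3 neurons in the next layer), not the sizes from the video, and the weights and biases are random rather than learned:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)

a0 = rng.random(4)               # activations of the input layer (column vector)
W = rng.standard_normal((3, 4))  # row i holds the weights into neuron i of the next layer
b = rng.standard_normal(3)       # one bias per neuron in the next layer

# a1 holds the next layer's activations; sigma is applied element-wise.
a1 = sigmoid(W @ a0 + b)
print(a1)  # three numbers, each between 0 and 1
```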
What squishing function do modern NNs often use instead of the sigmoid function? Why?
Answer: ReLU, the rectified linear unit, defined as:
\[\textrm{ReLU}(a) = \max(0, a)\]
ReLU is faster to compute, and deep networks tend to train more easily with it than with the sigmoid function.
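ReLU is simple enough to define in one line (a sketch):

```python
def relu(a: float) -> float:
    """Rectified linear unit: negative inputs become 0, positive inputs pass through."""
    return max(0.0, a)

print(relu(-2.5))  # 0.0
print(relu(3.0))   # 3.0
```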
Near the end of the video, it is stated that a NN can be thought of as a kind of function. If you treat the NN in this video as a function, what are its input and output?
Answer: The input is an image, and the output is a column vector with 10 elements that contains numbers that indicate what the NN thinks the image is. Usually the answer is taken to be the digit with the highest output value.
Practice Question¶
Consider a multi-layer NN, the same kind as in the video, designed to recognize both hand-written digits and alphabetic letters (lowercase a-z and uppercase A-Z). Each symbol is in a 200 x 200 pixel grayscale image. Suppose there are two hidden layers in the network, both with 25 neurons.
- How many neurons are in the:
- input layer? Answer: 200 * 200 = 40,000
- output layer? Answer: 10 + 26 + 26 = 62
- two hidden layers? Answer: 50
- entire network? Answer: 40000 + 62 + 50 = 40112
- How many edge weights are there from:
- The input layer to the first hidden layer? Answer: 40,000 * 25 = 1,000,000 (1 million)
- The first hidden layer to the second hidden layer? Answer: 25 * 25 = 625
- The second hidden layer to the output layer? Answer: 25 * 62 = 1550
- In total, how many edge weights does this neural network have? Answer: 1,000,000 + 625 + 1550 = 1,002,175
- How many biases does this neural network have? Answer: one for each neuron that is not in the input layer, so 62 + 50 = 112 biases; the input layer neurons have no biases because their activations are set directly to the grayscale values of the pixels
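The counts above can be double-checked with a short script, using the layer sizes from the practice question:

```python
# Layer sizes: input (200x200 pixels), two hidden layers of 25, output (10 digits + 52 letters).
layers = [200 * 200, 25, 25, 62]

# Weights connect each pair of adjacent layers; biases belong to every non-input neuron.
weights = sum(m * n for m, n in zip(layers, layers[1:]))
biases = sum(layers[1:])

print(weights)  # 1002175
print(biases)   # 112
```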