CS Grade 9-12 Answer Key

CS: Neural Networks: From Perceptron to Deep Learning

How artificial neurons learn patterns from data

Answer Key
Name:
Date:
Score: / 12

Instructions: Read each problem carefully. Show your reasoning, calculations, or explanations in the space provided.
  1.

    A perceptron has two inputs, x1 = 1 and x2 = 0. The weights are w1 = 0.7 and w2 = -0.4, and the bias is b = -0.2. Compute the weighted sum z = w1x1 + w2x2 + b. If the perceptron outputs 1 when z is greater than or equal to 0 and outputs 0 otherwise, what is the output?

    Multiply each input by its weight, add the bias, then apply the step rule.

    The weighted sum is z = 0.7(1) + (-0.4)(0) + (-0.2) = 0.7 - 0 - 0.2 = 0.5. Since 0.5 is greater than or equal to 0, the perceptron outputs 1.
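    This step rule can be checked with a few lines of Python (the function name `perceptron` is just illustrative):

    ```python
    def perceptron(inputs, weights, bias):
        """Step-activation perceptron: output 1 if the weighted sum z >= 0, else 0."""
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1 if z >= 0 else 0

    # Values from the problem: x1 = 1, x2 = 0, w1 = 0.7, w2 = -0.4, b = -0.2
    output = perceptron([1, 0], [0.7, -0.4], -0.2)
    print(output)  # 1, because z = 0.5 >= 0
    ```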
  2.

    Explain what a weight represents in a neural network. Use a simple example, such as predicting whether a student will pass a test based on study time and sleep.

    A weight represents how strongly one input affects the neuron's output. For example, if study time has a large positive weight, then more study time strongly increases the predicted chance of passing. If sleep has a smaller positive weight, it still helps, but not as much.
  3.

    A single neuron receives three inputs: x1 = 2, x2 = -1, and x3 = 3. The weights are w1 = 0.5, w2 = 1.0, and w3 = -0.25. The bias is b = 0.1. Calculate the neuron's pre-activation value z.

    The pre-activation value is the weighted sum before an activation function is applied.

    The pre-activation value is z = 0.5(2) + 1.0(-1) + (-0.25)(3) + 0.1 = 1 - 1 - 0.75 + 0.1 = -0.65.
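    The same weighted sum can be computed in Python (the function name `pre_activation` is illustrative):

    ```python
    def pre_activation(inputs, weights, bias):
        """Weighted sum z = w1*x1 + w2*x2 + ... + b, before any activation function."""
        return sum(w * x for w, x in zip(weights, inputs)) + bias

    # Values from the problem: three inputs, three weights, and a bias
    z = pre_activation([2, -1, 3], [0.5, 1.0, -0.25], 0.1)
    print(round(z, 2))  # -0.65
    ```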
  4.

    The step activation function outputs only 0 or 1. The sigmoid activation function outputs values between 0 and 1. Explain why sigmoid can be useful for a model that predicts probabilities.

    The sigmoid function is useful for predicting probabilities because its output is always between 0 and 1. A value such as 0.82 can be interpreted as an 82 percent estimated probability, while a step function only gives a hard yes or no answer.
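    A short sketch of the sigmoid function shows how any real-valued z is squashed into the range (0, 1); the example inputs here are illustrative, not from the problem:

    ```python
    import math

    def sigmoid(z):
        """Map any real number into the open interval (0, 1)."""
        return 1 / (1 + math.exp(-z))

    # A pre-activation of 1.5 maps to roughly 0.82, readable as about an 82% probability.
    print(round(sigmoid(1.5), 2))   # 0.82
    print(round(sigmoid(-3.0), 2))  # 0.05
    ```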
  5.

    A neural network has an input layer with 4 neurons, one hidden layer with 6 neurons, and an output layer with 2 neurons. How many layers contain trainable neurons, not counting the input layer as trainable computation?

    Input neurons hold the features, while hidden and output neurons perform weighted computations.

    There are 2 layers that contain trainable neurons: the hidden layer and the output layer. The input layer passes data into the network and is not usually counted as a trainable computation layer.
  6.

    In your own words, explain the difference between a shallow neural network and a deep neural network.

    A shallow neural network has only one or a small number of hidden layers. A deep neural network has many hidden layers, which allows it to learn more complex patterns by building up features step by step.
  7.

    A model predicts 0.9 for an image whose correct label is 1. Another model predicts 0.2 for the same image. Which prediction has a smaller error if you measure error by the absolute difference between prediction and correct label? Show the calculation.

    Absolute error measures how far the prediction is from the true value, ignoring direction.

    The prediction 0.9 has a smaller error. Its absolute error is |1 - 0.9| = 0.1, while the prediction 0.2 has error |1 - 0.2| = 0.8.
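    Both errors can be compared side by side in Python (the helper name `absolute_error` is illustrative):

    ```python
    def absolute_error(prediction, label):
        """Distance between prediction and the true label, ignoring direction."""
        return abs(label - prediction)

    # Both models predict on the same image, whose correct label is 1.
    errors = {p: round(absolute_error(p, 1), 1) for p in (0.9, 0.2)}
    print(errors)  # {0.9: 0.1, 0.2: 0.8}
    ```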
  8.

    Gradient descent is often described as walking downhill on a loss surface. Explain what the loss surface, the slope, and the learning rate represent in this analogy.

    The loss surface represents how bad the model is for different weight values. The slope shows which direction changes the loss most quickly. The learning rate controls how big each update step is when the model changes its weights.
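    The analogy can be made concrete with a tiny one-variable example. This is a minimal sketch, assuming a toy loss (w - 3)^2 whose minimum sits at w = 3; the loss function and learning rate here are illustrative choices, not part of the problem:

    ```python
    # 1-D gradient descent on loss(w) = (w - 3)**2, whose minimum is at w = 3.
    def loss(w):
        return (w - 3) ** 2  # the "height" of the loss surface at w

    def slope(w):
        return 2 * (w - 3)   # the derivative: which way is downhill, and how steep

    w = 0.0               # starting point on the loss surface
    learning_rate = 0.1   # size of each downhill step
    for _ in range(50):
        w -= learning_rate * slope(w)  # step in the downhill direction

    print(round(w, 3))  # 3.0, the bottom of the "valley"
    ```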
  9.

    A neural network is training, but its loss decreases very slowly. One student suggests greatly increasing the learning rate. Explain one possible benefit and one possible risk of this change.

    Think about the difference between small careful steps and giant steps when walking downhill.

    A larger learning rate can help the model learn faster because it takes bigger update steps. The risk is that the steps may become too large, causing the model to jump over the best weights or make the loss unstable.
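    The tradeoff shows up clearly on a toy loss. This sketch reuses the one-variable loss (w - 3)^2 from the downhill analogy; the specific learning rates are illustrative:

    ```python
    # One gradient-descent update on loss(w) = (w - 3)**2, whose slope is 2*(w - 3).
    def step(w, lr):
        return w - lr * 2 * (w - 3)

    # A moderate rate creeps toward the minimum; a rate that is too large overshoots
    # farther on every step, so the loss blows up instead of shrinking.
    for lr in (0.1, 1.5):
        w = 0.0
        for _ in range(10):
            w = step(w, lr)
        print(lr, round(w, 2))  # 0.1 lands near 3; 1.5 diverges far away
    ```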
  10.

    Match each task to a likely neural network output type: A. classifying an email as spam or not spam, B. predicting tomorrow's temperature, C. identifying which digit from 0 to 9 appears in an image. Explain your choices.

    A should use a binary classification output because there are two choices, spam or not spam. B should use a regression output because temperature is a number on a continuous scale. C should use a multiclass classification output because the image belongs to one of ten digit classes.
  11.

    Look at this simplified network: two input neurons feed into three hidden neurons, and the three hidden neurons feed into one output neuron. If every neuron in one layer connects to every neuron in the next layer, how many connections are there in total?

    Count the connections between each pair of neighboring layers separately, then add them.

    There are 9 total connections. The input-to-hidden connections are 2 x 3 = 6, and the hidden-to-output connections are 3 x 1 = 3, so 6 + 3 = 9.
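    The counting rule generalizes: for fully connected layers, multiply each pair of neighboring layer sizes and add the products. A minimal sketch:

    ```python
    # Layer sizes from the problem: 2 inputs -> 3 hidden -> 1 output.
    layer_sizes = [2, 3, 1]

    # Each neighboring pair contributes (size of layer) * (size of next layer) connections.
    connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    print(connections)  # 9, from 2*3 + 3*1
    ```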
  12.

    A neural network correctly classifies almost all training images but performs poorly on new test images. What problem is this likely showing, and name two ways to reduce it.

    A good model should perform well on both the data it studied and new data it has not seen before.

    This is likely overfitting, which means the model learned the training data too specifically and did not generalize well. Two ways to reduce it are using more training data and applying regularization. Other helpful methods include simplifying the model, using dropout, or stopping training earlier.
LivePhysics™.com CS - Grade 9-12 - Answer Key