Backpropagation

  • Worked through a computational graph example (the sigmoid function).
  • Key idea: to find the downstream gradient, multiply the upstream gradient by the local gradient.
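
A minimal sketch of this rule for a single sigmoid gate, assuming a scalar input and an upstream gradient handed in by the rest of the graph (the variable names `upstream`, `local`, and `downstream` are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Forward pass through the sigmoid gate.
x = 2.0
s = sigmoid(x)

# Backward pass: the local gradient of the sigmoid is s * (1 - s).
upstream = 1.0                 # dL/ds, assumed given by the rest of the graph
local = s * (1.0 - s)          # ds/dx
downstream = upstream * local  # dL/dx = dL/ds * ds/dx
print(downstream)              # ~0.105 for x = 2.0
```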

Gates

Add Gate: Also known as the gradient distributor. An add gate copies the upstream gradient unchanged to each of its inputs, so every downstream gradient equals the upstream gradient.
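
A sketch of this behavior for a two-input gate z = x + y, using a hypothetical `add_backward` helper:

```python
def add_backward(upstream):
    # z = x + y: dz/dx = dz/dy = 1, so the upstream gradient
    # passes through to both inputs unchanged.
    dx = upstream
    dy = upstream
    return dx, dy
```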

Mul Gate: Also known as the swap multiplier. For z = x * y, the local gradient with respect to each input is the other input, so each downstream gradient equals the upstream gradient multiplied by the other input: dL/dx = (dL/dz) * y and dL/dy = (dL/dz) * x.
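
The same rule as a sketch, with a hypothetical `mul_backward` helper:

```python
def mul_backward(upstream, x, y):
    # z = x * y: each input's local gradient is the *other* input,
    # hence the name "swap multiplier".
    dx = upstream * y
    dy = upstream * x
    return dx, dy
```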

Max Gate: Also known as the gradient router. The upstream gradient is routed entirely to the larger input, so the larger input's downstream gradient equals the upstream gradient while the smaller input's downstream gradient is zero.
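
A sketch under the same assumptions, with a hypothetical `max_backward` helper (ties are broken toward x here, which is one common convention):

```python
def max_backward(upstream, x, y):
    # z = max(x, y): only the larger input influenced the output,
    # so it receives the full upstream gradient; the other gets zero.
    dx = upstream if x >= y else 0.0
    dy = upstream if y > x else 0.0
    return dx, dy
```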

Recap: Vector Derivatives

Scalar to Scalar

Regular derivative: for $x \in \mathbb{R}$ and $y \in \mathbb{R}$, the derivative $\frac{dy}{dx} \in \mathbb{R}$ gives how much $y$ changes per unit change in $x$.

Vector to Scalar

The gradient: for $x \in \mathbb{R}^N$ and $y \in \mathbb{R}$, $\frac{\partial y}{\partial x} \in \mathbb{R}^N$, with $\left(\frac{\partial y}{\partial x}\right)_i = \frac{\partial y}{\partial x_i}$.

Vector to Vector

The Jacobian: for $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$, $\frac{\partial y}{\partial x} \in \mathbb{R}^{M \times N}$.

i-th row, j-th col: $\left(\frac{\partial y}{\partial x}\right)_{i,j} = \frac{\partial y_i}{\partial x_j}$.
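
A small numpy sketch of these shapes, assuming an elementwise ReLU as the vector-to-vector function (so its Jacobian is diagonal; all names are illustrative):

```python
import numpy as np

# Vector-to-vector function: y = relu(x), with x, y in R^N (here N = 3).
x = np.array([1.0, -2.0, 3.0])
y = np.maximum(x, 0.0)

# The Jacobian dy/dx is N x N; for an elementwise function it is diagonal,
# with (i, i) entry equal to the local derivative dy_i/dx_i.
jacobian = np.diag((x > 0).astype(float))
print(jacobian.shape)  # (3, 3)

# Backprop rarely forms the full Jacobian explicitly: the downstream
# gradient is the Jacobian-transpose product with the upstream gradient.
upstream = np.array([0.1, 0.2, 0.3])  # dL/dy, assumed given
downstream = jacobian.T @ upstream    # dL/dx
print(downstream)                     # [0.1, 0.0, 0.3]
```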