Backpropagation
- Worked through a computational graph example (the sigmoid function).
- Key Idea: to find the downstream gradient, multiply the upstream gradient by the local gradient (this is just the chain rule applied one gate at a time); see the sketch below.
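A minimal sketch of this idea in plain Python (the decomposition into gates and the variable names are illustrative choices, not from the original notes). Each backward step multiplies the upstream gradient by that gate's local gradient:

```python
import math

# Forward pass through the sigmoid graph, one primitive gate at a time:
# x -> (negate) -> (exp) -> (+1) -> (reciprocal) -> sigmoid(x)
x = 2.0
a = -x               # negate gate
b = math.exp(a)      # exp gate
c = 1.0 + b          # add gate
s = 1.0 / c          # reciprocal gate: s = sigmoid(x)

# Backward pass: downstream gradient = upstream gradient * local gradient.
ds = 1.0                     # upstream gradient at the output
dc = ds * (-1.0 / c**2)      # local gradient of 1/c is -1/c^2
db = dc * 1.0                # local gradient of (1 + b) w.r.t. b is 1
da = db * math.exp(a)        # local gradient of exp(a) is exp(a)
dx = da * -1.0               # local gradient of -x is -1

# Check against the closed form: d(sigmoid)/dx = s * (1 - s)
assert abs(dx - s * (1.0 - s)) < 1e-12
```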
Gates
Add Gate: also known as the gradient distributor. Each downstream gradient equals the upstream gradient unchanged, because the local gradient of addition with respect to each input is 1.
Mul Gate: also known as the swap multiplier. Each input's downstream gradient equals the upstream gradient times the value of the other input, so the inputs swap roles in the backward pass.
Max Gate: also known as the gradient router. The input with the greater value receives the full upstream gradient, while the other input receives a gradient of zero. (All three rules are sketched below.)
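A short sketch of the three backward rules (scalar inputs and the variable names here are assumptions for illustration; `dout` is the upstream gradient):

```python
x, y, dout = 3.0, -4.0, 2.0

# Add gate (gradient distributor): z = x + y
dx_add, dy_add = dout * 1.0, dout * 1.0   # both inputs receive dout

# Mul gate (swap multiplier): z = x * y
dx_mul, dy_mul = dout * y, dout * x       # each gets dout times the OTHER input

# Max gate (gradient router): z = max(x, y)
dx_max = dout if x >= y else 0.0          # the larger input gets dout
dy_max = dout if y > x else 0.0           # the smaller input gets zero

print(dx_add, dy_add)   # 2.0 2.0
print(dx_mul, dy_mul)   # -8.0 6.0
print(dx_max, dy_max)   # 2.0 0.0
```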
Recap: Vector Derivatives
Scalar to Scalar
Regular Derivative: for $x \in \mathbb{R}$, $y \in \mathbb{R}$, the derivative $\frac{\partial y}{\partial x} \in \mathbb{R}$ gives the rate of change of $y$ per unit change in $x$.
Vector to Scalar
The gradient: for $x \in \mathbb{R}^N$, $y \in \mathbb{R}$, the gradient is a vector $\frac{\partial y}{\partial x} \in \mathbb{R}^N$, with $\left(\frac{\partial y}{\partial x}\right)_i = \frac{\partial y}{\partial x_i}$.
Vector to Vector
The Jacobian: for $x \in \mathbb{R}^N$, $y \in \mathbb{R}^M$, the Jacobian is a matrix $\frac{\partial y}{\partial x} \in \mathbb{R}^{M \times N}$.
i-th row, j-th col: $\left(\frac{\partial y}{\partial x}\right)_{i,j} = \frac{\partial y_i}{\partial x_j}$.
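A small numerical sketch of these three shapes (assuming numpy; the example functions $y = \sum_i x_i^2$ and $y = Wx$ are my own choices for illustration, not from the original notes):

```python
import numpy as np

N, M = 3, 2
x = np.array([1.0, 2.0, 3.0])

# Vector -> scalar: y = sum(x**2); the gradient has the same shape as x.
grad = 2 * x                      # (dy/dx)_i = dy/dx_i = 2 * x_i
print(grad.shape)                 # (3,) -- a vector in R^N

# Vector -> vector: y = W @ x; the Jacobian is M x N.
W = np.arange(6, dtype=float).reshape(M, N)
jac = W                           # (dy/dx)_{i,j} = dy_i/dx_j = W_{i,j}
print(jac.shape)                  # (2, 3) -- a matrix in R^{M x N}

# Numerical check of one Jacobian column via finite differences.
eps = 1e-6
j = 1
x_plus = x.copy()
x_plus[j] += eps
approx_col = ((W @ x_plus) - (W @ x)) / eps   # approximates dy/dx_j
assert np.allclose(approx_col, jac[:, j], atol=1e-4)
```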