3 MLP: Dense Layer
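The workhorse layer: the affine map \(y = Wx + b\), together with ReLU and the softmax cross-entropy loss. Indices follow the convention \((Wx)_j = \sum_i W_{ij} x_i\) (input index first), which the rules below assume.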
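To make the forward pass concrete, here is a minimal stdlib-only sketch. It assumes the input-first indexing that the derivative rules in this section imply, with `W[i][j]` linking input `i` to output `j`. The function names (`dense`, `relu`, `softmax`, `cross_entropy`) are hypothetical illustrations and are not taken from the project's own definitions.

```python
import math

# Assumed convention, inferred from this section's derivative rules:
# W[i][j] links input i to output j, so y_j = sum_i W[i][j] * x[i] + b[j].
def dense(W, x, b):
    """Affine map y = Wx + b under the W[input][output] convention."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j] for j in range(len(b))]

def relu(v):
    """Elementwise max(0, v_k)."""
    return [max(0.0, t) for t in v]

def softmax(z):
    """Softmax with the max subtracted first for numerical stability."""
    m = max(z)
    exps = [math.exp(t - m) for t in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, label):
    """-log softmax(z)[label], computed via log-sum-exp."""
    m = max(z)
    return m + math.log(sum(math.exp(t - m) for t in z)) - z[label]

# Usage: one dense layer, ReLU, then the loss on the resulting scores.
W = [[1.0, -1.0], [2.0, 0.5]]
x = [1.0, 2.0]
b = [0.0, 1.0]
h = relu(dense(W, x, b))
loss = cross_entropy(h, 0)
```

Computing the loss through log-sum-exp, rather than taking the log of a softmax output, avoids overflow for large logits.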
\(\partial (Wx + b)_j / \partial x_i = W_{ij}\).
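The input-derivative rule can be checked numerically. The sketch below assumes the same input-first layout `W[i][j]`. It estimates each \(\partial y_j / \partial x_i\) by central differences and compares the estimate with `W[i][j]`. The helper names and the test values are hypothetical.

```python
# Hypothetical helper; assumes W[i][j] links input i to output j.
def dense(W, x, b):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j] for j in range(len(b))]

def input_jacobian_fd(W, x, b, eps=1e-6):
    """Central-difference estimate of J[i][j] = d y_j / d x_i."""
    J = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        xm = list(x)
        xm[i] -= eps
        yp, ym = dense(W, xp, b), dense(W, xm, b)
        J.append([(yp[j] - ym[j]) / (2 * eps) for j in range(len(b))])
    return J

W = [[1.0, -1.0, 0.5], [2.0, 0.5, -3.0]]  # 2 inputs, 3 outputs
x = [0.3, -0.7]
b = [0.1, 0.2, 0.3]
J = input_jacobian_fd(W, x, b)
# Each J[i][j] should equal W[i][j] up to floating-point rounding.
max_err = max(abs(J[i][j] - W[i][j]) for i in range(2) for j in range(3))
```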
\(\partial (Wx + b)_j / \partial W_{i'j'} = x_{i'} \delta _{jj'}\). Phase 7.
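The weight-derivative rule says that perturbing a single weight \(W_{i'j'}\) moves only output \(j'\), and it does so at rate \(x_{i'}\). The following sketch checks this with finite differences. It again assumes the input-first layout, and the helper names are hypothetical.

```python
# Hypothetical helper; assumes W[i][j] links input i to output j.
def dense(W, x, b):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j] for j in range(len(b))]

def weight_derivative_fd(W, x, b, ip, jp, eps=1e-6):
    """Central-difference estimate of d y / d W[ip][jp], one entry per output j."""
    Wp = [row[:] for row in W]
    Wp[ip][jp] += eps
    Wm = [row[:] for row in W]
    Wm[ip][jp] -= eps
    yp, ym = dense(Wp, x, b), dense(Wm, x, b)
    return [(yp[j] - ym[j]) / (2 * eps) for j in range(len(b))]

W = [[1.0, -1.0, 0.5], [2.0, 0.5, -3.0]]
x = [0.3, -0.7]
b = [0.1, 0.2, 0.3]

max_err = 0.0
for ip in range(len(x)):
    for jp in range(len(b)):
        d = weight_derivative_fd(W, x, b, ip, jp)
        for j in range(len(b)):
            expected = x[ip] if j == jp else 0.0  # x_{i'} * delta_{j j'}
            max_err = max(max_err, abs(d[j] - expected))
```

For outputs \(j \ne j'\), both perturbed forward passes read identical weights, so the estimate is exactly zero. This is the \(\delta_{jj'}\) factor.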
\(\partial L / \partial z = \mathrm{softmax}(z) - \mathrm{onehot}(y)\), where \(z\) are the logits and \(y\) here denotes the target class label.
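A quick numerical check of the softmax cross-entropy gradient follows. It is a stdlib-only sketch that compares the closed form \(\mathrm{softmax}(z) - \mathrm{onehot}(y)\) against central differences of the loss. The function names and the sample logits are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(t - m) for t in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, label):
    """-log softmax(z)[label] via log-sum-exp."""
    m = max(z)
    return m + math.log(sum(math.exp(t - m) for t in z)) - z[label]

def analytic_grad(z, label):
    """Closed form: softmax(z) - onehot(label)."""
    p = softmax(z)
    return [p[k] - (1.0 if k == label else 0.0) for k in range(len(z))]

def fd_grad(z, label, eps=1e-6):
    """Central-difference estimate of dL/dz_k."""
    g = []
    for k in range(len(z)):
        zp = list(z)
        zp[k] += eps
        zm = list(z)
        zm[k] -= eps
        g.append((cross_entropy(zp, label) - cross_entropy(zm, label)) / (2 * eps))
    return g

z = [2.0, -1.0, 0.5, 0.0]
label = 2
max_err = max(abs(a - n) for a, n in zip(analytic_grad(z, label), fd_grad(z, label)))
```

The gradient components sum to zero, because the softmax probabilities sum to one and the one-hot vector contributes exactly one.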
\(dW = x \otimes dy\). Phase 7: promoted from a vacuous rfl to a theorem.
\(db = dy\). Phase 7: derived, no new axiom.
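The two backward rules can be sketched as a single backward function, assuming the input-first layout `W[i][j]`. The input gradient \(dx_i = \sum_j W_{ij}\, dy_j\) follows from the chain rule applied to \(\partial y_j / \partial x_i = W_{ij}\). To check all three gradients, the sketch uses a hypothetical probe loss \(L = \sum_j dy_j\, y_j\), whose gradient with respect to \(y\) is exactly \(dy\), and compares against finite differences. All names here are illustrative.

```python
# Hypothetical helpers; assumes W[i][j] links input i to output j.
def dense(W, x, b):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j] for j in range(len(b))]

def dense_backward(W, x, dy):
    """Given upstream gradient dy = dL/dy, return (dW, db, dx)."""
    dW = [[xi * dyj for dyj in dy] for xi in x]  # dW = x (outer) dy
    db = list(dy)                                # db = dy
    dx = [sum(W[i][j] * dy[j] for j in range(len(dy))) for i in range(len(x))]
    return dW, db, dx

def probe_loss(W, x, b, dy):
    """Scalar loss whose gradient w.r.t. y is exactly dy."""
    return sum(g * y for g, y in zip(dy, dense(W, x, b)))

W = [[1.0, -1.0, 0.5], [2.0, 0.5, -3.0]]
x = [0.3, -0.7]
b = [0.1, 0.2, 0.3]
dy = [0.4, -1.2, 0.9]
dW, db, dx = dense_backward(W, x, dy)

eps = 1e-6
max_err = 0.0
for i in range(len(x)):
    for j in range(len(b)):
        Wp = [row[:] for row in W]
        Wp[i][j] += eps
        Wm = [row[:] for row in W]
        Wm[i][j] -= eps
        fd = (probe_loss(Wp, x, b, dy) - probe_loss(Wm, x, b, dy)) / (2 * eps)
        max_err = max(max_err, abs(fd - dW[i][j]))
for j in range(len(b)):
    bp = list(b)
    bp[j] += eps
    bm = list(b)
    bm[j] -= eps
    fd = (probe_loss(W, x, bp, dy) - probe_loss(W, x, bm, dy)) / (2 * eps)
    max_err = max(max_err, abs(fd - db[j]))
for i in range(len(x)):
    xp = list(x)
    xp[i] += eps
    xm = list(x)
    xm[i] -= eps
    fd = (probe_loss(W, xp, b, dy) - probe_loss(W, xm, b, dy)) / (2 * eps)
    max_err = max(max_err, abs(fd - dx[i]))
```

The probe loss is a design convenience. It isolates the layer's local rules from any particular downstream loss, so the check exercises only \(dW\), \(db\) and \(dx\).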