Verified Deep Learning with Lean 4

3 MLP: Dense Layer

The workhorse layer: \(y = Wx + b\), followed by ReLU and a softmax cross-entropy loss.
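
The axioms and theorems below all use the indexing convention \(y_j = \sum_i W_{ij} x_i + b_j\). A minimal plain-Python sketch of the forward pass under that convention (the helper names are illustrative, not the Lean development's actual definitions):

```python
# Illustrative forward pass; W[i][j] maps input i to output j,
# so y_j = sum_i W[i][j]*x[i] + b[j], matching the formulas in this section.
import math

def dense(W, b, x):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def relu(z):
    return [max(zi, 0.0) for zi in z]

def softmax_xent(z, y):
    # -log softmax(z)[y], computed stably by subtracting the max logit
    m = max(z)
    log_sum = m + math.log(sum(math.exp(zi - m) for zi in z))
    return log_sum - z[y]

W = [[0.5, -1.0], [1.5, 0.25]]   # 2 inputs, 2 outputs
b = [0.1, -0.2]
x = [1.0, -2.0]
loss = softmax_xent(relu(dense(W, b, x)), y=0)
```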

Axiom 29 Dense Jacobian wrt input

\(\partial (Wx + b)_j / \partial x_i = W_{ij}\).
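
A central-difference spot-check of this axiom (an illustrative sketch, not the Lean development's code; `dense` is a hypothetical helper using the section's W[i][j] convention):

```python
# Verify d(Wx+b)_j / d x_i = W[i][j] by finite differences.
def dense(W, b, x):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]

W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]   # 2 inputs, 3 outputs
b = [0.1, -0.2, 0.3]
x = [1.0, -2.0]
eps = 1e-6
for i in range(2):
    xp = x[:]; xp[i] += eps
    xm = x[:]; xm[i] -= eps
    for j in range(3):
        fd = (dense(W, b, xp)[j] - dense(W, b, xm)[j]) / (2 * eps)
        assert abs(fd - W[i][j]) < 1e-6   # the layer is linear in x
```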

Axiom 30 Dense Jacobian wrt weight

\(\partial (Wx + b)_j / \partial W_{i'j'} = x_{i'} \delta_{jj'}\). Phase 7.
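
The same finite-difference check applied to the weight entries (illustrative sketch; `dense` is a hypothetical stand-in for the Lean definition): perturbing \(W_{i'j'}\) moves only output \(j'\), and it moves by \(x_{i'}\).

```python
# Verify d(Wx+b)_j / dW[i'][j'] = x[i'] * (1 if j == j' else 0).
def dense(W, b, x):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]

W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2, 0.3]
x = [1.0, -2.0]
eps = 1e-6
for ip in range(2):
    for jp in range(3):
        Wp = [row[:] for row in W]; Wp[ip][jp] += eps
        Wm = [row[:] for row in W]; Wm[ip][jp] -= eps
        for j in range(3):
            fd = (dense(Wp, b, x)[j] - dense(Wm, b, x)[j]) / (2 * eps)
            expected = x[ip] if j == jp else 0.0
            assert abs(fd - expected) < 1e-6
```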

Axiom 31 ReLU Jacobian

\(\partial\, \mathrm{relu}(z)_j / \partial z_i = \delta_{ij}\, [z_i > 0]\), where \([\cdot]\) is the Iverson bracket.

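
A numeric spot-check of the ReLU Jacobian away from the kink at \(z_i = 0\) (illustrative sketch only):

```python
# Verify d relu(z)_j / d z_i = 1 when i == j and z_i > 0, else 0.
def relu(z):
    return [max(zi, 0.0) for zi in z]

z = [1.5, -0.5, 0.25]   # keep every coordinate away from the kink at 0
eps = 1e-6
for i in range(3):
    zp = z[:]; zp[i] += eps
    zm = z[:]; zm[i] -= eps
    for j in range(3):
        fd = (relu(zp)[j] - relu(zm)[j]) / (2 * eps)
        expected = 1.0 if (i == j and z[i] > 0) else 0.0
        assert abs(fd - expected) < 1e-6
```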
Axiom 32 Softmax cross-entropy gradient

\(\partial L / \partial z = \mathrm{softmax}(z) - \mathrm{onehot}(y)\).
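
A finite-difference check of this gradient formula (illustrative sketch; `softmax` and `loss` are hypothetical helpers, not the Lean development's API):

```python
# Verify dL/dz_i = softmax(z)_i - onehot(y)_i for L(z) = -log softmax(z)[y].
import math

def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]   # stable: shift by the max logit
    s = sum(e)
    return [ei / s for ei in e]

def loss(z, y):
    return -math.log(softmax(z)[y])

z = [0.3, -1.2, 2.0]
y = 1
p = softmax(z)
eps = 1e-6
for i in range(3):
    zp = z[:]; zp[i] += eps
    zm = z[:]; zm[i] -= eps
    fd = (loss(zp, y) - loss(zm, y)) / (2 * eps)
    expected = p[i] - (1.0 if i == y else 0.0)
    assert abs(fd - expected) < 1e-5
```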

Theorem 33 Dense VJP

\(dx_i = \sum_j W_{ij}\, dy_j\): the output cotangent \(dy\) pulled back through the layer.

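
The VJP can be checked against the chain rule: the gradient of the scalar \(s(x) = \sum_j dy_j\, y_j\) with respect to \(x\) should equal the VJP of \(dy\). An illustrative sketch (helper names are hypothetical):

```python
# Verify dx_i = sum_j W[i][j] * dy[j] via finite differences of s(x) = dy . y.
def dense(W, b, x):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def dense_vjp_input(W, dy):
    return [sum(W[i][j] * dy[j] for j in range(len(dy)))
            for i in range(len(W))]

W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2, 0.3]
x = [1.0, -2.0]
dy = [0.3, -0.6, 0.9]
dx = dense_vjp_input(W, dy)
s = lambda v: sum(d * yj for d, yj in zip(dy, dense(W, b, v)))
eps = 1e-6
for i in range(2):
    xp = x[:]; xp[i] += eps
    xm = x[:]; xm[i] -= eps
    assert abs((s(xp) - s(xm)) / (2 * eps) - dx[i]) < 1e-6
```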
Theorem 34 Dense weight gradient is the outer product

\(dW = x \otimes dy\), i.e. \(dW_{ij} = x_i\, dy_j\). Phase 7: promoted from a vacuous rfl proof to a genuine theorem.
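
A numeric check that the weight gradient really is the outer product (illustrative sketch; `dense` is a hypothetical stand-in for the Lean definition):

```python
# Verify d(dy . y)/dW[i][j] = x[i] * dy[j] by finite differences.
def dense(W, b, x):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]

W = [[0.5, -1.0], [1.5, 0.25]]
b = [0.1, -0.2]
x = [1.0, -2.0]
dy = [0.3, -0.6]
s = lambda M: sum(d * yj for d, yj in zip(dy, dense(M, b, x)))
eps = 1e-6
for i in range(2):
    for j in range(2):
        Wp = [r[:] for r in W]; Wp[i][j] += eps
        Wm = [r[:] for r in W]; Wm[i][j] -= eps
        fd = (s(Wp) - s(Wm)) / (2 * eps)
        assert abs(fd - x[i] * dy[j]) < 1e-6
```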

Theorem 35 Dense bias gradient is identity

\(db = dy\). Phase 7: derived, no new axiom.
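
Since \(y_j = (Wx)_j + b_j\) is the identity in \(b\), the bias gradient is the upstream cotangent passed through unchanged. A short numeric confirmation (illustrative sketch):

```python
# Verify d(dy . y)/db[j] = dy[j]: the bias VJP is the identity.
def dense(W, b, x):
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
            for j in range(len(b))]

W = [[0.5, -1.0], [1.5, 0.25]]
b = [0.1, -0.2]
x = [1.0, -2.0]
dy = [0.3, -0.6]
s = lambda bb: sum(d * yj for d, yj in zip(dy, dense(W, bb, x)))
eps = 1e-6
for j in range(2):
    bp = b[:]; bp[j] += eps
    bm = b[:]; bm[j] -= eps
    assert abs((s(bp) - s(bm)) / (2 * eps) - dy[j]) < 1e-6
```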

Theorem 36 ReLU VJP

\(dz_i = dy_i\, [z_i > 0]\): the cotangent is masked by the activation pattern.
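
The masking behavior follows from Axiom 31's diagonal Jacobian: \(dz_i = \sum_j dy_j\, \delta_{ij}\,[z_i > 0] = dy_i\,[z_i > 0]\). An illustrative numeric check (helper names are hypothetical):

```python
# Verify the ReLU VJP dz_i = dy_i when z_i > 0, else 0, away from the kink.
def relu(z):
    return [max(zi, 0.0) for zi in z]

def relu_vjp(z, dy):
    return [d if zi > 0 else 0.0 for zi, d in zip(z, dy)]

z = [1.5, -0.5, 0.25]   # no coordinate at z_i = 0
dy = [0.3, -0.6, 0.9]
dz = relu_vjp(z, dy)
s = lambda v: sum(d * rj for d, rj in zip(dy, relu(v)))
eps = 1e-6
for i in range(3):
    zp = z[:]; zp[i] += eps
    zm = z[:]; zm[i] -= eps
    assert abs((s(zp) - s(zm)) / (2 * eps) - dz[i]) < 1e-6
```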