Verified Deep Learning with Lean 4

5 BatchNorm: the hard one

Axiom 43 BN affine step Jacobian
#

\(\partial (\gamma v + \beta ) / \partial v_i = \gamma \delta _{ij}\).

Axiom 44 BN centering Jacobian
#

\(\partial (x_j - \mu ) / \partial x_i = \delta _{ij} - 1/n\).

Axiom 45 BN inverse-stddev broadcast Jacobian
#
Theorem 46 BN normalize 3-term VJP
#

The consolidated 3-term backward: factor \(\mathrm{bnXhat}\) as \((x - \mu ) \cdot \mathrm{istd}\), apply product rule, collapse.

Theorem 47 BN affine VJP
#
Theorem 48 Full BN VJP
#

\(\mathrm{bn} = \mathrm{affine} \circ \mathrm{normalize}\).