5 BatchNorm: the hard one
\(\partial (\gamma v_j + \beta ) / \partial v_i = \gamma \delta _{ij}\).
\(\partial (x_j - \mu ) / \partial x_i = \delta _{ij} - 1/n\).
The consolidated 3-term backward: factor \(\mathrm{bnXhat}\) as \((x - \mu ) \cdot \mathrm{istd}\), apply the product rule, and collapse the result using \(\sum _j (x_j - \mu ) = 0\).
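The factor, product-rule, collapse steps can be written out as follows. This is a sketch under assumptions the text does not spell out: \(\mathrm{istd} = (\sigma ^2 + \varepsilon )^{-1/2}\) with the biased variance \(\sigma ^2 = \frac{1}{n}\sum _k (x_k - \mu )^2\), \(\hat{x}_j\) as shorthand for \(\mathrm{bnXhat}_j\), and \(g_j\) as assumed notation for the upstream gradient \(\partial L / \partial \hat{x}_j\).

\[ \frac{\partial \hat{x}_j}{\partial x_i} = \Bigl(\delta _{ij} - \frac{1}{n}\Bigr)\,\mathrm{istd} + (x_j - \mu )\,\frac{\partial\, \mathrm{istd}}{\partial x_i} \]

\[ \frac{\partial \sigma ^2}{\partial x_i} = \frac{2}{n}\sum _k (x_k - \mu )\Bigl(\delta _{ki} - \frac{1}{n}\Bigr) = \frac{2}{n}(x_i - \mu ), \qquad \frac{\partial\, \mathrm{istd}}{\partial x_i} = -\frac{1}{2}\,\mathrm{istd}^3\,\frac{\partial \sigma ^2}{\partial x_i} = -\frac{\mathrm{istd}^3}{n}(x_i - \mu ) \]

\[ \frac{\partial \hat{x}_j}{\partial x_i} = \mathrm{istd}\Bigl(\delta _{ij} - \frac{1}{n} - \frac{\hat{x}_i \hat{x}_j}{n}\Bigr) \]

\[ \frac{\partial L}{\partial x_i} = \sum _j g_j \frac{\partial \hat{x}_j}{\partial x_i} = \frac{\mathrm{istd}}{n}\Bigl(n\,g_i - \sum _j g_j - \hat{x}_i \sum _j g_j \hat{x}_j\Bigr) \]

The three terms in the last line are the per-element gradient, the mean correction, and the variance correction.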
\(\mathrm{bn} = \mathrm{affine} \circ \mathrm{normalize}\).
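To make the composition concrete, here is a minimal NumPy sketch of \(\mathrm{bn}\) as `affine` applied after `normalize`, with a backward pass that chains the \(\gamma \delta _{ij}\) factor into the 3-term formula and a finite-difference check. The function names, the 1-D batch layout, the biased variance, and the value of `EPS` are illustrative assumptions, not taken from the text.

```python
import numpy as np

EPS = 1e-5  # assumed stabilizer inside istd; the text gives no value


def normalize(x):
    """Return (xhat, istd) for a 1-D batch x, with xhat = (x - mu) * istd."""
    mu = x.mean()
    var = ((x - mu) ** 2).mean()  # biased variance (an assumption)
    istd = 1.0 / np.sqrt(var + EPS)
    return (x - mu) * istd, istd


def affine(v, gamma, beta):
    """Elementwise gamma * v + beta."""
    return gamma * v + beta


def bn(x, gamma, beta):
    """bn = affine after normalize."""
    xhat, _ = normalize(x)
    return affine(xhat, gamma, beta)


def bn_backward(x, gamma, dy):
    """Gradient of sum(dy * bn(x)) with respect to x."""
    xhat, istd = normalize(x)
    n = x.size
    g = gamma * dy  # affine backward: d(gamma*v_j + beta)/dv_i = gamma*delta_ij
    # Three terms: n*g, minus sum(g), minus xhat * sum(g * xhat)
    return istd / n * (n * g - g.sum() - xhat * (g * xhat).sum())


def numeric_grad(x, gamma, beta, dy, h=1e-6):
    """Central finite differences of sum(dy * bn(x)), used only as a check."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        up = dy @ bn(x + e, gamma, beta)
        down = dy @ bn(x - e, gamma, beta)
        grad[i] = (up - down) / (2 * h)
    return grad
```

Comparing `bn_backward(x, gamma, dy)` against `numeric_grad(x, gamma, beta, dy)` on random inputs is a quick way to catch a sign or index slip in the collapse step. The analytic gradient should also sum to roughly zero, since shifting every \(x_i\) by the same constant leaves \(\mathrm{bn}\) unchanged.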