Verified Deep Learning with Lean 4

9 LayerNorm and GELU

Axiom 55 GELU scalar function
#
Axiom 56 GELU scalar derivative
#
Axiom 57 GELU Jacobian
#

Diagonal activation Jacobian.

Theorem 58 LayerNorm VJP
#

LayerNorm = BatchNorm on a different axis; same primitive.

Theorem 59 GELU VJP
#