2 Tensor: Calculus Foundations
This chapter sets up the algebraic framework: the 1D \(\mathsf{Vec}\), 2D \(\mathsf{Mat}\), and 3D \(\mathsf{Tensor3}\) types; the partial-derivative function \(\operatorname {pdiv}\) and its structural rules; and three VJP record types (\(\mathsf{HasVJP}\), \(\mathsf{HasVJPMat}\), \(\mathsf{HasVJP3}\)), each of which packages a backward function together with its correctness claim.
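The partial derivative function. For \(f : \mathbb {R}^{m} \to \mathbb {R}^{n}\), \(\operatorname {pdiv}\, f\, x\, i\, j\) is \(\partial f_j / \partial x_i\) evaluated at \(x\). The first index \(i\) ranges over the \(m\) inputs and the second index \(j\) over the \(n\) outputs, so this is the \((i, j)\)-entry of the Jacobian in the input-first layout (the transpose of the usual outputs-by-inputs matrix).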
Chain rule. \(\operatorname {pdiv}(g \circ f)\, x\, i\, k = \sum _j \operatorname {pdiv}f\, x\, i\, j \cdot \operatorname {pdiv}g\, (f\, x)\, j\, k\).
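The index pattern of this composition rule can be spot-checked numerically. The sketch below is only an illustration, not the formal development: `pdiv` here is a central-difference estimate, and the maps `f` and `g` are hypothetical examples chosen for the check.

```python
import math

def pdiv(f, x, i, j, h=1e-6):
    """Central-difference estimate of d f(x)[j] / d x[i] (input index first)."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp)[j] - f(xm)[j]) / (2 * h)

def f(x):  # hypothetical f : R^2 -> R^3
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def g(y):  # hypothetical g : R^3 -> R^2
    return [y[0] + y[1] * y[2], math.exp(y[0])]

def g_after_f(x):
    return g(f(x))

x = [0.3, -0.7]

# Compare pdiv (g . f) x i k with sum_j pdiv f x i j * pdiv g (f x) j k.
chain_rule_error = max(
    abs(pdiv(g_after_f, x, i, k)
        - sum(pdiv(f, x, i, j) * pdiv(g, f(x), j, k) for j in range(3)))
    for i in range(2)
    for k in range(2)
)
print(chain_rule_error)  # small, limited by finite-difference accuracy
```

The summation index `j` runs over the outputs of `f`, which are the inputs of `g`; that is what forces the input-first index order of `pdiv`.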
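Linearity of the derivative: \(\operatorname {pdiv}(f + g)\, x\, i\, j = \operatorname {pdiv}f\, x\, i\, j + \operatorname {pdiv}g\, x\, i\, j\).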
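Pointwise Leibniz rule for elementwise products: \(\operatorname {pdiv}(f \odot g)\, x\, i\, j = \operatorname {pdiv}f\, x\, i\, j \cdot g(x)_j + f(x)_j \cdot \operatorname {pdiv}g\, x\, i\, j\).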
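A quick numerical check of the sum and product rules, under two assumptions: that linearity is stated for a pointwise sum \(f + g\), and that the Leibniz rule differentiates each output coordinate \(f(x)_j \cdot g(x)_j\) separately. The functions `f` and `g` and the finite-difference `pdiv` are hypothetical stand-ins for illustration.

```python
import math

def pdiv(f, x, i, j, h=1e-6):
    """Central-difference estimate of d f(x)[j] / d x[i] (input index first)."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp)[j] - f(xm)[j]) / (2 * h)

def f(x):  # hypothetical f : R^2 -> R^2
    return [x[0] * x[1], math.sin(x[1])]

def g(x):  # hypothetical g : R^2 -> R^2
    return [x[0] + 2.0 * x[1], math.cos(x[0])]

def f_plus_g(x):
    return [a + b for a, b in zip(f(x), g(x))]

def f_times_g(x):  # elementwise product
    return [a * b for a, b in zip(f(x), g(x))]

x = [0.4, 1.1]
indices = [(i, j) for i in range(2) for j in range(2)]

# Sum rule: pdiv (f + g) = pdiv f + pdiv g.
add_error = max(
    abs(pdiv(f_plus_g, x, i, j) - (pdiv(f, x, i, j) + pdiv(g, x, i, j)))
    for i, j in indices
)

# Leibniz rule: pdiv (f * g) x i j = pdiv f x i j * g(x)[j] + f(x)[j] * pdiv g x i j.
mul_error = max(
    abs(pdiv(f_times_g, x, i, j)
        - (pdiv(f, x, i, j) * g(x)[j] + f(x)[j] * pdiv(g, x, i, j)))
    for i, j in indices
)
print(add_error, mul_error)
```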
Identity (pdiv_id). \(\operatorname {pdiv}(\mathrm{id})\, x\, i\, j = \delta _{ij}\), the Kronecker delta.
Derivative of a pure index-rearranging map. Covers permutations, reshapes, and slicing, and generalizes pdiv_id.
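One way to read this rule, offered as an assumption about its shape rather than the exact formal statement: if \(f(x)_j = x_{\sigma (j)}\) for some index map \(\sigma\), then \(\operatorname {pdiv}\, f\, x\, i\, j\) is 1 when \(i = \sigma (j)\) and 0 otherwise. The sketch below checks that reading for a permutation, a slice, and the identity; the helper `reindex` and the index lists are hypothetical.

```python
def pdiv(f, x, i, j, h=1e-6):
    """Central-difference estimate of d f(x)[j] / d x[i] (input index first)."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp)[j] - f(xm)[j]) / (2 * h)

def reindex(sigma):
    """Gather-style map: output j copies input sigma[j]."""
    return lambda x: [x[s] for s in sigma]

x = [1.5, -2.0, 0.25, 4.0]
index_maps = {
    "permutation": [2, 0, 3, 1],
    "slice": [1, 2],
    "identity": [0, 1, 2, 3],  # recovers pdiv id = delta_ij
}

# Expected entry: 1 if input i is the source of output j, else 0.
reindex_error = max(
    abs(pdiv(reindex(sigma), x, i, j) - (1.0 if i == sigma[j] else 0.0))
    for sigma in index_maps.values()
    for i in range(len(x))
    for j in range(len(sigma))
)
print(reindex_error)
```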
Applying a vector function to each row of a matrix yields a block-diagonal Jacobian. This is the one genuinely new primitive at the \(\mathsf{Mat}\) level.
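The block-diagonal structure can be illustrated on a row-major flattened matrix. This is a sketch under assumptions: the shapes `M`, `N`, `P`, the row function `g`, and the finite-difference `pdiv` are all hypothetical choices made for the check, not names from the formal development.

```python
import math

def pdiv(f, x, i, j, h=1e-6):
    """Central-difference estimate of d f(x)[j] / d x[i] (input index first)."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp)[j] - f(xm)[j]) / (2 * h)

M, N, P = 2, 3, 2  # hypothetical shapes: M rows, g : R^N -> R^P

def g(v):  # hypothetical per-row function
    return [v[0] * v[1] + v[2], math.sin(v[1]) * v[2]]

def rowwise_flat(a):
    """Apply g to each length-N row of a row-major flattened M x N matrix."""
    out = []
    for r in range(M):
        out.extend(g(a[r * N:(r + 1) * N]))
    return out

a = [0.2, -0.5, 1.0, 0.7, 0.3, -1.2]

# Entry for input (r_in, c_in) and output (r_out, c_out):
# pdiv g (row r_out) c_in c_out on the diagonal blocks, 0 off the diagonal.
block_error = 0.0
for r_in in range(M):
    for c_in in range(N):
        for r_out in range(M):
            for c_out in range(P):
                full = pdiv(rowwise_flat, a, r_in * N + c_in, r_out * P + c_out)
                row = a[r_out * N:(r_out + 1) * N]
                expected = pdiv(g, row, c_in, c_out) if r_in == r_out else 0.0
                block_error = max(block_error, abs(full - expected))
print(block_error)
```

Off-diagonal blocks vanish because perturbing an entry of one row leaves the output of every other row unchanged.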
Linearity extended to arbitrary finite sums by induction.
Given \(\mathsf{HasVJP}\, f\) and \(\mathsf{HasVJP}\, g\), get \(\mathsf{HasVJP}\, (g \circ f)\).
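A minimal sketch of how the composed backward function could look, assuming the correctness claim of \(\mathsf{HasVJP}\) has the form "backward \(x\, dy\) at index \(i\) equals \(\sum _j \operatorname {pdiv}f\, x\, i\, j \cdot dy_j\)". Under that assumption the backward pass of \(g \circ f\) runs the backward of \(g\) at \(f\, x\) first and feeds the result to the backward of \(f\). All function names below are hypothetical, and the hand-written backward functions are checked against finite differences.

```python
import math

def pdiv(f, x, i, j, h=1e-6):
    """Central-difference estimate of d f(x)[j] / d x[i] (input index first)."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp)[j] - f(xm)[j]) / (2 * h)

def f(x):  # hypothetical f : R^2 -> R^3
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def f_backward(x, dy):  # entry i is sum_j (d f_j / d x_i) * dy[j]
    return [x[1] * dy[0] + math.cos(x[0]) * dy[1],
            x[0] * dy[0] + 2.0 * x[1] * dy[2]]

def g(y):  # hypothetical g : R^3 -> R^2
    return [y[0] + y[1] * y[2], math.exp(y[0])]

def g_backward(y, dz):  # entry j is sum_k (d g_k / d y_j) * dz[k]
    return [dz[0] + math.exp(y[0]) * dz[1],
            y[2] * dz[0],
            y[1] * dz[0]]

def comp_backward(x, dz):
    """Backward of g . f: pull dz back through g at f(x), then through f at x."""
    return f_backward(x, g_backward(f(x), dz))

x = [0.3, -0.7]
dz = [1.5, -0.4]

# Reference: contract the finite-difference Jacobian of g . f with dz.
reference = [
    sum(pdiv(lambda t: g(f(t)), x, i, k) * dz[k] for k in range(2))
    for i in range(2)
]
vjp_comp_error = max(abs(a - b) for a, b in zip(comp_backward(x, dz), reference))
print(vjp_comp_error)
```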
VJP of \(f + g\). Used for residual connections.
VJP of elementwise product. Used for Squeeze-and-Excitation.
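The sum and product cases can be sketched the same way, again assuming the backward function computes \(\sum _j \operatorname {pdiv}f\, x\, i\, j \cdot dy_j\). For a sum, the two backward results are added. For an elementwise product, each branch receives the incoming cotangent scaled by the other branch's forward value. The names below are hypothetical and the result is compared against a finite-difference reference.

```python
import math

def pdiv(f, x, i, j, h=1e-6):
    """Central-difference estimate of d f(x)[j] / d x[i] (input index first)."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp)[j] - f(xm)[j]) / (2 * h)

def vjp_reference(fn, x, dy):
    """Finite-difference VJP: entry i is sum_j pdiv fn x i j * dy[j]."""
    return [sum(pdiv(fn, x, i, j) * dy[j] for j in range(len(dy)))
            for i in range(len(x))]

def f(x):  # hypothetical branch, R^2 -> R^2
    return [x[0] * x[1], math.sin(x[1])]

def f_backward(x, dy):
    return [x[1] * dy[0], x[0] * dy[0] + math.cos(x[1]) * dy[1]]

def g(x):  # hypothetical branch, R^2 -> R^2
    return [x[0] + 2.0 * x[1], math.cos(x[0])]

def g_backward(x, dy):
    return [dy[0] - math.sin(x[0]) * dy[1], 2.0 * dy[0]]

def add_backward(x, dy):
    """Backward of f + g: both branches see the same cotangent."""
    return [a + b for a, b in zip(f_backward(x, dy), g_backward(x, dy))]

def mul_backward(x, dy):
    """Backward of the elementwise product f * g."""
    dy_for_f = [gv * d for gv, d in zip(g(x), dy)]  # scaled by g(x)
    dy_for_g = [fv * d for fv, d in zip(f(x), dy)]  # scaled by f(x)
    return [a + b for a, b in zip(f_backward(x, dy_for_f), g_backward(x, dy_for_g))]

x = [0.4, 1.1]
dy = [0.8, -1.3]

add_ref = vjp_reference(lambda t: [a + b for a, b in zip(f(t), g(t))], x, dy)
mul_ref = vjp_reference(lambda t: [a * b for a, b in zip(f(t), g(t))], x, dy)
vjp_add_error = max(abs(a - b) for a, b in zip(add_backward(x, dy), add_ref))
vjp_mul_error = max(abs(a - b) for a, b in zip(mul_backward(x, dy), mul_ref))
print(vjp_add_error, vjp_mul_error)
```

In a residual connection one branch is the identity, so its backward simply passes `dy` through and the sum rule adds it to the other branch's result.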
Lifts vjp_comp via the \(\mathsf{Mat.flatten}\) bijection.
Phase 6 derivation.
Phase 6: derived, not axiomatized.
Phase 8 generic helper: lifts any \(\mathsf{HasVJP}\, (g : \mathbb {R}^{n} \to \mathbb {R}^{p})\) to a \(\mathsf{HasVJPMat}\) on \(\mathbb {R}^{m \times n} \to \mathbb {R}^{m \times p}\).
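A hedged sketch of what such a lift could look like, with matrices as lists of rows: the forward pass applies \(g\) to each row, and the backward pass applies the backward of \(g\) to each (row, cotangent-row) pair. The helper names `rowwise` and `rowwise_backward` and the example `g` are hypothetical. The check uses the fact that the VJP with cotangent `dY` is the gradient of the scalar \(\sum _{r,c} Y_{rc} \cdot dY_{rc}\).

```python
import math

def g(v):  # hypothetical g : R^3 -> R^2
    return [v[0] * v[1] + v[2], math.sin(v[1]) * v[2]]

def g_backward(v, dy):  # entry i is sum_j (d g_j / d v_i) * dy[j]
    return [v[1] * dy[0],
            v[0] * dy[0] + math.cos(v[1]) * v[2] * dy[1],
            dy[0] + math.sin(v[1]) * dy[1]]

def rowwise(fn):
    """Lift fn : R^n -> R^p to matrices by applying it to each row."""
    return lambda A: [fn(row) for row in A]

def rowwise_backward(backward):
    """Lift a vector backward function to matrices, one row at a time."""
    return lambda A, dY: [backward(row, drow) for row, drow in zip(A, dY)]

A = [[0.2, -0.5, 1.0], [0.7, 0.3, -1.2]]   # m x n with m = 2, n = 3
dY = [[1.0, -0.6], [0.4, 2.0]]             # m x p with p = 2

def pairing(B):
    """Scalar sum_{r,c} rowwise(g)(B)[r][c] * dY[r][c]."""
    Y = rowwise(g)(B)
    return sum(y * d for yrow, drow in zip(Y, dY) for y, d in zip(yrow, drow))

def numeric_grad(B, h=1e-6):
    """Central-difference gradient of pairing with respect to each entry of B."""
    grad = [[0.0] * len(row) for row in B]
    for r in range(len(B)):
        for c in range(len(B[r])):
            Bp = [list(row) for row in B]
            Bm = [list(row) for row in B]
            Bp[r][c] += h
            Bm[r][c] -= h
            grad[r][c] = (pairing(Bp) - pairing(Bm)) / (2 * h)
    return grad

lifted = rowwise_backward(g_backward)(A, dY)
reference = numeric_grad(A)
lift_error = max(
    abs(a - b)
    for lrow, rrow in zip(lifted, reference)
    for a, b in zip(lrow, rrow)
)
print(lift_error)
```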