Introduction to Linear Algebra & Multivariate Calculus
1.1 Why Multivariate Calculus in Machine Learning?
Multivariate calculus is fundamental to machine learning because:
- Most ML models have multiple parameters (high-dimensional spaces)
- Optimization requires understanding how functions change in multiple directions
- Neural networks rely heavily on gradient-based learning
- Concepts like gradients, Jacobians, and Hessians appear everywhere in ML
1.2 Scalars, Vectors, Matrices, and Tensors
- Scalar: Single number (0-dimensional)
- Vector: 1D array of numbers (magnitude and direction)
- Matrix: 2D array of numbers (linear transformations)
- Tensor: Generalized n-dimensional array
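As a concrete illustration, the short NumPy sketch below builds one object of each kind (the particular shapes and values are arbitrary choices for the example):

```python
import numpy as np

scalar = np.float64(3.5)            # 0-dimensional: a single number
vector = np.array([1.0, 2.0, 3.0])  # 1D array, shape (3,)
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])     # 2D array, shape (2, 2)
tensor = np.zeros((2, 3, 4))        # 3D array, shape (2, 3, 4)

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, "ndim =", np.ndim(obj), "shape =", np.shape(obj))
```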
Matrix Operations in Linear Algebra
Matrix Multiplication
For matrices A (m×n) and B (n×p), their product C = AB is an m×p matrix where:
$$ C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} $$
The number of columns in A must equal the number of rows in B.
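The sketch below spells out this definition for two small example matrices (the entries are arbitrary): the triple loop implements \( C_{ij} = \sum_k A_{ik} B_{kj} \) directly, and NumPy's `@` operator confirms the result.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # shape (2, 3): m=2, n=3
B = np.array([[7.0, 8.0],
              [9.0, 10.0],
              [11.0, 12.0]])           # shape (3, 2): n=3, p=2

m, n = A.shape
n2, p = B.shape
assert n == n2, "columns of A must equal rows of B"

# Element-wise definition: C_ij = sum_k A_ik * B_kj
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        for k in range(n):
            C[i, j] += A[i, k] * B[k, j]

print(np.allclose(C, A @ B))  # True: matches NumPy's built-in product
```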
Other Matrix Operations
- Transpose: Flip rows and columns
- Determinant: Scalar defined for square matrices; it measures how the matrix scales volume and is nonzero exactly when the matrix is invertible
- Inverse: Matrix that, when multiplied with the original, gives the identity matrix (it exists only when the determinant is nonzero)
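A minimal NumPy sketch of these operations, using an arbitrary invertible 2×2 matrix chosen for the example:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

print(A.T)                # transpose: rows and columns flipped
print(np.linalg.det(A))   # determinant: 4*6 - 7*2 = 10
A_inv = np.linalg.inv(A)  # inverse: only defined when det(A) != 0
print(np.allclose(A @ A_inv, np.eye(2)))  # A @ A^{-1} equals the identity
```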
Partial Derivatives in Multivariate Calculus
The partial derivative of a function \( f(x_1, x_2, \dots, x_n) \) with respect to \( x_i \) is:
$$ \frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{h} $$
It measures how the function changes as we vary one variable while holding others constant.
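As an illustration, the sketch below approximates a partial derivative with the difference quotient from the definition, using the assumed example \( f(x, y) = x^2 y + y^3 \); analytically, \( \partial f / \partial x = 2xy \), which equals 12 at the point (2, 3).

```python
def f(x, y):
    return x**2 * y + y**3   # example function: f(x, y) = x^2 y + y^3

def partial_x(f, x, y, h=1e-6):
    # Forward-difference approximation of df/dx, holding y constant
    return (f(x + h, y) - f(x, y)) / h

print(partial_x(f, 2.0, 3.0))  # approx 12, since df/dx = 2xy
```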
The Gradient Vector
Definition of Gradient
The gradient of a scalar-valued function \( f(x_1, x_2, \dots, x_n) \) is a vector containing all its partial derivatives:
$$ \nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right) $$
The gradient points in the direction of steepest ascent of the function; its negative points in the direction of steepest descent, which is the direction gradient descent follows.
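Continuing with the same assumed example \( f(x, y) = x^2 y + y^3 \), the sketch below approximates the gradient with central differences; the analytic gradient is \( (2xy,\; x^2 + 3y^2) \).

```python
import numpy as np

def f(v):
    x, y = v
    return x**2 * y + y**3          # example scalar-valued function

def numerical_gradient(f, v, h=1e-6):
    # Central differences in each coordinate direction
    grad = np.zeros_like(v)
    for i in range(len(v)):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (f(v + step) - f(v - step)) / (2 * h)
    return grad

v = np.array([2.0, 3.0])
print(numerical_gradient(f, v))     # approx [12., 31.] = (2xy, x^2 + 3y^2)
```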
The Jacobian Matrix
Functions from ℝⁿ to ℝᵐ
For a vector-valued function 𝐟: ℝⁿ → ℝᵐ with m component functions:
$$ \mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_m(x_1, \dots, x_n) \end{bmatrix} $$
The Jacobian matrix J is an m×n matrix of all first-order partial derivatives:
$$ J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} $$
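A sketch of the Jacobian for an assumed example map \( \mathbf{f}: \mathbb{R}^2 \to \mathbb{R}^3 \), \( \mathbf{f}(x, y) = (x^2 y,\; 5x + \sin y,\; xy) \), built one column at a time with central differences; the analytic Jacobian has rows \( (2xy, x^2) \), \( (5, \cos y) \), and \( (y, x) \).

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x**2 * y, 5*x + np.sin(y), x * y])  # f: R^2 -> R^3

def numerical_jacobian(f, v, h=1e-6):
    # One column per input variable: J[:, i] = df/dx_i
    m = len(f(v))
    n = len(v)
    J = np.zeros((m, n))
    for i in range(n):
        step = np.zeros_like(v)
        step[i] = h
        J[:, i] = (f(v + step) - f(v - step)) / (2 * h)
    return J

v = np.array([1.0, 2.0])
print(numerical_jacobian(f, v))
# analytic at (1, 2): [[4, 1], [5, cos 2], [2, 1]]
```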
The Hessian Matrix
Second-Order Derivatives
The Hessian matrix of a function \( f(x_1, x_2, \dots, x_n) \) is a square matrix of second-order partial derivatives:
$$ H(f) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} $$
The Hessian provides information about the local curvature of the function; when the second partial derivatives are continuous, the order of differentiation does not matter and the Hessian is symmetric.
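A short sketch of the Hessian for the assumed example \( f(x, y) = x^3 + 2xy + y^2 \), computed symbolically with SymPy (assumed available); the exact Hessian is \( \begin{bmatrix} 6x & 2 \\ 2 & 2 \end{bmatrix} \).

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 + 2*x*y + y**2       # example scalar function

H = sp.hessian(f, (x, y))     # matrix of all second-order partial derivatives
print(H)                      # Matrix([[6*x, 2], [2, 2]])
print(H.subs({x: 1, y: 0}))   # local curvature at the point (1, 0)
```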
Optimization in Multivariate Settings
Gradient Descent
Gradient descent is an iterative optimization algorithm for finding local minima:
- Start at initial point \( \mathbf{x}_0 \)
- Update rule: \( \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha \nabla f(\mathbf{x}_k) \)
- Repeat until convergence
where \( \alpha \) is the learning rate.
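A minimal sketch of the update rule on the assumed quadratic \( f(x, y) = x^2 + 10y^2 \), whose minimum is at the origin; the learning rate and iteration count are arbitrary choices for the example.

```python
import numpy as np

def grad_f(v):
    x, y = v
    return np.array([2*x, 20*y])   # gradient of f(x, y) = x^2 + 10 y^2

x = np.array([5.0, 2.0])           # initial point x_0
alpha = 0.05                       # learning rate (assumed value)

for k in range(200):
    x = x - alpha * grad_f(x)      # x_{k+1} = x_k - alpha * grad f(x_k)

print(x)                           # close to the minimizer [0, 0]
```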
Applications in Machine Learning
Linear Regression: Gradient-Based Cost Minimization
For linear regression with hypothesis \( h_\theta(x) = \theta^T x \), the cost function is:
$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$
The gradient of the cost function with respect to \( \theta \) is:
$$ \nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y) $$
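The sketch below implements this gradient on a small synthetic dataset (the data-generating parameters, noise level, and learning rate are all assumptions made for the example) and minimizes \( J(\theta) \) with gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = np.column_stack([np.ones(m), rng.uniform(-1, 1, m)])  # bias column + one feature
true_theta = np.array([2.0, -3.0])                        # assumed ground-truth parameters
y = X @ true_theta + 0.1 * rng.standard_normal(m)         # noisy targets

theta = np.zeros(2)
alpha = 0.5                                               # learning rate (assumed value)

for _ in range(500):
    grad = (X.T @ (X @ theta - y)) / m   # gradient of J(theta)
    theta = theta - alpha * grad         # gradient descent update

print(theta)  # close to [2.0, -3.0]
```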