MSELoss
The Mean Squared Error (MSE) loss measures the average of the squared differences between predictions and targets. It is the most common loss function for regression tasks.
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where:
- $n$ is the batch size.
- $y_i$ is the target value.
- $\hat{y}_i$ is the predicted value.
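As a sanity check, the formula above can be evaluated with plain NumPy (this is not Sorix code, just the definition applied to the sample values used below):

```python
import numpy as np

y_pred = np.array([2.5, 0.0, 2.1])
y_true = np.array([3.0, 0.0, 2.0])

# MSE = mean of the squared errors: (0.25 + 0.0 + 0.01) / 3
mse = np.mean((y_pred - y_true) ** 2)
print(f"{mse:.4f}")  # 0.0867
```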
In [1]:
# Uncomment the next line and run this cell to install sorix
#!pip install 'sorix @ git+https://github.com/Mitchell-Mirano/sorix.git@main'
In [2]:
import numpy as np
from sorix import tensor
from sorix.nn import MSELoss
# Create data
y_pred = tensor([2.5, 0.0, 2.1], requires_grad=True)
y_true = tensor([3.0, 0.0, 2.0])
criterion = MSELoss()
loss = criterion(y_pred, y_true)
print(f"Predictions: {y_pred.numpy()}")
print(f"Targets: {y_true.numpy()}")
print(f"MSE Loss: {loss.item():.4f}")
Predictions: [2.5 0.  2.1]
Targets: [3. 0. 2.]
MSE Loss: 0.0867
Verification with Autograd¶
MSELoss in Sorix is fully differentiable. Running the backward pass exposes the gradients w.r.t. the predictions, which we can compare against the analytic derivative.
In [3]:
loss.backward()
print(f"Gradients w.r.t y_pred: {y_pred.grad}")
# Manual verification: dMSE/dy_pred_i = 2/n * (y_pred_i - y_true_i)
n = y_pred.data.size
manual_grad = 2/n * (y_pred.data - y_true.data)
print(f"Manual Gradients: {manual_grad}")
Gradients w.r.t y_pred: tensor([-0.33333334, 0. , 0.06666661], dtype=sorix.float64)
Manual Gradients: [-0.33333334  0.          0.0666666 ]
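The analytic gradient can also be cross-checked numerically. A minimal sketch in plain NumPy (independent of Sorix), using central finite differences on the MSE formula:

```python
import numpy as np

y_pred = np.array([2.5, 0.0, 2.1])
y_true = np.array([3.0, 0.0, 2.0])

def mse(p):
    return np.mean((p - y_true) ** 2)

# Central finite differences: perturb one coordinate at a time
eps = 1e-6
num_grad = np.zeros_like(y_pred)
for i in range(y_pred.size):
    p_plus, p_minus = y_pred.copy(), y_pred.copy()
    p_plus[i] += eps
    p_minus[i] -= eps
    num_grad[i] = (mse(p_plus) - mse(p_minus)) / (2 * eps)

# Should agree with the analytic gradient 2/n * (y_pred - y_true)
analytic = 2 / y_pred.size * (y_pred - y_true)
print(num_grad)  # ≈ [-0.3333, 0., 0.0667]
```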
Training Example¶
Let's see how MSELoss guides a single value to match a target.
In [4]:
from sorix.optim import SGD
weight = tensor([10.0], requires_grad=True)
target = tensor([42.0])
optimizer = SGD([weight], lr=0.1)
print(f"Initial weight: {weight.item():.2f}")
for i in range(21):
    loss = criterion(weight, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if i % 5 == 0:
        print(f"Step {i:2d} | Weight: {weight.item():.4f} | Loss: {loss.item():.4f}")
print(f"Final weight: {weight.item():.2f}")
Initial weight: 10.00
Step  0 | Weight: 16.4000 | Loss: 1024.0000
Step  5 | Weight: 33.6114 | Loss: 109.9512
Step 10 | Weight: 39.2512 | Loss: 11.8059
Step 15 | Weight: 41.0993 | Loss: 1.2676
Step 20 | Weight: 41.7049 | Loss: 0.1361
Final weight: 41.70
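For this single-parameter case the whole run has a closed form: each SGD step applies $w \leftarrow w - \eta \cdot 2(w - t)$, so the error $w - t$ shrinks by a constant factor $1 - 2\eta = 0.8$ per step. A plain-NumPy-free sketch reproducing the trajectory above:

```python
lr, target, w = 0.1, 42.0, 10.0

# SGD on MSE for a scalar: w <- w - lr * 2 * (w - target),
# so after k steps  w = target - (target - w0) * (1 - 2*lr)**k
for step in range(21):
    w -= lr * 2 * (w - target)

print(round(w, 4))  # ≈ 41.7049, matching the final step of the loop above
```

Because the contraction factor is 0.8, the loss decays geometrically by 0.8² = 0.64 per step, which is exactly the decay visible in the printed losses.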