Module¶
In Sorix, the Module class is the fundamental building block for all neural network components. Whether you are building a simple activation function, a complex layer, or an entire deep neural network, you will almost always inherit from Module.
Its design is intentionally similar to PyTorch's nn.Module, making it intuitive for those coming from other frameworks while remaining simple enough to extend manually.
Key Features of Module¶
- Automatic Parameter Tracking: Any Tensor attribute with requires_grad=True is automatically collected by the .parameters() method (see the sketch after this list).
- Sub-module Registration: If you assign another Module as an attribute of your class, Sorix will recursively find its parameters as well.
- Device Management: The .to(device) method moves all parameters and sub-modules to CPU or GPU (via CuPy).
- Training/Evaluation Modes: The .train() and .eval() methods toggle the behavior of layers like Dropout and BatchNorm1d across the entire model.
- State Management: .state_dict() and .load_state_dict() allow for easy serialization of your model's weights.
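To make the first two features concrete, here is a minimal sketch using a hypothetical toy module called Scale. It assumes the built-in Linear layer (introduced later in this guide); the attribute names are illustrative only.

import numpy as np
from sorix import tensor
from sorix.nn import Module, Linear

class Scale(Module):
    def __init__(self):
        super().__init__()
        # A Tensor attribute with requires_grad=True is collected by .parameters()
        self.scale = tensor(np.ones(1), requires_grad=True)
        # A sub-module assigned as an attribute is registered recursively
        self.proj = Linear(4, 4)

    def forward(self, x):
        return self.proj(x) * self.scale

m = Scale()
print(len(m.parameters()))  # expected: 3 (scale, proj.W, proj.b)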
1. Creating a Custom Layer with Parameters¶
A "layer" in Sorix is just a Module that performs a specific operation. While we have built-in layers like Linear, you can easily create your own.
Let's implement a Parametric ReLU (PReLU), which is like a standard ReLU but with a learned slope $\alpha$ for negative values:
$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases}$$
import numpy as np
from sorix import tensor
from sorix.nn import Module
class PReLU(Module):
def __init__(self, size=1, initial_alpha=0.25):
super().__init__()
# alpha is a learned parameter
self.alpha = tensor(np.full(size, initial_alpha), requires_grad=True)
def forward(self, x):
# x > 0 returns a boolean mask (converted to float in operation)
# We use Sorix operations to stay within the autograd graph
pos = (x > 0) * x
neg = (x <= 0) * (self.alpha * x)
return pos + neg
prelu = PReLU(size=1)
x = tensor([-2.0, 1.0, -0.5])
y = prelu(x)
print(f"Input: {x.numpy()}")
print(f"Output with initial alpha=0.25: {y.numpy()}")
Input: [-2. 1. -0.5] Output with initial alpha=0.25: [-0.5 1. -0.125]
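Because alpha was created with requires_grad=True, the base Module class picks it up automatically. A quick check (sketch):

print(len(prelu.parameters()))  # expected: 1 (just alpha)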
Verifying Autograd in Custom Layers¶
To verify that our layer is indeed learning, we can perform a single optimization step. Since the target sets the desired output for negative inputs to 0, the optimizer should shrink alpha, pulling those outputs toward zero.
from sorix.optim import SGD
from sorix.nn import MSELoss
optimizer = SGD(prelu.parameters(), lr=0.1)
target = tensor([0.0, 1.0, 0.0]) # We want negative inputs to result in 0
criterion = MSELoss()
print(f"Alpha before update: {prelu.alpha.item():.4f}")
# One training step
y = prelu(x)
loss = criterion(y, target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"Alpha after update: {prelu.alpha.item():.4f}")
print(f"New output: {prelu(x).numpy()}")
Alpha before update: 0.2500 Alpha after update: 0.1792 New output: [-0.35833333 1. -0.08958333]
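PReLU can also learn one slope per feature channel. The sketch below assumes that alpha, which has shape (size,), broadcasts across a batch of shape (batch, size) exactly as in NumPy:

prelu_ch = PReLU(size=3)
x_batch = tensor(np.array([[-2.0, 1.0, -0.5],
                           [ 0.5, -1.0, 2.0]]))
print(prelu_ch(x_batch).numpy())  # each column uses its own alpha

This per-channel form is how the ResNetMLP in section 3 below uses PReLU (size=hidden_dim).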
2. Advanced Composition: Residual Blocks¶
Modern deep learning architectures (like ResNets) rely on Skip Connections. In Sorix, you can easily build complex, reusable blocks by nesting other modules.
from sorix.nn import Linear, ReLU, BatchNorm1d
class ResidualBlock(Module):
def __init__(self, dim):
super().__init__()
self.fc1 = Linear(dim, dim)
self.bn1 = BatchNorm1d(dim)
self.relu = ReLU()
self.fc2 = Linear(dim, dim)
self.bn2 = BatchNorm1d(dim)
def forward(self, x):
residual = x
out = self.fc1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.fc2(out)
out = self.bn2(out)
return self.relu(out + residual)
block = ResidualBlock(10)
print(f"Number of parameters in block: {len(block.parameters())}")
Number of parameters in block: 8
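As a sanity check, we can confirm that gradients reach the nested sub-modules through both the main and the residual path. This sketch assumes that, as in PyTorch, each parameter exposes a .grad attribute after backward() (and that Linear stores its weight as W, as the state_dict keys in section 4 suggest); it reuses the MSELoss imported earlier.

xb = tensor(np.random.randn(4, 10))
out = block(xb)
loss = MSELoss()(out, tensor(np.zeros((4, 10))))
loss.backward()
print(block.fc1.W.grad is not None)  # expected: True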
3. Training a Complete Architecture¶
Let's build a ResNet-style MLP and train it on a simple synthetic regression task to prove that the entire stack (nested modules, residual connections, custom layers, and optimizers) works in harmony.
class ResNetMLP(Module):
def __init__(self, input_dim, hidden_dim, output_dim, num_blocks=2):
super().__init__()
self.stem = Linear(input_dim, hidden_dim)
self.blocks = [ResidualBlock(hidden_dim) for _ in range(num_blocks)]
self.prelu = PReLU(size=hidden_dim)
self.head = Linear(hidden_dim, output_dim)
def forward(self, x):
x = self.stem(x)
for block in self.blocks:
x = block(x)
x = self.prelu(x)
return self.head(x)
model = ResNetMLP(input_dim=5, hidden_dim=16, output_dim=1)
optimizer = SGD(model.parameters(), lr=0.01)
criterion = MSELoss()
# Create synthetic data: y = sum(x)
X_train = tensor(np.random.randn(100, 5))
y_train = tensor(np.sum(X_train.numpy(), axis=1, keepdims=True))
print("Training ResNetMLP...")
for epoch in range(101):
model.train()
y_pred = model(X_train)
loss = criterion(y_pred, y_train)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if epoch % 20 == 0:
print(f"Epoch {epoch:3d} | Loss: {loss.item():.6f}")
Training ResNetMLP... Epoch 0 | Loss: 8.423694 Epoch 20 | Loss: 0.489973 Epoch 40 | Loss: 0.215984 Epoch 60 | Loss: 0.140743 Epoch 80 | Loss: 0.105683 Epoch 100 | Loss: 0.083125
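To check the fit on data the model has not seen, we can evaluate a small held-out set. This is a sketch using the same synthetic target; note the call to model.eval() so that BatchNorm1d uses its running statistics:

X_val = tensor(np.random.randn(20, 5))
y_val = tensor(np.sum(X_val.numpy(), axis=1, keepdims=True))

model.eval()
val_loss = criterion(model(X_val), y_val)
print(f"Validation loss: {val_loss.item():.6f}")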
4. Parameter and State Management¶
One of the most powerful features of Module is the .parameters() method. It automatically crawls the object's attributes (including lists, dictionaries, and nested sub-modules) to find everything that needs to be optimized.
The .state_dict() method returns a dictionary mapping parameter names to their current values, which is perfect for saving weights.
from sorix import save, load
# Get state dict
sd = model.state_dict()
print("State dict keys sample:", list(sd.keys())[:5])
# Save and Load
save(sd, "resnet_model.sor")
loaded_weights = load("resnet_model.sor")
new_model = ResNetMLP(input_dim=5, hidden_dim=16, output_dim=1)
new_model.load_state_dict(loaded_weights)
print("\nWeights persistence verified!")
State dict keys sample: ['stem.W', 'stem.b', 'prelu.alpha', 'head.W', 'head.b'] Weights persistence verified!
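As an extra (hedged) check, we can compare the two models' weights numerically; this sketch assumes the values stored in the state dict are Tensors exposing .numpy():

old_sd = model.state_dict()
new_sd = new_model.state_dict()
match = all(np.allclose(old_sd[k].numpy(), new_sd[k].numpy()) for k in old_sd)
print(f"All weights identical: {match}")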
5. Device and Mode Management¶
Since our model contains BatchNorm1d, switching between train() and eval() is mandatory for correct inference.
import sorix
# Switch to evaluation mode (essential for BatchNorm/Dropout)
model.eval()
print(f"In training mode? {model.training}")
# Move to GPU if available
if sorix.cuda.is_available():
model.to('cuda')
print("Entire model and its nested blocks moved to GPU memory.")
In training mode? False
✅ GPU basic operation passed ✅ GPU available: NVIDIA GeForce RTX 4070 Laptop GPU CUDA runtime version: 13000 CuPy version: 13.6.0 Entire model and its nested blocks moved to GPU memory.
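The reverse operations are just as simple. The sketch below assumes .to() also accepts 'cpu' (mirroring the 'cuda' call above):

model.to('cpu')
model.train()
print(f"In training mode? {model.training}")  # True again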
Conclusion¶
By subclassing Module, you gain all the power of Sorix's ecosystem with minimal code. You can implement complex research architectures with skip connections and custom primitives, and Sorix will handle the gradients, optimization, and hardware acceleration for you.