Module¶
In Sorix, the Module class is the fundamental building block for all neural network components. Whether you are building a simple activation function, a complex layer, or an entire deep neural network, you will almost always inherit from Module.
Its design is intentionally similar to PyTorch's nn.Module, making it intuitive for those coming from other frameworks while remaining simple enough to extend manually.
Key Features of Module¶
- Automatic Parameter Tracking: Any Tensor attribute with requires_grad=True is automatically collected by the .parameters() method (see the sketch after this list).
- Sub-module Registration: If you assign another Module as an attribute of your class, Sorix will recursively find its parameters as well.
- Device Management: The .to(device) method moves all parameters and sub-modules to CPU or GPU (via CuPy).
- Training/Evaluation Modes: The .train() and .eval() methods toggle the behavior of layers like Dropout and BatchNorm1d across the entire model.
- State Management: .state_dict() and .load_state_dict() allow for easy serialization of your model's weights.
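To make the first two features concrete, here is a minimal sketch using a hypothetical toy module called Scale. It assumes the built-in Linear layer (introduced later in this guide); the attribute names are illustrative only.

import numpy as np
from sorix import tensor
from sorix.nn import Module, Linear

class Scale(Module):
    def __init__(self):
        super().__init__()
        # A Tensor attribute with requires_grad=True is collected by .parameters()
        self.scale = tensor(np.ones(1), requires_grad=True)
        # A sub-module assigned as an attribute is registered recursively
        self.proj = Linear(4, 4)

    def forward(self, x):
        return self.proj(x) * self.scale

m = Scale()
print(len(m.parameters()))  # expected: 3 (scale, proj.W, proj.b)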
1. Creating a Custom Layer with Parameters¶
A "layer" in Sorix is just a Module that performs a specific operation. While we have built-in layers like Linear, you can easily create your own.
Let's implement a Parametric ReLU (PReLU), which is like a standard ReLU but with a learned slope $\alpha$ for negative values:
$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases}$$
import numpy as np
from sorix import tensor
from sorix.nn import Module
class PReLU(Module):
def __init__(self, size=1, initial_alpha=0.25):
super().__init__()
# alpha is a learned parameter
self.alpha = tensor(np.full(size, initial_alpha), requires_grad=True)
def forward(self, x):
# x > 0 returns a boolean mask (converted to float in operation)
# We use Sorix operations to stay within the autograd graph
pos = (x > 0) * x
neg = (x <= 0) * (self.alpha * x)
return pos + neg
prelu = PReLU(size=1)
x = tensor([-2.0, 1.0, -0.5])
y = prelu(x)
print(f"Input: {x.numpy()}")
print(f"Output with initial alpha=0.25: {y.numpy()}")
Input: [-2. 1. -0.5] Output with initial alpha=0.25: [-0.5 1. -0.125]
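Because alpha was created with requires_grad=True, the base Module class picks it up automatically. A quick check (sketch):

print(len(prelu.parameters()))  # expected: 1 (just alpha)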
Verifying Autograd in Custom Layers¶
To verify that our layer is indeed learning, we can perform a single optimization step. Since the target sets the desired output for negative inputs to 0, the optimizer should shrink alpha, pulling those outputs toward zero.
from sorix.optim import SGD
from sorix.nn import MSELoss
optimizer = SGD(prelu.parameters(), lr=0.1)
target = tensor([0.0, 1.0, 0.0]) # We want negative inputs to result in 0
criterion = MSELoss()
print(f"Alpha before update: {prelu.alpha.item():.4f}")
# One training step
y = prelu(x)
loss = criterion(y, target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"Alpha after update: {prelu.alpha.item():.4f}")
print(f"New output: {prelu(x).numpy()}")
Alpha before update: 0.2500 Alpha after update: 0.1792 New output: [-0.35833333 1. -0.08958333]
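PReLU can also learn one slope per feature channel. The sketch below assumes that alpha, which has shape (size,), broadcasts across a batch of shape (batch, size) exactly as in NumPy:

prelu_ch = PReLU(size=3)
x_batch = tensor(np.array([[-2.0, 1.0, -0.5],
                           [ 0.5, -1.0, 2.0]]))
print(prelu_ch(x_batch).numpy())  # each column uses its own alpha

This per-channel form is how the ResNetMLP in section 3 below uses PReLU (size=hidden_dim).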
2. Advanced Composition: Residual Blocks¶
Modern deep learning architectures (like ResNets) rely on Skip Connections. In Sorix, you can easily build complex, reusable blocks by nesting other modules.
from sorix.nn import Linear, ReLU, BatchNorm1d
class ResidualBlock(Module):
def __init__(self, dim):
super().__init__()
self.fc1 = Linear(dim, dim)
self.bn1 = BatchNorm1d(dim)
self.relu = ReLU()
self.fc2 = Linear(dim, dim)
self.bn2 = BatchNorm1d(dim)
def forward(self, x):
residual = x
out = self.fc1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.fc2(out)
out = self.bn2(out)
return self.relu(out + residual)
block = ResidualBlock(10)
print(f"Number of parameters in block: {len(block.parameters())}")
Number of parameters in block: 8
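As a sanity check, we can confirm that gradients reach the nested sub-modules through both the main and the residual path. This sketch assumes that, as in PyTorch, each parameter exposes a .grad attribute after backward() (and that Linear stores its weight as W, as the state_dict keys in section 4 suggest); it reuses the MSELoss imported earlier.

xb = tensor(np.random.randn(4, 10))
out = block(xb)
loss = MSELoss()(out, tensor(np.zeros((4, 10))))
loss.backward()
print(block.fc1.W.grad is not None)  # expected: True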
3. Training a Complete Architecture¶
Let's build a ResNet-style MLP and train it on a simple synthetic regression task to prove that the entire stack (nested modules, residual connections, custom layers, and optimizers) works in harmony.
class ResNetMLP(Module):
def __init__(self, input_dim, hidden_dim, output_dim, num_blocks=2):
super().__init__()
self.stem = Linear(input_dim, hidden_dim)
self.blocks = [ResidualBlock(hidden_dim) for _ in range(num_blocks)]
self.prelu = PReLU(size=hidden_dim)
self.head = Linear(hidden_dim, output_dim)
def forward(self, x):
x = self.stem(x)
for block in self.blocks:
x = block(x)
x = self.prelu(x)
return self.head(x)
model = ResNetMLP(input_dim=5, hidden_dim=16, output_dim=1)
optimizer = SGD(model.parameters(), lr=0.01)
criterion = MSELoss()
# Create synthetic data: y = sum(x)
X_train = tensor(np.random.randn(100, 5))
y_train = tensor(np.sum(X_train.numpy(), axis=1, keepdims=True))
print("Training ResNetMLP...")
for epoch in range(101):
model.train()
y_pred = model(X_train)
loss = criterion(y_pred, y_train)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if epoch % 20 == 0:
print(f"Epoch {epoch:3d} | Loss: {loss.item():.6f}")
Training ResNetMLP... Epoch 0 | Loss: 8.423694 Epoch 20 | Loss: 0.489973 Epoch 40 | Loss: 0.215984 Epoch 60 | Loss: 0.140743 Epoch 80 | Loss: 0.105683 Epoch 100 | Loss: 0.083125
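To check the fit on data the model has not seen, we can evaluate a small held-out set. This is a sketch using the same synthetic target; note the call to model.eval() so that BatchNorm1d uses its running statistics:

X_val = tensor(np.random.randn(20, 5))
y_val = tensor(np.sum(X_val.numpy(), axis=1, keepdims=True))

model.eval()
val_loss = criterion(model(X_val), y_val)
print(f"Validation loss: {val_loss.item():.6f}")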
4. Parameter and State Management¶
One of the most powerful features of Module is the .parameters() method. It automatically crawls the object's attributes (including lists, dictionaries, and nested sub-modules) to find everything that needs to be optimized.
The .state_dict() method returns a dictionary mapping parameter names to their current values, which is perfect for saving weights.
from sorix import save, load
# Get state dict
sd = model.state_dict()
print("State dict keys sample:", list(sd.keys())[:5])
# Save and Load
save(sd, "resnet_model.sor")
loaded_weights = load("resnet_model.sor")
new_model = ResNetMLP(input_dim=5, hidden_dim=16, output_dim=1)
new_model.load_state_dict(loaded_weights)
print("\nWeights persistence verified!")
State dict keys sample: ['stem.W', 'stem.b', 'prelu.alpha', 'head.W', 'head.b'] Weights persistence verified!
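As an extra (hedged) check, we can compare the two models' weights numerically; this sketch assumes the values stored in the state dict are Tensors exposing .numpy():

old_sd = model.state_dict()
new_sd = new_model.state_dict()
match = all(np.allclose(old_sd[k].numpy(), new_sd[k].numpy()) for k in old_sd)
print(f"All weights identical: {match}")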
5. Device and Mode Management¶
Since our model contains BatchNorm1d, switching between train() and eval() is mandatory for correct inference.
import sorix
# Switch to evaluation mode (essential for BatchNorm/Dropout)
model.eval()
print(f"In training mode? {model.training}")
# Move to GPU if available
if sorix.cuda.is_available():
model.to('cuda')
print("Entire model and its nested blocks moved to GPU memory.")
In training mode? False
✅ GPU basic operation passed ✅ GPU available: NVIDIA GeForce RTX 4070 Laptop GPU CUDA runtime version: 13000 CuPy version: 13.6.0 Entire model and its nested blocks moved to GPU memory.
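The reverse operations are just as simple. The sketch below assumes .to() also accepts 'cpu' (mirroring the 'cuda' call above):

model.to('cpu')
model.train()
print(f"In training mode? {model.training}")  # True again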
Conclusion¶
By subclassing Module, you gain all the power of Sorix's ecosystem with minimal code. You can implement complex research architectures with skip connections and custom primitives, and Sorix will handle the gradients, optimization, and hardware acceleration for you.