๐ Benchmarks: Sorix vs the Giants¶
This comprehensive benchmark compares Sorix against PyTorch and TensorFlow using the classic MNIST digit recognition task. The goal is to evaluate if a minimalist library can compete in performance while maintaining a significantly lower system footprint.
๐ฅ๏ธ Benchmark Environment¶
All tests were performed on a high-end workstation to measure peak performance and resource utilization:
- Hardware:
- CPU: Intelยฎ Coreโข i9 (32 Physical Cores available)
- RAM: 64 GB DDR5
- GPU: NVIDIAยฎ GeForce RTXโข 4070 Laptop (8 GB VRAM)
- Software State:
Python >=3.12,NumPy >=2.0,CuPy >=13.0,PyTorch >=2.0,TensorFlow >=2.15.
๐ Dataset & Task¶
- Source: MNIST - Digit Recognizer (Kaggle)
- Configuration:
- Training Set: 33,600 images (28x28 grayscale)
- Test Set: 8,400 images
- Model Architecture (MLP):
import sorix from sorix.nn import Module, Linear, BatchNorm1d, ReLU, Dropout class SorixModel(Module): def __init__(self): super().__init__() self.linear1 = Linear(784, 128, bias=False) self.bn1 = BatchNorm1d(128) self.linear2 = Linear(128, 64) self.linear3 = Linear(64, 10) self.relu = ReLU() self.dropout = Dropout(p=0.2) def forward(self, x): x = self.linear1(x); x = self.bn1(x); x = self.relu(x) x = self.linear2(x); x = self.relu(x) x = self.dropout(x); x = self.linear3(x) return x loss_fn = CrossEntropyLoss() optimizer = RMSprop(model.parameters(), lr=1e-3, alpha=0.99)
๐ Performance Results¶
The following table summarizes the training and inference times. For inference, we used a massive batch size of 4096 to leverage the 32 i9 cores.
1. Training & Inference Table¶
| Framework | Device | Train Batch | Train Time (5 Epochs) | Test Batch | Inference Time | Accuracy |
|---|---|---|---|---|---|---|
| Sorix | CPU | 128 | 7.36s | 4096 | 0.032s | 0.972 |
| PyTorch | CPU | 128 | 9.21s | 4096 | 0.048s | 0.974 |
| TensorFlow | CPU | 128 | 17.14s | 4096 | 0.186s | 0.968 |
| Sorix | GPU | 128 | 6.21s | 1024 | 0.015s | 0.976 |
| PyTorch | GPU | 128 | 4.04s | 1024 | 0.025s | 0.976 |
| TensorFlow | GPU | 128 | 8.98s | 1024 | 1.703s | 0.975 |
Important
Sorix is faster than PyTorch and TensorFlow in CPU training and maintains a significant lead in inference speed across all devices, all while being a fraction of their size.
2. Exported Model Size (Weights only)¶
Comparing the size of serialized model files containing only weights and architecture metadata.
| Framework | File Size (KB) |
|---|---|
| Sorix | ~429 KB |
| PyTorch | ~432 KB |
| TensorFlow | ~890 KB |
๐พ Framework Footprint (Isolated Venvs)¶
To measure the true "weight" of each framework, we created independent virtual environments and installed the specific CPU/GPU versions of each library.
| Library | Version | Isolated Venv Size |
|---|---|---|
| Sorix | CPU Core | 54.00 MB |
| Sorix | GPU Support | 238.58 MB |
| PyTorch | CPU Core | 702.51 MB |
| PyTorch | GPU Support | 6,840.16 MB |
| TensorFlow | CPU Core | 1,406.05 MB |
| TensorFlow | GPU Support | 1,978.60 MB |
Tip
Sorix is ~13x smaller than PyTorch and ~26x smaller than TensorFlow in its CPU version. For GPU deployment, Sorix is ~28x smaller than PyTorch, essentially because Sorix uses the system's CUDA/cuDNN instead of packaging its own multi-gigabyte binaries.
๐ Reproduce the Results¶
Detailed logs, interactive charts, and the full step-by-step implementation are available in the benchmark notebook:
๐ MNIST Comparison Notebook (examples/benchmarks/mnist_comparison.ipynb)