Introduction
We are introducing SAGE 2.5 Celer, a new series of hybrid models designed to bridge the gap between lightweight deployment and heavy reasoning tasks. The models come in 3B, 8B, and 14B parameter sizes and include a "Thinking" mode that lets them spend more compute on complex queries before responding.
This release aims to make advanced tool calling and mathematical reasoning practical on consumer-grade hardware and edge devices.
Comparison: 14B Models
| Benchmark | Qwen2.5 14B | SAGE 2.5 Celer (Std) | SAGE 2.5 Celer (Think) | DeepSeek R1 14B |
|---|---|---|---|---|
| General | 77.87% | 86.67% | 88.27% | 81.00% |
| MMLU- | 67.13% | 70.91% | 76.47% | 69.20% |
| Math (GSM8K) | 94.31% | 94.31% | 95.68% | 93.33% |
| MATH | 79.20% | 73.49% | 87.37% | 89.78% |
| Multi-lingual | 62.29% | 72.50% | 73.43% | 63.86% |
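To make the effect of the Thinking mode easier to read off the 14B table, the short sketch below computes the percentage-point gain of the Think column over the Std column for each row. The scores are copied from the table above, with row labels kept exactly as they appear there. The dictionary layout and the names `SCORES`, `std`, `think`, and `thinking_gains` are illustrative assumptions, not part of any SAGE tooling.

```python
# Scores (percent) copied from the 14B comparison table.
# The keys "std" and "think" are hypothetical labels for the two SAGE columns.
SCORES = {
    "General": {"std": 86.67, "think": 88.27},
    "MMLU-": {"std": 70.91, "think": 76.47},
    "Math (GSM8K)": {"std": 94.31, "think": 95.68},
    "MATH": {"std": 73.49, "think": 87.37},
    "Multi-lingual": {"std": 72.50, "think": 73.43},
}


def thinking_gains(scores):
    """Return the Think-minus-Std gain per benchmark, in percentage points."""
    return {
        name: round(row["think"] - row["std"], 2)
        for name, row in scores.items()
    }


gains = thinking_gains(SCORES)

# Print benchmarks from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} +{gain:.2f} pp")
```

On these numbers the largest gain is on MATH (13.88 points) and the smallest on Multi-lingual (0.93 points), which fits the framing of the Thinking mode as most useful for harder reasoning queries.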
Comparison: Small Models (3B)
| Benchmark | Llama 3 3B | Qwen Small (3B) | SAGE 3B (Std) | SAGE 3B (Reason) |
|---|---|---|---|---|
| Non-Reasoning | 58.67% | 55.42% | 67.20% | 74.90% |
| MMLU | 33.76% | 31.40% | 40.10% | 50.85% |
| MMLU-Pro | 74.25% | 70.90% | 80.10% | 86.75% |
| Math | 45.84% | 40.25% | 47.25% | 55.80% |
Tool Calling Performance (BFCL)
| Category | SAGE 3B | SAGE 8B | Llama 3 3B |
|---|---|---|---|
| Simple | 94.5% | 96.8% | Not Supported |
| Parallel | 76.0% | 88.2% | Not Supported |
| Multiple | 92.0% | 95.0% | Not Supported |
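For readers unfamiliar with the category names, the sketch below illustrates how they are commonly understood in BFCL: "Simple" offers the model one function and expects one call, "Multiple" offers several functions and expects the model to pick one, and "Parallel" expects several calls from a single query. This reading of the categories is an assumption on our part, and `ToolCall`, `bfcl_style_category`, and the example function names are hypothetical, not part of BFCL or of the SAGE API.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One function call emitted by a model (hypothetical structure)."""
    name: str
    arguments: dict


def bfcl_style_category(num_tools_offered: int, calls: list[ToolCall]) -> str:
    """Label a test case by how many tools were offered and how many calls were made.

    Assumed definitions:
      simple            -> 1 tool offered, 1 call
      multiple          -> several tools offered, 1 call
      parallel          -> 1 tool offered, several calls
      parallel multiple -> several tools offered, several calls
    """
    if num_tools_offered < 1 or not calls:
        raise ValueError("need at least one offered tool and one call")
    many_tools = num_tools_offered > 1
    many_calls = len(calls) > 1
    if many_tools and many_calls:
        return "parallel multiple"
    if many_calls:
        return "parallel"
    if many_tools:
        return "multiple"
    return "simple"


# One weather lookup, one tool offered.
simple_case = [ToolCall("get_weather", {"city": "Paris"})]

# Two weather lookups from a single query, one tool offered.
parallel_case = [
    ToolCall("get_weather", {"city": "Paris"}),
    ToolCall("get_weather", {"city": "Tokyo"}),
]

print(bfcl_style_category(1, simple_case))    # simple
print(bfcl_style_category(3, simple_case))    # multiple
print(bfcl_style_category(1, parallel_case))  # parallel
```

Under this reading, the "Parallel" row is the hardest of the three because the model has to emit every required call in one turn, which matches the lower scores in that row.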
Hybrid Models · Reasoning · Tool Calling · 2025
SAGEA AI Research

