SAGEA / Research First AI Company Building Frontier AI, Agents, Assistants & Services

1. Abstract

SAGE 2.4 Actus-bio is a 32-billion-parameter medical reasoning model developed by SAGEA, derived from the pretrained weights of SAGE 2.4 Actus through continued domain-specific fine-tuning. The principal architectural departure from the base model is the integration of a parallel Tree of Thoughts (ToT) branch evaluator within the Meta-Cognitive Head (MCH), enabling the model to maintain and score K simultaneous reasoning trajectories before committing to an output.

The model was trained on a curated suite of four medical datasets spanning clinical QA reasoning traces, broad biomedical corpora, structured clinical notes, and multilingual medical text with South Asian language coverage. This reflects SAGEA’s primary deployment context in Nepal and the wider South Asian region. Evaluations demonstrate it achieves competitive performance against open-weight models of significantly larger scale (70B parameters), particularly on open-ended literature reasoning tasks.

2. Architecture & Tree of Thoughts (ToT)

The principal architectural departure from the base 2.4 Actus model is the replacement of the discrete fast/slow gating mechanism with a parallel Tree of Thoughts (ToT) branch evaluator. Unlike models that sequentially refine a single hypothesis, SAGE 2.4 Actus-bio is structurally designed for differential diagnosis. Given a clinical query, the model generates K=3 independent chain-of-thought trajectories, each exploring a distinct clinical hypothesis.

Each trajectory is scored by the Inverse Reasoning (IR) pipeline, which aggregates token-level confidence signals across the trajectory into a single per-trajectory confidence score. The model then selects the reasoning path with the highest aggregate validity score.
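The scoring-and-selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not SAGEA's implementation: the text does not say how token-level signals are aggregated, so the sketch assumes a geometric mean (a length-normalized average of log-confidences). The names `Trajectory`, `aggregate_confidence`, and `select_trajectory` are hypothetical, and the confidence values are made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    hypothesis: str                  # clinical hypothesis explored by this branch
    token_confidences: list[float]   # per-token confidence signals, each in (0, 1]


def aggregate_confidence(token_confidences: list[float]) -> float:
    """Geometric mean of token confidences (an assumed aggregation rule)."""
    if not token_confidences:
        raise ValueError("trajectory has no tokens to score")
    log_sum = sum(math.log(c) for c in token_confidences)
    return math.exp(log_sum / len(token_confidences))


def select_trajectory(trajectories: list[Trajectory]) -> tuple[Trajectory, float]:
    """Return the trajectory with the highest aggregate score, plus that score."""
    scored = [(t, aggregate_confidence(t.token_confidences)) for t in trajectories]
    return max(scored, key=lambda pair: pair[1])


# Illustrative values only; K = 3 branches as in the text.
branches = [
    Trajectory("acute myocardial infarction", [0.9, 0.8, 0.9]),
    Trajectory("aortic dissection", [0.6, 0.5, 0.4]),
    Trajectory("pulmonary embolism", [0.7, 0.3, 0.5]),
]
best, score = select_trajectory(branches)
print(best.hypothesis, round(score, 3))
```

Normalizing by length keeps longer trajectories from being penalized simply for containing more tokens; a plain product or an arithmetic mean would be equally plausible readings of "aggregate".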

Design Rationale: This architectural choice is motivated by the structure of clinical reasoning itself. Differential diagnosis is fundamentally a branching problem. A clinician encountering a patient with chest pain does not sequentially refine a single hypothesis but simultaneously holds acute myocardial infarction, aortic dissection, and pulmonary embolism as live candidates, weighing evidence for and against each.

3. Training Data & Methodology

SAGE 2.4 Actus-bio was trained via continued fine-tuning of the 2.4 Actus pretrained checkpoint using SAGEA’s IR pipeline. The fine-tuning set comprised approximately 915,000 curated examples drawn from the four medical datasets described in the abstract.

4. Benchmark Results

We conducted zero-shot evaluations against state-of-the-art open-weight models, including the 70B-parameter versions of Meditron and OpenBioLLM. SAGE 2.4 Actus-bio (32B) achieved the highest scores on the open-ended reasoning benchmarks.

Benchmark	Actus-bio (32B)	Meditron-70B	OpenBioLLM-70B
PubMedQA	81.2 ± 0.5	76.8 ± 0.7	74.5 ± 0.8
MedMCQA	75.8 ± 0.8	72.4 ± 0.6	76.2 ± 0.5
BioASQ	77.3 ± 0.7	73.1 ± 0.6	75.8 ± 0.5
MedBench	72.6 ± 0.9	68.4 ± 0.7	71.1 ± 0.6
MedQA (USMLE)	79.4 ± 0.6	81.6 ± 0.4	84.1 ± 0.3

The ToT architecture enables Actus-bio (32B) to exceed both 70B competitors on the open-ended diagnostic hypothesis benchmarks (PubMedQA, BioASQ, MedBench). However, it trails on single-answer rapid-recall benchmarks such as MedQA (USMLE) and MMLU Medical, an expected structural trade-off; of these, only MedQA (USMLE) appears in the table above. On MedMCQA it is within 0.4 points of OpenBioLLM-70B.
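To make the comparison concrete, the short script below computes the gap between Actus-bio and the stronger of the two 70B models on each benchmark. The scores are copied from the table above; the helper name `margin_vs_best_70b` is hypothetical, and the sketch deliberately ignores the ± intervals, so small gaps should not be read as significant.

```python
# Scores copied from the table above:
# (Actus-bio 32B, Meditron-70B, OpenBioLLM-70B).
scores = {
    "PubMedQA": (81.2, 76.8, 74.5),
    "MedMCQA": (75.8, 72.4, 76.2),
    "BioASQ": (77.3, 73.1, 75.8),
    "MedBench": (72.6, 68.4, 71.1),
    "MedQA (USMLE)": (79.4, 81.6, 84.1),
}


def margin_vs_best_70b(row: tuple[float, float, float]) -> float:
    """Actus-bio score minus the better of the two 70B scores, in points."""
    actus, meditron, openbio = row
    return round(actus - max(meditron, openbio), 1)


margins = {name: margin_vs_best_70b(row) for name, row in scores.items()}
for name, margin in margins.items():
    print(f"{name:15s} {margin:+.1f}")
```

The resulting margins are +4.4 (PubMedQA), -0.4 (MedMCQA), +1.5 (BioASQ), +1.5 (MedBench), and -4.7 (MedQA), which matches the pattern described in the text: a lead on the open-ended benchmarks and a deficit on MedQA (USMLE).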

5. In-Depth Reasoning Example

To illustrate the Tree of Thoughts in action, consider a classic multi-system presentation with competing autoimmune diagnoses: a patient presenting with malar rash, significant proteinuria, severe arthritis, and discordant ESR/CRP values. The model evaluates three distinct trajectories before settling on a classification:

Branch 1: Systemic Lupus Erythematosus (SLE)
Evaluates the malar rash, positive anti-dsDNA, and homogeneous ANA pattern. It attributes the proteinuria to a likely proliferative lupus nephritis, which calls for an urgent renal biopsy. The model awards this branch very high validity.
IR Confidence: 0.95

Branch 2: Mixed Connective Tissue Disease (MCTD)
Tests for overlapping features. It notes that MCTD usually presents with swollen hands or Raynaud phenomenon rather than a butterfly (malar) rash, that a homogeneous ANA pattern favors SLE, and that an MCTD diagnosis requires anti-U1 RNP evaluation.
IR Confidence: 0.15

Branch 3: IgA Vasculitis / ANCA-Associated Vasculitis
Considers the combination of renal involvement and arthritis. It dismisses this branch because IgA vasculitis typically presents with palpable purpura rather than a malar rash, and ANCA-associated vasculitis is characterized by p-ANCA or c-ANCA positivity rather than the findings in this patient's tests.
IR Confidence: 0.08
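
The final selection over the three branches above amounts to a ranking by IR confidence. The sketch below uses the confidence values exactly as listed; the variable names are hypothetical. Note that the three scores sum to more than 1, so the sketch treats them as independent per-branch validity scores rather than a probability distribution, which is an interpretation and not something the text states.

```python
# IR confidence values copied from the three branches above.
branch_confidence = {
    "Systemic Lupus Erythematosus (SLE)": 0.95,
    "Mixed Connective Tissue Disease (MCTD)": 0.15,
    "IgA Vasculitis / ANCA-Associated Vasculitis": 0.08,
}

# Rank branches from most to least supported.
ranked = sorted(branch_confidence.items(), key=lambda item: item[1], reverse=True)
selected, top_score = ranked[0]
runner_up, runner_up_score = ranked[1]

# The scores are assigned per branch and do not sum to 1.
total = sum(branch_confidence.values())

print(f"selected: {selected} ({top_score:.2f})")
print(f"margin over {runner_up}: {top_score - runner_up_score:.2f}")
print(f"sum of scores: {total:.2f}")
```

Keeping the rejected branches and their scores alongside the selected one, rather than discarding them, is what allows the rejected hypotheses and the reasons for rejecting them to be shown to the reader.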

6. Primary Use Cases

Medical Education: Serving as an interactive pedagogical tool for medical students by exposing explicit diagnostic pathways and highlighting the reasons for rejecting incorrect candidate diagnoses.
Clinical Reasoning & Differential Diagnosis: Acting as an exploratory scratchpad for clinicians to combat premature closure biases. By generating K disparate hypotheses, Actus-bio can surface rare or atypical diagnostic frames that the clinician may subsequently choose to investigate.
Multilingual Support: SAGE 2.4 Actus-bio can parse complex cases written directly in Nepali, Hindi, and regional dialects, extending support to underserved linguistic communities in the community health worker pipeline.

7. Limitations & Failure Modes

Latency Overhead: The multi-trajectory approach requires approximately 2.4x–2.8x the compute of single-path generation at equivalent parameter counts. SAGE 2.4 Actus-bio is a deliberative reasoning tool for non-time-critical clinical consultation, not a real-time triage system.
Text-Only Modality: Actus-bio depends strictly on text transcription for findings that are fundamentally visual or audio-based (e.g., dermatological manifestations, radiology scans, heart sounds).
Hallucinations: Like all generative models, Actus-bio carries a persistent, albeit calibrated, risk of factual confabulation; for example, it may hallucinate lab parameters that fit a narrative.
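
As a rough illustration of the latency overhead listed above, the sketch below scales a single-path cost by the quoted 2.4x to 2.8x range. It assumes that latency scales linearly with compute, which the text does not claim, and the 10-second baseline is a hypothetical figure chosen only for the example. The function name `multi_trajectory_cost` is likewise hypothetical.

```python
def multi_trajectory_cost(single_path_cost: float,
                          low: float = 2.4,
                          high: float = 2.8) -> tuple[float, float]:
    """Scale a single-path cost by the overhead range quoted in the text."""
    if single_path_cost < 0:
        raise ValueError("cost must be non-negative")
    return single_path_cost * low, single_path_cost * high


# Hypothetical baseline: 10 seconds for single-path generation.
lo, hi = multi_trajectory_cost(10.0)
print(f"estimated multi-trajectory latency: {lo:.1f}-{hi:.1f} s")
```

Under these assumptions, a response that would take 10 seconds on a single path would take roughly 24 to 28 seconds with multi-trajectory generation, which is consistent with positioning the model for non-time-critical consultation.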

Mandatory Human Oversight & Safety Notice

SAGE 2.4 Actus-bio is NOT a clinical decision-making tool for autonomous use. This model inherently lacks physical examination capabilities, access to unwritten clinical intuition, and longitudinal patient insight. No output from this model should be acted upon in a clinical setting without rigorous review by a licensed medical professional.

Prohibited uses include: generating final diagnoses without physician authorization, executing automated prescribing, direct-to-patient triage without a human-in-the-loop, and using the model in medical emergencies. All deployments managed by SAGEA enforce a strict, binding protocol for human oversight.