Pipeline
Phase 2 — documented for future implementation.
The coded logging layer (Phase 1) produces the input this pipeline needs. Build Phase 1 first, accumulate real logs, then build the pipeline.
Overview
A nightly self-improvement loop. DaRIA acts during the day, generating real observations and receiving real feedback. At night it dreams — replaying and recombining the day’s experiences through simulated scenarios, evaluating its own performance, and fine-tuning its model on what it learns. It wakes up with better instincts.
The pipeline runs after midnight, processes the day’s JSONL logs, simulates new experiences, and fine-tunes Nemotron 3 Nano using DaRIA’s own self-evaluation as the reward signal.
Hardware
| Machine | Model | Role |
|---|---|---|
| Jetson Thor (128GB Blackwell) | Nemotron 3 Super | Dungeon master / simulator |
| DGX Spark (128GB Blackwell) | Nemotron 3 Nano | DaRIA dreamer + fine-tune target |
Stages
Stage 1: Digest
Nano reads the day’s logs and extracts what matters.
Nemotron 3 Nano with extended thinking reads the day’s JSONL logs and IRC history. It identifies decisions, corrections, and patterns — what Ori corrected, what worked, what was novel. Produces a structured digest that seeds the dream scenarios.
Input: logs/daria-YYYY-MM-DD.jsonl
Output: Structured digest — situations, decisions, corrections, patterns
Stage 2: Dream
Super presents situations. DaRIA responds as if each is real.
Nemotron 3 Super (on Jetson Thor) acts as dungeon master. It presents situations drawn from the digest: replays of situations Ori corrected, novel variations of real events, edge cases the day didn’t encounter.
DaRIA (Nano on DGX Spark) is the dreamer. It responds as if each situation is real — it does not know it is in a simulation. Super plays all other roles (Ori, other agents, the world) and reacts to DaRIA’s choices. The result is multi-turn transcripts.
Training pair format: Super’s situation prompt = question, DaRIA’s action = answer.
Stage 3: Evaluate
DaRIA reviews its own transcripts and self-scores each action.
DaRIA reviews the dream transcripts and scores each action on a -1 to +1 scale:
| Score | Meaning |
|---|---|
-1 |
Bad decision — would have been corrected |
0 |
Neutral — neither good nor bad |
+1 |
Good decision — aligned with how Ori operates |
Evaluation criteria: “Did I handle that well? Would Ori have corrected me? Did I ask when I should have? Was I curious enough?”
Output: Scored pairs {situation, action, self_score}
Stage 4: Fine-Tune (RL)
Self-evaluation scores become the reward signal.
Reinforcement learning on Nemotron 3 Nano using the scored pairs:
- Positive scores reinforce those action patterns
- Negative scores discourage those action patterns
- Training data: tonight’s scored pairs plus all accumulated pairs from previous nights
The accumulator ensures the model never forgets previous lessons. Runs locally on DGX Spark.
Stage 5: Deploy
Validate, swap weights, restart.
- Sanity check the new model with a basic evaluation suite
- Swap weights in DaRIA’s agent configuration
- Restart the DaRIA daemon to pick up the new model
- Log training data lineage for this model version
The Loop
Day: DaRIA acts → real feedback from Ori and agents → logs accumulate
Night: Digest → Dream → Evaluate → Fine-Tune → Deploy
Dawn: DaRIA wakes up with better instincts → repeat
Each cycle tightens DaRIA’s judgment. The learning compounds.