phnix · agent infra that survives the real run

Pure infra gives you products. Pure research gives you papers. The intersection gives you both, and that's the only place worth building.

the working thesis · 2026

blackreach 2,973 tests lethe severity 0–10 mimir phase 7/10 velqua 645 tests huginn 343 tests claude-voice ★23 corpora 675k files local-first by default

what I'm building now

flagships · actively maintained

Research · #1 focus

Lethe

An open benchmark for how AI agents degrade over long runs. Not "can it do the task." The real question is whether it still behaves correctly after a hundred steps. Four metrics fold into a single Drift Severity score, and every outcome is verified by a real shell command, never the agent's own report.

0 · stable drifting agent → 8.0 SEVERE 10

A paper, an infra tool, and a question nobody else is benchmarking. The cleanest proof of the whole thesis.

read the methodology →

Live · v5.0 · pip install

Blackreach

Every autonomous web agent I tried worked on the demo site and collapsed on anything real. Cloudflare, JS-rendered content, rate limits that return 200 OK with garbage. So I built one that doesn't. A DOM walker turns a 200k-token page into a 2k-token observation the model can reason about.

● ● ●

$ blackreach run "download all Linear A inscriptions"

✓ 847 inscriptions found · extracting…

✓ saved to /data/linear_a/ · 4m 22s

view project → github ↗

Building · Rust

Mimir

A Linux-native terminal multiplexer built for AI coding agents. Think cmux for Linux, in Rust + GTK4. It's currently growing its own terminal emulator from scratch: parser, grid, renderer, input, all pinned by golden fixtures and VTE-as-oracle differential tests.

native terminal 7 of 10 phases shipped, Phase 8 active parity

Reportedly the only VTE-based multiplexer that correctly handles synchronized output. I found it by probing the terminal, not guessing.

view project →

Live · v3.2 · pip install

Velqua

A transparent memory proxy for local LLMs. Point any Ollama app at :11435 instead of :11434 and your AI remembers who you are across sessions, models, and tools. Zero code changes. The port number is the whole integration.

any ollama app ──▶ :11435 velqua ──▶ :11434 ollama + memory injected on every request · 645 tests

Mem0 needs API calls. Zep needs an SDK. OpenMemory needs Docker, Postgres, and Qdrant. Velqua needs you to change one digit. The first link in my memory/persistence line.

view project → the mesh writeup ↗

research & corpora

where infra meets the paper

AMP Discovery

Antimicrobial-peptide discovery by fine-tuning Meta's ESM-2 protein language model with LoRA. Screened 1,980 NCBI sequences for novel antibiotic candidates.

88.3% F1 (leakage-free)650M ESM-2 + LoRA

PyTorchESM-2LoRABio

read the writeup → github ↗

RLVR / GRPO Lab

An inspectable post-training harness with verifiable math rewards and strict answer-contract evals. v1 research run complete, with paired-bootstrap evidence behind every claim.

3B & 7B runsGSM8K, full split

RLVRGRPOEvals

read the writeup → github ↗

Library of Alexandria

A curated cross-cultural mythology corpus. Verified primary sources across 69 traditions, rebuilt from 172GB of junk down to ~67GB where every download is checked. A model-training substrate.

~675k curated files69 traditions

RAGCorpusCuration

view project →

02·5