~$

I build the infrastructure agents run on, and the research that shows where they break.

Self-taught in Toronto, building toward sovereign, local-first AI. Browser agents that don't fold on the real web. The first benchmark for how agents drift over long runs. A terminal grown from scratch in Rust. Memory that persists. When something breaks, it becomes the next paper. And everything is named after the right myths.

seeking
ml / agent-infra role
basis
remote · open to us relocation
stack
python · rust · pytorch
status
available now
2,973 tests on blackreach alone
20+ shipped repos & tools
RTX 4060 the local-first daily driver
130GB+ curated training corpora
Pure infra gives you products. Pure research gives you papers. The intersection gives you both, and that's the only place worth building.
the working thesis · 2026
01

what I'm building now

flagships · actively maintained
Research · #1 focus

Lethe

An open benchmark for how AI agents degrade over long runs. Not "can it do the task." The real question is whether it still behaves correctly after a hundred steps. Four metrics fold into a single Drift Severity score, and every outcome is verified by a real shell command, never the agent's own report.

0 · stable drifting agent → 8.0 SEVERE 10

A paper, an infra tool, and a question nobody else is benchmarking. The cleanest proof of the whole thesis.

Live · v5.0 · pip install

Blackreach

Every autonomous web agent I tried worked on the demo site and collapsed on anything real. Cloudflare, JS-rendered content, rate limits that return 200 OK with garbage. So I built one that doesn't. A DOM walker turns a 200k-token page into a 2k-token observation the model can reason about.

● ● ●
$ blackreach run "download all Linear A inscriptions"
847 inscriptions found · extracting…
saved to /data/linear_a/ · 4m 22s
Building · Rust

Mimir

A Linux-native terminal multiplexer built for AI coding agents. Think cmux for Linux, in Rust + GTK4. It's currently growing its own terminal emulator from scratch: parser, grid, renderer, input, all pinned by golden fixtures and VTE-as-oracle differential tests.

native terminal 7 of 10 phases shipped, Phase 8 active parity

Reportedly the only VTE-based multiplexer that correctly handles synchronized output. I found it by probing the terminal, not guessing.

Live · v3.2 · pip install

Velqua

A transparent memory proxy for local LLMs. Point any Ollama app at :11435 instead of :11434 and your AI remembers who you are across sessions, models, and tools. Zero code changes. The port number is the whole integration.

any ollama app ──▶ :11435 velqua ──▶ :11434 ollama + memory injected on every request · 645 tests

Mem0 needs API calls. Zep needs an SDK. OpenMemory needs Docker, Postgres, and Qdrant. Velqua needs you to change one digit. The first link in my memory/persistence line.

02

research & corpora

where infra meets the paper
AMP Discovery

Antimicrobial-peptide discovery by fine-tuning Meta's ESM-2 protein language model with LoRA. Screened 1,980 NCBI sequences for novel antibiotic candidates.

88.3% F1 (leakage-free)650M ESM-2 + LoRA
PyTorchESM-2LoRABio
RLVR / GRPO Lab

An inspectable post-training harness with verifiable math rewards and strict answer-contract evals. v1 research run complete, with paired-bootstrap evidence behind every claim.

3B & 7B runsGSM8K, full split
RLVRGRPOEvals
Library of Alexandria

A curated cross-cultural mythology corpus. Verified primary sources across 69 traditions, rebuilt from 172GB of junk down to ~67GB where every download is checked. A model-training substrate.

~675k curated files69 traditions
RAGCorpusCuration
02·5

the constellation

everything connects · hover a star
Lethe Rigr Mimir Blackreach Huginn StarSearch MCP servers Velqua Loki AMP Discovery RLVR / GRPO Library of Alexandria ProjectMythos
hover a star to trace the lineage. Click to open the project.
the archive · earlier eras, kept honest

Before this, there was a multi-agent era: Orchestrator (autonomous agent loops), Sable (self-improving reasoning agent), Project Anima (neuroscience-grounded emotional architecture), and a swarm of coordinated agents. Some of it shipped, some of it taught me what not to build. The Bifrost / Heimdall ML line, structured prediction over historical texts, still informs the research thread. Nothing here is pretending; the dead ends are part of the record.

Let's build something that survives the real run.

Self-taught builder-researcher. I ship agent infrastructure that survives the real run, and I publish the failures that teach me where the edges are. If you're working on something that has to coordinate agents, survive the adversarial web, or run without a cloud dependency, that is exactly the area.

Autonomous Agents Agent Evaluation LLM Infrastructure RAG Pipelines MCP Servers Python Rust TypeScript PyTorch Playwright
contact@phnix.dev résumé ↓
toronto · remote / relocation
contract or full-time
usd / cad