zero-rl
A compact R1-Zero-style post-training experiment for Countdown.
It trains Qwen2.5-3B with GRPO using strict programmatic rewards:
no supervised traces, no reward model, and no LLM judge.
The interesting bit is the transfer test: train on 3-number
problems, then evaluate on both 3-number and harder 4-number
prompts. The final run hit a 68.46% solve rate on held-out
3-number examples on an AMD MI300X GPU, with vLLM bringing the
observed step time for the 3B model down to roughly 8.1 seconds.
It also surfaced a clear generalization gap on 4-number transfer.
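To make "strict programmatic rewards" concrete, here is a minimal sketch of what a Countdown reward and a GRPO-style group advantage could look like. It is not taken from the repo: the function names, the rule that every number must be used exactly once, the binary 0/1 reward, and the use of population standard deviation are all assumptions made for illustration.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction
from statistics import fmean, pstdev

# Only the four basic arithmetic operators are allowed in an answer.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording each integer literal in `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Names, calls, unary operators, floats, etc. are all rejected.
    raise ValueError("unsupported syntax")


def countdown_reward(expression, numbers, target):
    """Hypothetical reward: 1.0 if `expression` uses each number exactly once
    and evaluates to `target`, otherwise 0.0. No partial credit."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError, RecursionError):
        return 0.0
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its own sampling group.
    Assumes population standard deviation; implementations differ on this detail."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Under these assumptions, `countdown_reward("(25 - 5) * 3", [3, 5, 25], 60)` returns `1.0`, while an answer that skips a number or reaches the wrong total returns `0.0`. Parsing with `ast` instead of calling `eval` keeps model output from executing arbitrary code, and `Fraction` avoids floating-point false negatives on division.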
cursor-lens
A local-first CLI for understanding how people work with coding
agents. It reads Cursor transcript files, computes workflow
metrics, applies privacy filtering, and generates private Markdown
and HTML reports.
The tool keeps raw transcripts local by default, supports optional
AI-generated builder profiles, and makes agent workflows easier to
inspect through sessions, prompts, tool calls, and file references.
nanogpt-hack
A small nanoGPT-style pretraining playground built to understand
the training loop end to end: dataset prep, configs, checkpointing,
WandB logging, and controlled ablations.
It trains a roughly 30M-parameter model on a 100M-token mix of
FineWeb-Edu, Cosmopedia, and Python code. The cleanest result was
RoPE + AdamW, which improved validation loss under the same token
budget.
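For readers unfamiliar with RoPE, the sketch below shows the core idea in plain Python: each consecutive pair of dimensions is rotated by an angle proportional to the token position. This is an illustrative version of the standard interleaved formulation, not the playground's actual code; `rope_rotate`, `dot`, and the default `base` of 10000 are assumptions.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d),
    where i is the even index of the pair and d is the vector length."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation applied to the pair.
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Both pairs of positions are 2 apart, so the two scores should match
# up to floating-point error.
score_near = dot(rope_rotate(q, 3), rope_rotate(k, 1))
score_far = dot(rope_rotate(q, 7), rope_rotate(k, 5))
```

The property worth noticing is that the score between a rotated query and a rotated key depends only on the distance between their positions, which is what lets attention see relative position without a learned position table.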