projects · isha garg

zero-rl

A compact R1-Zero-style post-training experiment for Countdown. It trains Qwen2.5-3B with GRPO using strict programmatic rewards: no supervised traces, no reward model, and no LLM judge.
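
As a rough illustration of what a strict programmatic reward for Countdown can look like, here is a minimal sketch. The rules it encodes are assumptions rather than the project's confirmed spec: the answer is a plain arithmetic expression, every given number is used exactly once, and only `+`, `-`, `*`, `/` are allowed. The names `countdown_reward` and `group_advantages` are hypothetical. The second function shows the group-relative normalization commonly used in GRPO to turn per-sample rewards into advantages.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Allowed binary operators (assumption: the task permits only + - * /).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, used):
    """Evaluate a parsed expression exactly, recording every integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _eval(node.left, used)
        right = _eval(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported syntax")


def countdown_reward(expression, numbers, target):
    """Return 1.0 only if the expression uses each number once and hits the target."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _eval(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Parsing with `ast` instead of calling `eval` keeps the check strict: anything that is not an integer literal or one of the four operators scores zero. Exact `Fraction` arithmetic avoids floating-point near-misses on division. A real setup would likely also check output formatting, which this sketch leaves out.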

The interesting part is the transfer test: train on 3-number problems, then evaluate on both 3-number and harder 4-number prompts. The final run, on an AMD MI300X GPU, reached a 68.46% solve rate on held-out 3-number examples, with vLLM bringing the observed step time for the 3B model down to roughly 8.1 seconds. It also surfaced a clear generalization gap on 4-number transfer.

Python·GRPO·Qwen2.5

repo

concise-CoT

Length-controlled reasoning distillation from Qwen3-32B into a compact Qwen3-4B LoRA student. It generates verified GSM8K teacher traces, rewrites them into L1/L2/L3 reasoning budgets, re-verifies correctness, and trains one student to follow those budgets.
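
The re-verification step can be pictured as a simple answer check: a rewritten trace is kept only if its final number still matches the reference. The sketch below is one way to do that, not the project's actual code. It relies on the GSM8K convention that reference solutions end with `#### <answer>`, and it assumes the last number in a trace is its final answer. The function names are hypothetical.

```python
import re

# Matches integers and decimals, with optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def final_number(text):
    """Return the last number in the text as a float, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def gold_answer(gsm8k_solution):
    """Read the reference answer that follows the '####' marker."""
    return final_number(gsm8k_solution.split("####")[-1])


def is_verified(trace, gsm8k_solution, tol=1e-6):
    """True if the trace's final number matches the reference answer."""
    predicted = final_number(trace)
    expected = gold_answer(gsm8k_solution)
    if predicted is None or expected is None:
        return False
    return abs(predicted - expected) <= tol
```

Running the same check before and after rewriting means a compressed trace that drops or corrupts the final answer is filtered out instead of reaching the student.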

The fixed vLLM LoRA eval maps the accuracy-vs-token Pareto curve: L0 reaches 93.6% accuracy, while L2 is the best compressed point at 88.8% accuracy with about 99 generated tokens. Structural analysis suggests compression removes prose-like reasoning before calculation-like steps.
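
For readers unfamiliar with the term, a Pareto curve here means the set of settings that no other setting beats on both token count and accuracy. A minimal sketch of that selection follows. The function name `pareto_front` and the tuple layout are hypothetical, and the example values in the usage note are placeholders, not results from this project.

```python
def pareto_front(points):
    """Keep the (label, mean_tokens, accuracy) points that are not dominated.

    A point is dominated if another point uses no more tokens and is
    at least as accurate, with one of the two strictly better.
    """
    front = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on token ties, best accuracy first.
    for label, tokens, accuracy in sorted(points, key=lambda p: (p[1], -p[2])):
        if accuracy > best_accuracy:
            front.append((label, tokens, accuracy))
            best_accuracy = accuracy
    return front
```

Called on placeholder points such as `[("a", 50, 0.70), ("b", 100, 0.85), ("c", 120, 0.80), ("d", 300, 0.93)]`, it drops `"c"`, which costs more tokens than `"b"` while being less accurate.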

Python·LoRA SFT·vLLM

repo

cursor-lens

A local-first CLI for understanding how people work with coding agents. It reads Cursor transcript files, computes workflow metrics, applies privacy filtering, and generates private Markdown and HTML reports.

The tool keeps raw transcripts local by default, supports optional AI-generated builder profiles, and makes agent workflows easier to inspect through sessions, prompts, tool calls, and file references.

TypeScript·CLI·privacy

repo

nanogpt-hack

A small nanoGPT-style pretraining playground built to understand the training loop end to end: dataset prep, configs, checkpointing, WandB logging, and controlled ablations.

Trains a roughly 30M-parameter model on a 100M-token mix of FineWeb-Edu, Cosmopedia, and Python code. The cleanest result was RoPE + AdamW, which improved validation loss under the same token budget.
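
To make the RoPE part concrete, here is a dependency-free sketch of rotary position embeddings applied to a single query or key vector. It uses the interleaved-pair formulation with the conventional base of 10000. The project's own implementation is not shown on this page and may differ, for example by using the half-split layout or batched tensor ops. The function name `rope` is hypothetical.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each consecutive pair of features by a position-dependent angle."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even feature dimension")
    out = []
    for i in range(0, dim, 2):
        # Pair k = i / 2 rotates at frequency base ** (-2k / dim) = base ** (-i / dim).
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out
```

Because every pair is rotated by an angle proportional to the position, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n. That is how attention scores pick up relative position without a learned position table.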

Python·pretraining·RoPE

repo