CapsCoder
Rust Compiler Agent Protocol for Capsulang.
CapsCoder is a deterministic compiler and tooling layer that a coding model drives. The model emits structured PatchPlan.v1 edits and reads schema-stable diagnostics; cargo, rustc, and the Capsulang checker decide what is true. No model lives in the trusted path — no LLM calls, network calls, or hidden heuristics inside the checking surface.
Core principle
The model proposes. The compiler disposes.
Constraining the model's output to JSON schemas removes the class of "plausible paragraph that doesn't typecheck" failures. The schema constrains the model's output, not its cognition: it is still an LLM internally. The real, bankable wins are edits that are cheaper to emit, checkable before anything is written, and confirmed correct by the compiler.
§ 02
The Loop
The end-to-end cycle is deterministic. The model slots into the propose step only; every other stage is a compiler, checker, or test runner.
Deterministic loop. The model contributes a JSON proposal; everything downstream is mechanical.
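The cycle described above can be sketched as a small driver. This is an illustration only, not CapsCoder's actual code: `run_loop`, `Verdict`, and the `max_attempts` bound are hypothetical names, the plan is reduced to a plain string, and the checker is passed in as a closure standing in for caps, cargo, or rustc.

```rust
/// Outcome of one deterministic check (a stand-in for caps / cargo / rustc).
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Accepted,
    /// Normalized diagnostics handed back to the model.
    Rejected(Vec<String>),
}

/// Drive propose -> check -> repair for at most `max_attempts` rounds.
/// `propose` is the untrusted model step and sees the previous diagnostics.
/// `check` is the trusted, deterministic step and alone decides acceptance.
/// Returns the accepted plan and the attempt number, or `None` if every
/// attempt was rejected.
pub fn run_loop<P, C>(mut propose: P, check: C, max_attempts: usize) -> Option<(String, usize)>
where
    P: FnMut(&[String]) -> String,
    C: Fn(&str) -> Verdict,
{
    let mut diagnostics: Vec<String> = Vec::new();
    for attempt in 1..=max_attempts {
        let plan = propose(&diagnostics);
        match check(&plan) {
            Verdict::Accepted => return Some((plan, attempt)),
            // The model only gets another turn when the checker rejects.
            Verdict::Rejected(d) => diagnostics = d,
        }
    }
    None
}
```

The model never reports success itself; the only way out of the loop with a plan is an `Accepted` verdict from the checker.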
§ 03
Benchmark Results
Three properties, measured across the gold suite.
Structured PatchPlan vs. free-form full-file rewrite, token-weighted across the suite: 243 vs. 1,588 estimated tokens, roughly an 85% reduction.
9/9 corrupted edits rejected: unknown op, missing field, dangling reference. No bad write reaches disk.
Every gold solution passes both the Capsulang checker and rustc end-to-end.
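The three rejection classes named above (unknown op, missing field, dangling reference) can be illustrated with a shape check that runs before anything is written. This is a sketch under assumptions: edits are flattened to string key/value maps, only the three ops from the PatchPlan.v1 example are listed, and `validate`, `Rejection`, `RawEdit`, and `edit` are hypothetical names, not the project's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
pub enum Rejection {
    UnknownOp(String),
    MissingField { op: String, field: &'static str },
    DanglingReference { field: &'static str, name: String },
}

/// A raw edit as flat string pairs (a simplification of the JSON object).
pub type RawEdit = BTreeMap<String, String>;

/// Hypothetical helper to build a raw edit from literal pairs.
pub fn edit(pairs: &[(&str, &str)]) -> RawEdit {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Required fields per op; `None` means the op is not recognized.
fn required_fields(op: &str) -> Option<&'static [&'static str]> {
    match op {
        "add-event" => Some(&["machine", "event"]),
        "add-state" => Some(&["machine", "state"]),
        "add-transition" => Some(&["machine", "from", "event", "to"]),
        _ => None,
    }
}

/// Check every edit in order; the first problem rejects the whole plan.
/// `states` and `events` are the names already declared in the machine.
pub fn validate(
    edits: &[RawEdit],
    states: &BTreeSet<String>,
    events: &BTreeSet<String>,
) -> Result<(), Rejection> {
    // Earlier edits in the same plan may declare names used by later ones.
    let mut states = states.clone();
    let mut events = events.clone();
    for edit in edits {
        // A missing "op" key is reported as an unknown (empty) op.
        let op = edit.get("op").cloned().unwrap_or_default();
        let fields = required_fields(&op).ok_or_else(|| Rejection::UnknownOp(op.clone()))?;
        for &field in fields {
            if !edit.contains_key(field) {
                return Err(Rejection::MissingField { op: op.clone(), field });
            }
        }
        match op.as_str() {
            "add-event" => { events.insert(edit["event"].clone()); }
            "add-state" => { states.insert(edit["state"].clone()); }
            _ => {
                // add-transition: every referenced name must already exist.
                for (field, declared) in [("from", &states), ("to", &states), ("event", &events)] {
                    if !declared.contains(&edit[field]) {
                        return Err(Rejection::DanglingReference { field, name: edit[field].clone() });
                    }
                }
            }
        }
    }
    Ok(())
}
```

Because `validate` returns on the first problem and writes nothing itself, a caller can require `Ok(())` before touching the working tree.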
Table
caps bench gains
| Task | Lang | Structured (~tokens) | Full rewrite (~tokens) | Token reduction | Safety caught |
|---|---|---|---|---|---|
| caps-add-claim-review-timeout | capsulang | 97 | 617 | 84% | 3/3 |
| caps-add-timeout-escalation | capsulang | 96 | 942 | 90% | 3/3 |
| rust-fix-type-mismatch | rust | 50 | 29 | −70% | 3/3 |
On tiny files a full rewrite can be cheaper than a structured-patch envelope (the −70% above). The token win appears on realistically-sized files — exactly where free-form generation is most error-prone.
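For readers who want to recompute the percentages, the arithmetic is one minus the ratio of structured to full-rewrite tokens. The sketch below assumes that definition (it matches the two Capsulang rows) and that "token-weighted" means summing each column before comparing; the function names are illustrative.

```rust
/// Fractional token reduction of a structured patch relative to a full rewrite.
/// Negative when the structured envelope costs more than the rewrite.
pub fn token_reduction(structured: u32, full_rewrite: u32) -> f64 {
    1.0 - f64::from(structured) / f64::from(full_rewrite)
}

/// Token-weighted reduction over a suite: sum each column first, then compare,
/// so large files dominate the result.
pub fn suite_reduction(rows: &[(u32, u32)]) -> f64 {
    let structured: u32 = rows.iter().map(|r| r.0).sum();
    let full: u32 = rows.iter().map(|r| r.1).sum();
    token_reduction(structured, full)
}
```

Summing before dividing is why one tiny file with a negative reduction barely moves the suite-level figure.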
§ 04
Model Training
The first adapter pass uses compiler-verified Capsulang trajectories. Apple Foundation Models is the deployment target; Qwen3-Coder-30B is the open MLX comparison baseline.
The dataset holds 2,600 examples generated from seed capsules plus 50 teacher-distilled examples, 2,650 in total. Split: 2,401 train / 249 validation. Exact duplicate records: 0.
The baseline is Qwen3-Coder-30B-A3B-Instruct-4bit, fine-tuned through MLX with LoRA on 8 layers, prompt masking, batch size 1, and a maximum sequence length of 8192.
The run was stopped after validation loss worsened to 0.060 at iteration 200. Peak memory was 33.5 GB on Apple silicon.
Training matrix
LLMs, status, and measured result
| Model | Role | Status | Result |
|---|---|---|---|
| Apple Foundation Models | target | dataset ready · macOS 27 gated | 2,401 / 249 rows |
| Qwen3-Coder-30B-A3B | baseline | trained through iter 200 | best val 0.028 |
| Qwen2.5-1.5B | smoke test | 1 iteration only | pipeline passed |
Apple FM gate
The Apple-format dataset can be generated on macOS 26 because it is just JSONL. Runtime features announced for macOS 27 — fm, the Python SDK, Private Cloud Compute model selection, and provider-backed Foundation Models APIs — are gated until macOS 27 and the tools are actually installed.
scripts/check_foundation_models_runtime.sh
scripts/check_foundation_models_runtime.sh --require
Reproduce the Qwen baseline
mlx_lm.lora \
--model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit \
--train \
--data data/all \
--iters 600 \
--batch-size 1 \
--mask-prompt \
--num-layers 8 \
--steps-per-report 10 \
--steps-per-eval 50 \
--val-batches 8 \
--save-every 100 \
--max-seq-length 8192 \
--adapter-path adapters/capscoder-qwen3-coder-30b
In the first run, checkpoint 100 was kept as the active adapter because validation loss reached 0.028 there and degraded by checkpoint 200.
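The checkpoint choice described above amounts to keeping the evaluation point with the lowest validation loss. A minimal sketch, assuming the losses are collected by hand as (iteration, loss) pairs; `best_checkpoint` is a hypothetical helper, not part of mlx_lm.

```rust
/// Return the (iteration, validation loss) pair with the lowest loss.
/// Ties keep the earliest checkpoint; NaN losses are skipped.
pub fn best_checkpoint(evals: &[(u32, f64)]) -> Option<(u32, f64)> {
    let mut best: Option<(u32, f64)> = None;
    for &(iter, loss) in evals {
        if loss.is_nan() {
            continue;
        }
        match best {
            // Keep the current best unless the new loss is strictly lower.
            Some((_, b)) if loss >= b => {}
            _ => best = Some((iter, loss)),
        }
    }
    best
}
```

With the two reported points, (100, 0.028) and (200, 0.060), this selects iteration 100, matching the adapter that was kept.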
§ 05
Architecture
A clear trust boundary: the model lives outside the trusted core. Everything inside the boundary is deterministic and inspectable.
| Stage | Trust | Details |
|---|---|---|
| Human request | user | natural-language task or ticket |
| Optional intent model | untrusted · model | rephrases task, plans subtasks (untrusted) |
| Coding model emits PatchPlan.v1 | untrusted · model | JSON only, schema-constrained output |
| ── trust boundary ── | | |
| Compiler Agent Protocol | trusted · deterministic | caps · cargo · rustc JSON diagnostics · rust-analyzer · tree-sitter |
| Patch applied | trusted · deterministic | canonicalized, written to working tree |
| Compiler & test feedback | trusted · deterministic | normalized diagnostics → repair loop |
| Repair loop | untrusted · model | model re-proposes only if checker rejects |
§ 06
Capabilities
One shipped slice per card. Each is exercised by tests in the gold suite.
Rust bridge
cargo metadata, check, and test as JSON oracles. Normalized diagnostics surface a single shape across tools.
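One way to picture "a single shape across tools" is a common record that each tool's output is converted into. The sketch below is an assumption-heavy illustration: `Diagnostic` and its fields are invented here, `RustcDiag` is a flattened subset of rustc's JSON diagnostic (level, message, code, primary span), and `CapsDiag` is a purely hypothetical shape for a Capsulang checker finding.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Severity { Error, Warning, Note }

/// The single normalized shape the model reads, whatever tool produced it.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub tool: &'static str,
    pub severity: Severity,
    pub code: Option<String>,
    pub message: String,
    pub file: String,
    pub line: u32,
}

/// Flattened subset of a rustc JSON diagnostic.
pub struct RustcDiag {
    pub level: String,
    pub message: String,
    pub code: Option<String>,
    pub file_name: String,
    pub line_start: u32,
}

/// Hypothetical shape for a Capsulang checker finding; field names are assumed.
pub struct CapsDiag {
    pub rule: String,
    pub text: String,
    pub path: String,
    pub line: u32,
}

impl From<RustcDiag> for Diagnostic {
    fn from(d: RustcDiag) -> Self {
        let severity = match d.level.as_str() {
            "error" => Severity::Error,
            "warning" => Severity::Warning,
            _ => Severity::Note,
        };
        Diagnostic { tool: "rustc", severity, code: d.code, message: d.message, file: d.file_name, line: d.line_start }
    }
}

impl From<CapsDiag> for Diagnostic {
    fn from(d: CapsDiag) -> Self {
        // Assumption: every checker finding is treated as an error.
        Diagnostic { tool: "caps", severity: Severity::Error, code: Some(d.rule), message: d.text, file: d.path, line: d.line }
    }
}
```

After conversion, findings from both tools can sit in one `Vec<Diagnostic>`, so the repair step needs only one parser.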
Semantic Capsulang patches
Edit the AST: add-event, add-state, add-transition, add-effect, add-context-field. Re-emit canonical source, re-check. Invalid edits — e.g. an effect referencing an undeclared surface — are rejected by the Capsulang checker and never written.
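The "rejected and never written" behavior can be sketched as applying edits to a copy of the machine and returning it only if every edit checks. Everything here is illustrative: `Machine`, `Edit`, and `apply_all` are hypothetical names, only three of the listed ops are modeled, and the sketch assumes the second element of an effect pair names the surface.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Transition {
    pub from: String,
    pub event: String,
    pub to: String,
    /// (effect kind, surface), e.g. ("a2ui.emit", "change_status").
    pub effects: Vec<(String, String)>,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Machine {
    pub states: BTreeSet<String>,
    pub events: BTreeSet<String>,
    pub surfaces: BTreeSet<String>,
    pub transitions: Vec<Transition>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Edit {
    AddEvent(String),
    AddState(String),
    AddTransition(Transition),
}

/// Apply all edits to a copy of the machine. The new machine is returned only
/// if every edit checks; the caller's machine is never partially modified.
pub fn apply_all(machine: &Machine, edits: &[Edit]) -> Result<Machine, String> {
    let mut next = machine.clone();
    for edit in edits {
        match edit {
            Edit::AddEvent(e) => { next.events.insert(e.clone()); }
            Edit::AddState(s) => { next.states.insert(s.clone()); }
            Edit::AddTransition(t) => {
                if !next.states.contains(&t.from) || !next.states.contains(&t.to) {
                    return Err(format!("undeclared state in {} -> {}", t.from, t.to));
                }
                if !next.events.contains(&t.event) {
                    return Err(format!("undeclared event {}", t.event));
                }
                if let Some((_, surface)) = t.effects.iter().find(|(_, s)| !next.surfaces.contains(s)) {
                    return Err(format!("effect references undeclared surface {}", surface));
                }
                next.transitions.push(t.clone());
            }
        }
    }
    Ok(next)
}
```

Working on a clone keeps the check and the write separate: a host would serialize and write `next` only after `apply_all` returns `Ok`.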
Compiler Agent Protocol
One JSON-in / JSON-out surface (agent-api) across Capsulang and Rust.
emit-rust
Compile a Capsulang machine to a pure Rust transition function plus typed effect intents. No I/O, no ambient authority — the host owns adapters and policy. A second compiler oracle on the same machine.
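To make "pure transition function plus typed effect intents" concrete, here is a hand-written sketch of what such output might look like for the ChangeApproval transition used elsewhere in this document. It is not actual emit-rust output: the type names, the `target` field, and the `Option` return are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    AwaitingApprovals,
    Escalated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    ApprovalTimeout,
}

/// An effect the host may choose to perform; the machine performs nothing itself.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EffectIntent {
    A2uiEmit { target: &'static str },
}

/// Pure transition: the same inputs always give the same outputs, with no I/O.
/// Returns `None` when the event is not valid in the current state.
pub fn transition(state: State, event: Event) -> Option<(State, Vec<EffectIntent>)> {
    match (state, event) {
        (State::AwaitingApprovals, Event::ApprovalTimeout) => Some((
            State::Escalated,
            vec![EffectIntent::A2uiEmit { target: "change_status" }],
        )),
        _ => None,
    }
}
```

Because the function only returns intents as data, the host decides whether and how each one is carried out, and rustc type-checks the whole machine as ordinary Rust.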
tree-sitter targeting
Replace a named Rust item (fn, struct, enum) by name — not line or column.
rust-analyzer / LSP
Resolved symbols and hover types as the model's semantic perception layer.
Caps-bench + trajectory export
Verified, compiler-mediated training data: CapsSFT.v1 / CapsTrajectory.v1 JSONL. The bridge to a specialized coding model.
§ 07
The Schemas
Stable JSON contracts. Every surface the model touches is one of these.
§ 08
Quickstart
Run the gold suite and exercise the agent API.
Commands
caps bench gains # measured gains
caps bench run-all --json # all gold solutions pass
caps agent-api '{"op":"caps-check","file":"workflow.caps"}'
caps emit-rust workflow.caps --machine ChangeApproval --out generated.rs
caps rust-compile-generated generated.rs --json
PatchPlan.v1 example
{
"kind": "PatchPlan.v1",
"lang": "capsulang",
"edits": [
{"op": "add-event", "machine": "ChangeApproval", "event": "ApprovalTimeout"},
{"op": "add-state", "machine": "ChangeApproval", "state": "escalated"},
{"op": "add-transition", "machine": "ChangeApproval", "from": "awaiting_approvals",
"event": "ApprovalTimeout", "to": "escalated", "effects": [["a2ui.emit", "change_status"]]}
]
}
§ 09
Status & Roadmap
Built & tested
The full Rust + Capsulang compiler-agent stack, Apple/Qwen data exports, and a first Qwen3-Coder adapter checkpoint.
Remaining
Running the Apple Foundation Models adapter path after the macOS 27 runtime is installed, then comparing compile-pass rates.