Technical Reference · v1

CapsCoder

Rust Compiler Agent Protocol for Capsulang.

CapsCoder is a deterministic compiler and tooling layer that a coding model drives. The model emits structured PatchPlan.v1 edits and reads schema-stable diagnostics; cargo, rustc, and the Capsulang checker decide what is true. No model lives in the trusted path — no LLM calls, network calls, or hidden heuristics inside the checking surface.

Core principle

The model proposes. The compiler disposes.

Constraining the model's output to JSON schemas removes one class of failure: the plausible paragraph that doesn't typecheck. The schema constrains the model's output, not its cognition; it is still an LLM internally, so a schema-valid plan can still fail to check. The real, bankable wins are that edits become cheaper, checkable, and verified.

2,650 verified training examples
30B Qwen3-Coder baseline
0.028 best validation loss
macOS 27 Apple FM runtime gate

§ 02

The Loop

Apart from the model's proposals, the end-to-end cycle is deterministic. The model enters at the propose step and, when a check fails, at the repair step; every other stage is a compiler, checker, or test runner.

input · task
perceive · repository graph
propose (model) · structured PatchPlan
write · apply patch
verify · cargo check / Capsulang checker
repair · repair plan
verify · tests
output · release preview

Deterministic loop. The model contributes a JSON proposal; everything downstream is mechanical.
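The loop's control flow can be sketched in Rust as follows. This is a minimal illustration, not CapsCoder's implementation: Proposer, Checker, Outcome, run_loop, and the max_attempts retry budget are hypothetical names, plans are plain strings, and the write and test stages are folded into a single check.

```rust
/// Untrusted side: turns a task plus the previous diagnostics into a plan.
pub trait Proposer {
    fn propose(&mut self, task: &str, diagnostics: &[String]) -> String;
}

/// Trusted side: a deterministic verdict on a proposed plan.
pub trait Checker {
    fn check(&self, plan: &str) -> Result<(), Vec<String>>;
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Accepted { plan: String, attempts: usize },
    GaveUp { last_diagnostics: Vec<String> },
}

/// Propose, check, and feed diagnostics back until the checker accepts
/// or the retry budget runs out. Only the checker decides acceptance.
pub fn run_loop<P: Proposer, C: Checker>(
    proposer: &mut P,
    checker: &C,
    task: &str,
    max_attempts: usize,
) -> Outcome {
    let mut diagnostics: Vec<String> = Vec::new();
    for attempt in 1..=max_attempts {
        let plan = proposer.propose(task, &diagnostics);
        match checker.check(&plan) {
            Ok(()) => return Outcome::Accepted { plan, attempts: attempt },
            // Rejected: the diagnostics become input to the repair proposal.
            Err(diags) => diagnostics = diags,
        }
    }
    Outcome::GaveUp { last_diagnostics: diagnostics }
}
```

Keeping the model behind a trait means a scripted proposer can replay a recorded trajectory against the same deterministic checker.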

§ 03

Benchmark Results

Three properties, measured across the gold suite.

Cheaper
~85%
fewer output tokens

Structured PatchPlan vs. free-form full-file rewrite, token-weighted across the suite. 243 vs 1,588 estimated tokens.

Checkable
100%
invalid edits caught before write

9/9 corrupted edits rejected (3 per task): unknown op, missing field, dangling reference. None of the rejected edits was written to disk.
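A minimal sketch of pre-write validation covering the three rejected categories, assuming a simplified edit shape (an op name plus string fields) and the field names from the PatchPlan.v1 example in the Quickstart. Edit, Rejection, validate, and the required-field tables are hypothetical; the real checker covers more ops (add-effect, add-context-field) and more rules.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified, hypothetical edit: an op name plus string-valued fields.
#[derive(Debug)]
pub struct Edit {
    pub op: String,
    pub fields: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
pub enum Rejection {
    UnknownOp(String),
    MissingField { op: String, field: String },
    DanglingReference { field: String, name: String },
}

// Required fields per op (illustrative subset, following the PatchPlan.v1 example).
const ADD_EVENT: &[&str] = &["machine", "event"];
const ADD_STATE: &[&str] = &["machine", "state"];
const ADD_TRANSITION: &[&str] = &["machine", "from", "event", "to"];

fn required_fields(op: &str) -> Option<&'static [&'static str]> {
    match op {
        "add-event" => Some(ADD_EVENT),
        "add-state" => Some(ADD_STATE),
        "add-transition" => Some(ADD_TRANSITION),
        _ => None,
    }
}

/// Check every edit before anything is written. `states` and `events` hold the
/// names already declared in the target machine.
pub fn validate(
    edits: &[Edit],
    states: &HashSet<String>,
    events: &HashSet<String>,
) -> Result<(), Rejection> {
    // Names added earlier in the same plan may be referenced by later edits.
    let mut states = states.clone();
    let mut events = events.clone();
    for edit in edits {
        let required =
            required_fields(&edit.op).ok_or_else(|| Rejection::UnknownOp(edit.op.clone()))?;
        for &field in required {
            if !edit.fields.contains_key(field) {
                return Err(Rejection::MissingField {
                    op: edit.op.clone(),
                    field: field.to_string(),
                });
            }
        }
        // Safe to index: every required field was just confirmed present.
        let get = |key: &str| edit.fields[key].clone();
        match edit.op.as_str() {
            "add-event" => {
                events.insert(get("event"));
            }
            "add-state" => {
                states.insert(get("state"));
            }
            "add-transition" => {
                for (field, known) in [("from", &states), ("to", &states), ("event", &events)] {
                    let name = get(field);
                    if !known.contains(&name) {
                        return Err(Rejection::DanglingReference { field: field.to_string(), name });
                    }
                }
            }
            _ => unreachable!("unknown ops were rejected above"),
        }
    }
    Ok(())
}
```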

Verified
3/3
gold solutions compile

Every gold solution passes both the Capsulang checker and rustc end-to-end.

Table · caps bench gains (measured)

Task | Lang | Structured (~tokens) | Full rewrite (~tokens) | Token reduction | Safety caught
caps-add-claim-review-timeout | capsulang | 97 | 617 | 84% | 3/3
caps-add-timeout-escalation | capsulang | 96 | 942 | 90% | 3/3
rust-fix-type-mismatch | rust | 50 | 29 | −70% | 3/3
Note

On tiny files a full rewrite can be cheaper than the structured-patch envelope: in the −70% row above, the patch costs about 50 estimated tokens against 29 for rewriting the whole file. The token win appears on realistically sized files, which is also where free-form generation is most error-prone.
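The headline figure is token-weighted: it divides the summed structured tokens by the summed full-rewrite tokens, 1 − 243/1,588 ≈ 0.847. The sketch below reproduces that from the table's per-task estimates and, for contrast, computes an unweighted mean of per-task reductions, which the tiny rust-fix-type-mismatch task drags down to roughly a third. The function and constant names are illustrative.

```rust
/// (structured, full-rewrite) estimated tokens per task, from the caps bench gains table.
pub const GAINS: [(u32, u32); 3] = [(97, 617), (96, 942), (50, 29)];

/// Token-weighted reduction: 1 - (total structured tokens) / (total full-rewrite tokens).
pub fn weighted_reduction(rows: &[(u32, u32)]) -> f64 {
    let structured: u32 = rows.iter().map(|&(s, _)| s).sum();
    let full: u32 = rows.iter().map(|&(_, f)| f).sum();
    1.0 - structured as f64 / full as f64
}

/// Unweighted mean of per-task reductions; every task counts equally, whatever its size.
pub fn mean_reduction(rows: &[(u32, u32)]) -> f64 {
    let total: f64 = rows.iter().map(|&(s, f)| 1.0 - s as f64 / f as f64).sum();
    total / rows.len() as f64
}
```

Weighting by tokens reports the saving in the unit that is actually paid for, so large files dominate, as they do in real cost.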

§ 04

Model Training

The first adapter pass uses compiler-verified Capsulang trajectories. Apple Foundation Models is the deployment target; Qwen3-Coder-30B is the open MLX comparison baseline.

Corpus
2,650
compiler-verified examples

2,600 generated from seed capsules plus 50 teacher-distilled examples. Split: 2,401 train / 249 validation. Exact duplicate records: 0.
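The split and duplicate counts can be re-derived with a check like the one below. It is a sketch under assumptions: each split is JSONL with one record per line, "exact duplicate" means a byte-identical line anywhere in train or validation, and SplitReport and check_split are hypothetical names. Records that differ only in key order or whitespace would not be caught.

```rust
use std::collections::HashSet;

/// Summary of a train/validation split of JSONL records (one record per line).
#[derive(Debug, PartialEq)]
pub struct SplitReport {
    pub train: usize,
    pub valid: usize,
    pub exact_duplicates: usize,
}

pub fn check_split(train_jsonl: &str, valid_jsonl: &str) -> SplitReport {
    // Ignore blank lines, such as a trailing newline at the end of a file.
    let train: Vec<&str> = train_jsonl.lines().filter(|l| !l.trim().is_empty()).collect();
    let valid: Vec<&str> = valid_jsonl.lines().filter(|l| !l.trim().is_empty()).collect();
    let mut seen = HashSet::new();
    // A record is a duplicate if an identical line appeared earlier in either split,
    // so this also catches train/validation leakage.
    let exact_duplicates = train
        .iter()
        .chain(valid.iter())
        .filter(|r| !seen.insert(**r))
        .count();
    SplitReport { train: train.len(), valid: valid.len(), exact_duplicates }
}
```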

Qwen baseline
70.5M
LoRA parameters

Qwen3-Coder-30B-A3B-Instruct-4bit through MLX. LoRA adapters on 8 layers (--num-layers 8), with prompt masking, batch size 1, and max sequence length 8192.

Best run
0.028
validation loss at iter 100

The run was stopped after validation loss worsened to 0.060 at iter 200. Peak memory was 33.5 GB on Apple silicon.

Training matrix · LLMs, status, and measured result

Model | Role | Status | Result
Apple Foundation Models | target | dataset ready · macOS 27 gated | 2,401 / 249 rows
Qwen3-Coder-30B-A3B | baseline | trained through iter 200 | best val 0.028
Qwen2.5-1.5B | smoke test | 1 iteration only | pipeline passed

Apple FM gate

The Apple-format dataset can be generated on macOS 26 because it is plain JSONL. The runtime features announced for macOS 27 (fm, the Python SDK, Private Cloud Compute model selection, and provider-backed Foundation Models APIs) stay gated until the machine runs macOS 27 and those tools are actually installed.

scripts/check_foundation_models_runtime.sh
scripts/check_foundation_models_runtime.sh --require

Reproduce the Qwen baseline

mlx_lm.lora \
  --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit \
  --train \
  --data data/all \
  --iters 600 \
  --batch-size 1 \
  --mask-prompt \
  --num-layers 8 \
  --steps-per-report 10 \
  --steps-per-eval 50 \
  --val-batches 8 \
  --save-every 100 \
  --max-seq-length 8192 \
  --adapter-path adapters/capscoder-qwen3-coder-30b

In the first run, checkpoint 100 was kept as the active adapter because validation loss reached 0.028 there and degraded by checkpoint 200.
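Keeping checkpoint 100 amounts to choosing the evaluation with the lowest validation loss. The sketch below expresses that, plus a simple patience-based stopping rule; best_checkpoint, should_stop, and the patience parameter are illustrative assumptions, not options of mlx_lm.lora. With the two reported points, (100, 0.028) and (200, 0.060), it selects iter 100 and signals a stop at patience 1.

```rust
/// One validation measurement: (iteration, validation loss).
pub type Eval = (u32, f64);

/// The evaluation with the lowest validation loss, or None if there are none.
pub fn best_checkpoint(evals: &[Eval]) -> Option<Eval> {
    evals
        .iter()
        .copied()
        .min_by(|a, b| a.1.partial_cmp(&b.1).expect("validation loss must not be NaN"))
}

/// Stop once the last `patience` evaluations all fail to beat the best loss seen before them.
pub fn should_stop(evals: &[Eval], patience: usize) -> bool {
    if patience == 0 || evals.len() <= patience {
        return false;
    }
    let (before, recent) = evals.split_at(evals.len() - patience);
    let best_before = before.iter().map(|e| e.1).fold(f64::INFINITY, f64::min);
    recent.iter().all(|e| e.1 >= best_before)
}
```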

§ 05

Architecture

A clear trust boundary: the model lives outside the trusted core. Everything inside the boundary is deterministic and inspectable.

user · Human request: natural-language task or ticket

untrusted · model · Optional intent model: rephrases the task and plans subtasks

untrusted · model · Coding model emits PatchPlan.v1: JSON only, schema-constrained output

── trust boundary ──

trusted · deterministic · Compiler Agent Protocol: caps · cargo · rustc JSON diagnostics · rust-analyzer · tree-sitter

trusted · deterministic · Patch applied: canonicalized, written to working tree

trusted · deterministic · Compiler & test feedback: normalized diagnostics → repair loop

untrusted · model · Repair loop: model re-proposes only if checker rejects
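One way to make the trust boundary hard to bypass in Rust is a checked-plan type whose constructor is private to the trusted module, so nothing can reach the write step without passing the checker. This is a hypothetical sketch: trusted, CheckedPlan, check, and apply are illustrative names, and check is a trivial stand-in for the Compiler Agent Protocol.

```rust
mod trusted {
    /// A plan that has passed the checker. The private field means code outside
    /// this module cannot construct one except through `check`.
    #[derive(Debug)]
    pub struct CheckedPlan(String);

    impl CheckedPlan {
        pub fn as_str(&self) -> &str {
            &self.0
        }
    }

    /// Stand-in for the deterministic checker: only a trivial envelope test here.
    pub fn check(untrusted: String) -> Result<CheckedPlan, String> {
        if untrusted.contains("PatchPlan.v1") {
            Ok(CheckedPlan(untrusted))
        } else {
            Err("not a PatchPlan.v1 envelope".to_string())
        }
    }

    /// Only checked plans can reach the write step. A real implementation would
    /// write to the working tree; this stand-in just returns the plan's byte length.
    pub fn apply(plan: &CheckedPlan) -> usize {
        plan.as_str().len()
    }
}
```

The model's output stays an ordinary String until the trusted side vouches for it, so the type checker enforces the boundary along with the review process.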

§ 06

Capabilities

One shipped slice per card. Each is exercised by tests in the gold suite.

Rust bridge

cargo metadata, check, and test serve as JSON oracles. Their diagnostics are normalized into a single shape shared across tools.
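A hypothetical sketch of what normalizing into one shape can look like. The rustc side uses a few real fields of rustc's JSON diagnostics (level, message, spans with file_name, line_start, is_primary), simplified so that code is a plain string; the Capsulang side and every field of NormalizedDiagnostic are assumptions, not the actual NormalizedDiagnostic.v1 schema. JSON parsing is omitted.

```rust
/// Hypothetical unified shape; field names are illustrative, not the real NormalizedDiagnostic.v1.
#[derive(Debug, Clone, PartialEq)]
pub struct NormalizedDiagnostic {
    pub source: &'static str, // which tool produced it, e.g. "rustc" or "caps"
    pub severity: Severity,
    pub file: String,
    pub line: u32,
    pub code: Option<String>,
    pub message: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Error,
    Warning,
    Note,
}

/// A few fields of a rustc JSON diagnostic, assumed already parsed.
/// (In rustc's real output `code` is an object; it is flattened to a string here.)
pub struct RustcDiagnostic {
    pub level: String,
    pub message: String,
    pub code: Option<String>,
    pub spans: Vec<RustcSpan>,
}

pub struct RustcSpan {
    pub file_name: String,
    pub line_start: u32,
    pub is_primary: bool,
}

/// Map a rustc diagnostic onto the shared shape, anchored at its primary span.
/// Returns None for levels not modeled here or diagnostics without a primary span.
pub fn from_rustc(d: &RustcDiagnostic) -> Option<NormalizedDiagnostic> {
    let severity = match d.level.as_str() {
        "error" => Severity::Error,
        "warning" => Severity::Warning,
        "note" | "help" => Severity::Note,
        _ => return None,
    };
    let span = d.spans.iter().find(|s| s.is_primary)?;
    Some(NormalizedDiagnostic {
        source: "rustc",
        severity,
        file: span.file_name.clone(),
        line: span.line_start,
        code: d.code.clone(),
        message: d.message.clone(),
    })
}

/// Hypothetical Capsulang checker finding mapped onto the same shape.
pub fn from_caps(file: &str, line: u32, message: &str) -> NormalizedDiagnostic {
    NormalizedDiagnostic {
        source: "caps",
        severity: Severity::Error,
        file: file.to_string(),
        line,
        code: None,
        message: message.to_string(),
    }
}
```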

Semantic Capsulang patches

Edit the AST directly: add-event, add-state, add-transition, add-effect, add-context-field. The edited machine is re-emitted as canonical source and re-checked. Invalid edits, such as an effect referencing an undeclared surface, are rejected by the Capsulang checker and never written.
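The apply, re-emit, re-check, then commit order can be sketched on a toy AST. Machine, SemanticEdit, check, emit, and apply are hypothetical; check stands in for the Capsulang checker with a single rule (every effect must name a declared surface), and emit prints a made-up canonical form. The original machine is replaced only after the check passes, so a rejected edit leaves it untouched.

```rust
use std::collections::BTreeSet;

/// Toy machine AST (hypothetical); a real Capsulang machine has far more structure.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Machine {
    pub states: BTreeSet<String>,
    pub surfaces: BTreeSet<String>,
    pub effects: Vec<(String, String)>, // (surface, effect name)
}

pub enum SemanticEdit {
    AddState(String),
    AddEffect { surface: String, name: String },
}

/// Deterministic rendering; BTreeSet keeps state names sorted.
pub fn emit(m: &Machine) -> String {
    let mut out = String::new();
    for s in &m.states {
        out.push_str(&format!("state {}\n", s));
    }
    for (surface, name) in &m.effects {
        out.push_str(&format!("effect {}.{}\n", surface, name));
    }
    out
}

/// Stand-in checker with one rule: every effect must use a declared surface.
fn check(m: &Machine) -> Result<(), String> {
    for (surface, name) in &m.effects {
        if !m.surfaces.contains(surface) {
            return Err(format!("effect {}.{}: undeclared surface", surface, name));
        }
    }
    Ok(())
}

/// Apply edits to a draft, re-emit, re-check, and commit only if the check passes.
pub fn apply(m: &mut Machine, edits: Vec<SemanticEdit>) -> Result<String, String> {
    let mut draft = m.clone();
    for edit in edits {
        match edit {
            SemanticEdit::AddState(s) => {
                draft.states.insert(s);
            }
            SemanticEdit::AddEffect { surface, name } => draft.effects.push((surface, name)),
        }
    }
    let source = emit(&draft);
    check(&draft)?; // on rejection, `m` is untouched and nothing is written
    *m = draft;
    Ok(source)
}
```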

Compiler Agent Protocol

One JSON-in / JSON-out surface (agent-api) across Capsulang and Rust.

emit-rust

Compile a Capsulang machine to a pure Rust transition function plus typed effect intents. The generated code performs no I/O and holds no ambient authority; the host owns adapters and policy. Because rustc must also accept the generated code, it acts as a second compiler oracle on the same machine.
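To illustrate what "pure transition function plus typed effect intents" means, here is a hand-written sketch for a fragment of the ChangeApproval machine from the Quickstart example. It is an assumption about the shape of emit-rust output, not actual generated code; State, Event, EffectIntent, transition, and host_execute are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    AwaitingApprovals,
    Escalated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    ApprovalTimeout,
}

/// Typed effect intents: data describing what the host should do, not the doing.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EffectIntent {
    A2uiEmit { name: &'static str },
}

/// Pure transition function: no I/O and no globals, so the same input always
/// gives the same output. Returns None when the event is not accepted in `state`.
pub fn transition(state: State, event: Event) -> Option<(State, Vec<EffectIntent>)> {
    match (state, event) {
        (State::AwaitingApprovals, Event::ApprovalTimeout) => Some((
            State::Escalated,
            vec![EffectIntent::A2uiEmit { name: "change_status" }],
        )),
        _ => None,
    }
}

/// The host owns adapters and policy; here it merely records each intent.
pub fn host_execute(intents: &[EffectIntent], log: &mut Vec<String>) {
    for intent in intents {
        match intent {
            EffectIntent::A2uiEmit { name } => log.push(format!("a2ui.emit {}", name)),
        }
    }
}
```

Because effects come back as values, the same machine can be tested exhaustively without mocks, and the host can refuse or reroute an intent by policy.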

tree-sitter targeting

Replace a named Rust item (fn, struct, enum) by name — not line or column.

rust-analyzer / LSP

Resolved symbols and hover types as the model's semantic perception layer.

Caps-bench + trajectory export

Verified, compiler-mediated training data: CapsSFT.v1 / CapsTrajectory.v1 JSONL. The bridge to a specialized coding model.

§ 07

The Schemas

Stable JSON contracts. Every surface the model touches is one of these.

PatchPlan.v1 · The model's output envelope: a list of typed edits over the AST.
NormalizedDiagnostic.v1 · A single diagnostic shape across cargo, rustc, and the Capsulang checker.
RustGraph.v1 · Resolved Rust workspace graph: items, symbols, edges.
CapsTrajectory.v1 · An ordered record of compiler-mediated steps for a task.
CapsSFT.v1 · Distilled SFT examples derived from successful trajectories.

§ 08

Quickstart

Run the gold suite and exercise the agent API.

Commands

caps bench gains                        # measured gains
caps bench run-all --json               # all gold solutions pass
caps agent-api '{"op":"caps-check","file":"workflow.caps"}'
caps emit-rust workflow.caps --machine ChangeApproval --out generated.rs
caps rust-compile-generated generated.rs --json

PatchPlan.v1 example

{
  "kind": "PatchPlan.v1",
  "lang": "capsulang",
  "edits": [
    {"op": "add-event", "machine": "ChangeApproval", "event": "ApprovalTimeout"},
    {"op": "add-state", "machine": "ChangeApproval", "state": "escalated"},
    {"op": "add-transition", "machine": "ChangeApproval", "from": "awaiting_approvals",
     "event": "ApprovalTimeout", "to": "escalated", "effects": [["a2ui.emit", "change_status"]]}
  ]
}

§ 09

Status & Roadmap

Built & tested

The full Rust + Capsulang compiler-agent stack, Apple/Qwen data exports, and a first Qwen3-Coder adapter checkpoint.

Remaining

Running the Apple Foundation Models adapter path once the macOS 27 runtime is installed, then comparing its compile-pass rate against the Qwen3-Coder baseline.