01 · Cursor

Posture states

Four postures — confident, partial, held (“I don’t know”), and refused — encode the model’s stance toward a target in color, halo, arrow, badge, and motion. Held is the hero: the cursor physically slows (its easing lerp drops from 0.32 to 0.055) so it lags your hand rather than degrading into plausible text.

Posture Color Halo Arrow Badge Motion Chip header
Confident --coral solid concentric ring white, full opacity standard 180ms ease-out CONFIDENT · click to accept
Partial --blue dashed ring + corner brackets white, full opacity tick marks at NW & SE standard 180ms ease-out PARTIAL · verify before sending
Held (I don't know) --slate sparse dotted ring white at 70% opacity "?" disc top-right slowed ~600ms cubic-bezier(0.2, 0.9, 0.3, 1) I DON'T KNOW · needs your input
Refused (Quiet mode) --red solid ring + diagonal slash white, full opacity standard 180ms QUIET MODE · this region is off-limits
Click to accept
CONFIDENT
click to accept
Verify before sending
PARTIAL
verify before sending
Needs your input
I DON'T KNOW
needs your input
This region is off-limits
QUIET MODE
this region is off-limits

Move your pointer across the tiles. Notice the held cursor physically slows.

02 · Cursor

Four interaction primitives

These compose. The cursor at any moment is the product of all four being on or off.

A
Propose-and-wait
The cursor never acts unilaterally. Every action is a halo, a highlighted cell, a tracked change, a drafted email. The user accepts by following — action is a co-signature.
B
Show work at the tip
Every proposal carries provenance reachable from the cursor. Click → expand. Click again → walk back through the source chain. The cursor is a portal.
C
Calibrated states
Posture, not percentages. Shape and motion encode confidence — no “I’m 73% sure” labels.
D
Quiet mode
Triggered by sensitivity labels, focus sessions, or uncertainty cascades. File-level dims the cursor; session-level reverts it to the OS native arrow and dims the palette.

03 · Vision

Perception lifecycle

One perception loop, six phases (L-00…L-05). Each phase has a canonical pill label and a shown surface.

CodePillSurface
L-00 Copilot — off eyes-off
L-01 Copilot — looking halo · steady
L-02 Reading… halo · scan-shimmer + chip
L-03 I see this region box + claim chip
L-04 Thinking… faded thought chips · no UI change
L-05 Responding banner + suggested-next region

04 · Vision

Attention modes

Five strategies for when and how to capture frames and run the pipeline. Cost rises with fidelity; the runtime auto-promotes (watch → track on a click, track → stare on two low-confs) and cools back down.

ModePillCostCadence
glance Glance Cost 1 Capture once on demand, discard buffer after pipeline.
watch Watching Cost 2 Capture at config.glanceHz (default ~2 Hz). Region = full viewport.
track Tracking amount Cost 2 Lock onto a layout node. Recapture when sceneDelta > changeThreshold.
stare Staring · header Cost 4 Multi-pass high-res burst on a fixed region. Pause Watch during.
co-attend Following you Cost 2 Recapture on user-pointer; region = bbox under cursor, +80px.

05 · Vision

Mode topology

Nine modes (M-00…M-08) define who is driving and who is interpreting. The pointer is sacred: any user-pointer event while the agent drives (co-pilot, controlling, demo) cancels the in-flight action and falls back to sharing within ~120ms. M-08 monitoring is the watch-for addition, reachable from sharing and coaching.

CodeModeLegal next modes
M-00 Idle M-01 Sharing
M-01 Sharing M-02 CoachingM-03 Macro recordM-04 Co-pilotM-00 IdleM-08 Monitoring
M-02 Coaching M-01 SharingM-03 Macro recordM-04 Co-pilotM-08 Monitoring
M-03 Macro record M-01 SharingM-02 CoachingM-04 Co-pilotM-07 Guide build
M-04 Co-pilot M-05 ControllingM-01 Sharing
M-05 Controlling M-06 Demo / replayM-01 Sharing
M-06 Demo / replay M-07 Guide buildM-01 Sharing
M-07 Guide build M-00 IdleM-01 Sharing
M-08 Monitoring M-01 SharingM-02 Coaching

06 · Vision

Intent vocabulary

Eight verbs (V-01…V-08), orthogonal to mode and attention. A verb is the user-facing handle that triggers a mode transition; the topology still owns who drives.

VerbLabelMeaning
V-01-snapshot Snapshot One-shot look (replaces the ambiguous “share start”).
V-02-explain Explain Explain what’s pointed at / on screen now.
V-03-guide Guide Multi-step walkthrough with an on-screen arrow.
V-04-compare Compare/Synthesize Synthesize across windows / third-party docs.
V-05-verify Verify “Did I do this right?”
V-06-watch-for Watch-for Persistent monitor with a trigger — invokes M-08 monitoring.
V-07-capture-world Capture world Camera input.
V-08-auto-witness Auto-witness Implicit macro-record.

07 · Shared

Shown-surface taxonomy

The curated set of surfaces the grammar is allowed to render. Each overlays the shared content only — never the meeting chrome (except the always-present pill).

Pill
Always-present status label in the meeting chrome — the canonical UI; its text drives the state machine. (SPEC §5)
Halo
Inset border + outer glow over the watched content; intensity and dashing encode posture. (SPEC §5)
Region box
A dashed-or-solid rectangle in screen coords marking the element in focus. (SPEC §5)
Chip
A small anchored claim — “I see {label}” — auto-positioned to stay in the viewport; the no-silent-failure floor. (SPEC §5)
Banner
Full-width announcement: “I’ll drive for a moment.” / “Back to you.” One banner max. (SPEC §5)
Modal
Takes pointer focus for disambiguation, macro abstraction, or divergence; pointer barge-in dismisses it. (SPEC §5)
Mask
Checkered redaction applied before OCR — masked pixels never leave the device. (SPEC §5)
Ghost cursor
A faded mirror of the user’s cursor, rendered while the agent follows. (SPEC §5)
Agent cursor
The agent’s own cursor while driving, carrying a “copilot” label. (SPEC §5)
Step rail
A right-edge panel listing macro steps during record / replay. (SPEC §5)
Palette
The ~380px companion docked beside the host app — voice channel, cursor-mode controls, running step-tracker. (HANDOFF §3)
Screen markup
The agent draws on the host UI — callout boxes (per-state borders), pointers, and a Caveat script label; anchored by selector, not pixels. (HANDOFF §4)

08 · Shared

Visual tokens

The materials. State backgrounds are tinted versions of their accent at 5–18% alpha; borders are thin (1.5px), full alpha.

Color

--paper #faf7f2 warm cream background
--paper-2 #f4ede1 subtle bg accent (panels)
--paper-3 #ece2cc deeper warmth (return banner)
--paper-4 #e5d9bf
--rule #e2d8c4 hairlines
--rule-strong #c9bca0 readable borders
--ink #1c1a17 body
--ink-soft #4a4639 secondary
--ink-faint #8a8474 metadata
--coral #d97757 CONFIDENT (Claude warm accent)
--blue #4a6da7 PARTIAL
--slate #6b7280 HELD / I don’t know
--red #b54545 REFUSED / QUIET MODE
--green #5a8a5e working indicator

Type ramp

The honest cursor holds still. Newsreader — serif — body prose, file titles, headings, lede
The honest cursor holds still. Inter — sans — UI chrome (toolbar, chips, labels, palette)
The honest cursor holds still. JetBrains Mono — eyebrows, state pills, filenames, timestamps
The honest cursor holds still. Caveat — handwriting — markup labels / accent moments

Motion

standard cursor follow transform 180ms ease-out (lerp factor 0.32)
HELD cursor follow transform 600ms cubic-bezier(0.2, 0.9, 0.3, 1) (lerp factor 0.055)
file state flash on update background 1500ms ease-out (coral → white)
banner pulse box-shadow 1400ms ease-out infinite (green dot)
cascade stagger 600 / 1500 / 2400ms from Resume click