The converged vocabulary
The grammar
One vocabulary, drawn from the Honest Cursor handoff and the Seeing States spec. Posture states and interaction primitives travel with the cursor; the lifecycle, attention modes, mode topology, and intent verbs govern the screen-vision runtime. The shown surfaces and visual tokens are the shared materials.
01 · Cursor
Posture states
Four postures (confident, partial, held, and refused) encode the model’s stance toward a target in color, halo, arrow, badge, and motion. Held (“I don’t know”) is the hero: the cursor physically slows (its easing lerp factor drops from 0.32 to 0.055), so it visibly lags your hand instead of the model degrading into plausible text.
| Posture | Color | Halo | Arrow | Badge | Motion | Chip header |
|---|---|---|---|---|---|---|
| Confident | --coral | solid concentric ring | white, full opacity | — | standard 180ms ease-out | CONFIDENT · click to accept |
| Partial | --blue | dashed ring + corner brackets | white, full opacity | tick marks at NW & SE | standard 180ms ease-out | PARTIAL · verify before sending |
| Held (I don't know) | --slate | sparse dotted ring | white at 70% opacity | "?" disc top-right | slowed ~600ms cubic-bezier(0.2, 0.9, 0.3, 1) | I DON'T KNOW · needs your input |
| Refused (Quiet mode) | --red | solid ring + diagonal slash | white, full opacity | — | standard 180ms ease-out | QUIET MODE · this region is off-limits |
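The held slowdown can be sketched as a per-frame lerp toward the pointer. This is a minimal illustration, not the shipped implementation: the names `Posture`, `LERP_FACTOR`, `followStep`, and `framesToSettle` are hypothetical, and it assumes every non-held posture uses the standard factor because the table lists them all with standard motion.

```typescript
// Hypothetical names throughout; only the two factors (0.32, 0.055) come from the spec.
type Posture = "confident" | "partial" | "held" | "refused";

interface Point {
  x: number;
  y: number;
}

// Assumption: confident, partial, and refused share the standard factor.
const LERP_FACTOR: Record<Posture, number> = {
  confident: 0.32,
  partial: 0.32,
  held: 0.055,
  refused: 0.32,
};

// Move the drawn cursor a fixed fraction of the remaining distance to the pointer.
function followStep(cursor: Point, pointer: Point, posture: Posture): Point {
  const t = LERP_FACTOR[posture];
  return {
    x: cursor.x + (pointer.x - cursor.x) * t,
    y: cursor.y + (pointer.y - cursor.y) * t,
  };
}

// Count the frames needed to get within `epsilon` px of a pointer 100px away.
function framesToSettle(posture: Posture, epsilon = 1): number {
  let cursor: Point = { x: 0, y: 0 };
  const pointer: Point = { x: 100, y: 0 };
  let count = 0;
  while (Math.abs(pointer.x - cursor.x) > epsilon) {
    cursor = followStep(cursor, pointer, posture);
    count += 1;
  }
  return count;
}
```

Comparing `framesToSettle("held")` with `framesToSettle("confident")` shows why the held cursor reads as lagging: each frame closes only 5.5% of the remaining gap instead of 32%.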
02 · Cursor
Four interaction primitives
These compose. The cursor at any moment is the product of all four being on or off.
03 · Vision
Perception lifecycle
One perception loop, six phases (L-00…L-05). Each phase has a canonical pill label and a shown surface.
| Code | Pill | Surface |
|---|---|---|
| L-00 | Copilot — off | eyes-off |
| L-01 | Copilot — looking | halo · steady |
| L-02 | Reading… | halo · scan-shimmer + chip |
| L-03 | I see this | region box + claim chip |
| L-04 | Thinking… | faded thought chips · no UI change |
| L-05 | Responding | banner + suggested-next region |
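One way to keep pill labels and surfaces canonical is a single lookup keyed by phase code. The sketch below copies the table verbatim; the names `PhaseCode`, `PhaseSpec`, `PHASES`, and `describePhase` are hypothetical and not part of the spec.

```typescript
// Hypothetical names; the pill and surface strings are copied from the table above.
type PhaseCode = "L-00" | "L-01" | "L-02" | "L-03" | "L-04" | "L-05";

interface PhaseSpec {
  pill: string;
  surface: string;
}

const PHASES: Record<PhaseCode, PhaseSpec> = {
  "L-00": { pill: "Copilot — off", surface: "eyes-off" },
  "L-01": { pill: "Copilot — looking", surface: "halo · steady" },
  "L-02": { pill: "Reading…", surface: "halo · scan-shimmer + chip" },
  "L-03": { pill: "I see this", surface: "region box + claim chip" },
  "L-04": { pill: "Thinking…", surface: "faded thought chips · no UI change" },
  "L-05": { pill: "Responding", surface: "banner + suggested-next region" },
};

// Render a one-line debug description of a phase.
function describePhase(code: PhaseCode): string {
  const spec = PHASES[code];
  return `${code} [${spec.pill}] -> ${spec.surface}`;
}
```

Because `PHASES` is typed as a `Record` over `PhaseCode`, the compiler rejects a table that omits a phase or adds a seventh one.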
04 · Vision
Attention modes
Five strategies for when and how to capture frames and run the pipeline. Cost rises with fidelity; the runtime auto-promotes (watch → track on a click, track → stare after two low-confidence results) and cools back down afterward.
| Mode | Pill | Cost | Cadence |
|---|---|---|---|
| glance | Glance | Cost 1 | Capture once on demand, discard buffer after pipeline. |
| watch | Watching | Cost 2 | Capture at config.glanceHz (default ~2 Hz). Region = full viewport. |
| track | Tracking amount | Cost 2 | Lock onto a layout node. Recapture when sceneDelta > changeThreshold. |
| stare | Staring · header | Cost 4 | Multi-pass high-res burst on a fixed region. Pause Watch during. |
| co-attend | Following you | Cost 2 | Recapture on user-pointer; region = bbox under cursor, +80px. |
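The promotion rules and two of the cadence conditions can be sketched as pure functions. This is illustrative only: all names are hypothetical, the two low-confidence results are assumed to be consecutive, "+80px" is assumed to mean 80px of padding on every side, and cool-down is left out because the spec does not say what triggers it.

```typescript
// Hypothetical names; only the rules themselves come from the spec.
type AttentionMode = "glance" | "watch" | "track" | "stare" | "co-attend";

type AttentionEvent =
  | { kind: "click" }
  | { kind: "reading"; lowConfidence: boolean };

interface AttentionState {
  mode: AttentionMode;
  lowConfCount: number;
}

// watch -> track on a click; track -> stare after two low-confidence readings.
// Assumption: the two readings must be consecutive, so a good reading resets the count.
function promote(state: AttentionState, ev: AttentionEvent): AttentionState {
  if (state.mode === "watch" && ev.kind === "click") {
    return { mode: "track", lowConfCount: 0 };
  }
  if (state.mode === "track" && ev.kind === "reading") {
    const count = ev.lowConfidence ? state.lowConfCount + 1 : 0;
    return count >= 2
      ? { mode: "stare", lowConfCount: 0 }
      : { mode: "track", lowConfCount: count };
  }
  return state;
}

// Track mode recaptures only when the scene changed more than the threshold.
function shouldRecapture(sceneDelta: number, changeThreshold: number): boolean {
  return sceneDelta > changeThreshold;
}

interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Co-attend region: the bounding box under the cursor, grown by `pad` px per side.
function coAttendRegion(target: Box, pad = 80): Box {
  return {
    x: target.x - pad,
    y: target.y - pad,
    width: target.width + 2 * pad,
    height: target.height + 2 * pad,
  };
}
```

Keeping `promote` free of timers and capture calls makes the escalation path easy to test in isolation from the frame pipeline.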
05 · Vision
Mode topology
Nine modes (M-00…M-08) define who is driving and who is interpreting. The pointer is sacred: any user-pointer event while the agent drives (co-pilot, controlling, demo) cancels the in-flight action and falls back to sharing within ~120ms. M-08 monitoring is the watch-for addition, reachable from sharing and coaching.
| Code | Mode | Legal next modes |
|---|---|---|
| M-00 | Idle | M-01 Sharing |
| M-01 | Sharing | M-02 Coaching, M-03 Macro record, M-04 Co-pilot, M-00 Idle, M-08 Monitoring |
| M-02 | Coaching | M-01 Sharing, M-03 Macro record, M-04 Co-pilot, M-08 Monitoring |
| M-03 | Macro record | M-01 Sharing, M-02 Coaching, M-04 Co-pilot, M-07 Guide build |
| M-04 | Co-pilot | M-05 Controlling, M-01 Sharing |
| M-05 | Controlling | M-06 Demo / replay, M-01 Sharing |
| M-06 | Demo / replay | M-07 Guide build, M-01 Sharing |
| M-07 | Guide build | M-00 Idle, M-01 Sharing |
| M-08 | Monitoring | M-01 Sharing, M-02 Coaching |
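The topology reads naturally as an adjacency table plus a guard. The sketch below transcribes the legal-next column and the pointer rule; `ModeCode`, `LEGAL_NEXT`, `canTransition`, and `onUserPointer` are hypothetical names, and the ~120ms deadline is not modeled.

```typescript
// Hypothetical names; the edges are transcribed from the table above.
type ModeCode =
  | "M-00" | "M-01" | "M-02" | "M-03" | "M-04"
  | "M-05" | "M-06" | "M-07" | "M-08";

const LEGAL_NEXT: Record<ModeCode, readonly ModeCode[]> = {
  "M-00": ["M-01"],
  "M-01": ["M-02", "M-03", "M-04", "M-00", "M-08"],
  "M-02": ["M-01", "M-03", "M-04", "M-08"],
  "M-03": ["M-01", "M-02", "M-04", "M-07"],
  "M-04": ["M-05", "M-01"],
  "M-05": ["M-06", "M-01"],
  "M-06": ["M-07", "M-01"],
  "M-07": ["M-00", "M-01"],
  "M-08": ["M-01", "M-02"],
};

// Modes in which the agent drives: co-pilot, controlling, demo / replay.
const AGENT_DRIVEN: readonly ModeCode[] = ["M-04", "M-05", "M-06"];

// True when the topology allows moving directly from `from` to `to`.
function canTransition(from: ModeCode, to: ModeCode): boolean {
  return LEGAL_NEXT[from].indexOf(to) !== -1;
}

// The pointer is sacred: a user-pointer event in an agent-driven mode
// drops back to sharing; in every other mode it changes nothing.
function onUserPointer(mode: ModeCode): ModeCode {
  return AGENT_DRIVEN.indexOf(mode) !== -1 ? "M-01" : mode;
}
```

Every agent-driven mode lists M-01 Sharing as a legal next mode, so the pointer fallback never needs an edge outside the table.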
06 · Vision
Intent vocabulary
Eight verbs (V-01…V-08), orthogonal to mode and attention. A verb is the user-facing handle that triggers a mode transition; the topology still owns who drives.
| Verb | Label | Meaning |
|---|---|---|
| V-01-snapshot | Snapshot | One-shot look (replaces the ambiguous “share start”). |
| V-02-explain | Explain | Explain what’s pointed at / on screen now. |
| V-03-guide | Guide | Multi-step walkthrough with an on-screen arrow. |
| V-04-compare | Compare/Synthesize | Synthesize across windows / third-party docs. |
| V-05-verify | Verify | “Did I do this right?” |
| V-06-watch-for | Watch-for | Persistent monitor with a trigger; invokes M-08 monitoring. |
| V-07-capture-world | Capture world | Camera input. |
| V-08-auto-witness | Auto-witness | Implicit macro-record. |
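A verb registry might look like the sketch below. All names (`VerbCode`, `VerbSpec`, `VERBS`, `parseVerbId`) are hypothetical. Only the V-06 to M-08 link is stated in the table, so it is the only verb given an `invokesMode`; mappings for the other verbs are deliberately left unset rather than guessed.

```typescript
// Hypothetical names; slugs and labels are copied from the table above.
type VerbCode = "V-01" | "V-02" | "V-03" | "V-04" | "V-05" | "V-06" | "V-07" | "V-08";

interface VerbSpec {
  slug: string;
  label: string;
  invokesMode?: string; // set only where the spec names a target mode
}

const VERBS: Record<VerbCode, VerbSpec> = {
  "V-01": { slug: "snapshot", label: "Snapshot" },
  "V-02": { slug: "explain", label: "Explain" },
  "V-03": { slug: "guide", label: "Guide" },
  "V-04": { slug: "compare", label: "Compare/Synthesize" },
  "V-05": { slug: "verify", label: "Verify" },
  "V-06": { slug: "watch-for", label: "Watch-for", invokesMode: "M-08" },
  "V-07": { slug: "capture-world", label: "Capture world" },
  "V-08": { slug: "auto-witness", label: "Auto-witness" },
};

// Parse a full verb id such as "V-06-watch-for" into its code,
// returning undefined when the code or slug is not in the registry.
function parseVerbId(id: string): VerbCode | undefined {
  const code = id.slice(0, 4);
  if (!Object.prototype.hasOwnProperty.call(VERBS, code)) {
    return undefined;
  }
  const known = code as VerbCode;
  return id === `${known}-${VERBS[known].slug}` ? known : undefined;
}
```

Keeping the verb registry separate from the mode table mirrors the claim that verbs are orthogonal to mode and attention: a verb names what the user asked for, and the topology decides whether the resulting transition is legal.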
07 · Shared
Shown-surface taxonomy
The curated set of surfaces the grammar is allowed to render. Each overlays the shared content only — never the meeting chrome (except the always-present pill).
08 · Shared
Visual tokens
The materials. State backgrounds are tinted versions of their accent at 5–18% alpha; borders are thin (1.5px), full alpha.
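The tinting rule can be illustrated with a small helper that turns an accent hex value into an `rgba()` background. This is a sketch under assumptions: `stateBackground` is a hypothetical name, and rejecting alphas outside 5–18% is one reading of the rule, not something the spec mandates.

```typescript
// Hypothetical helper: build a tinted state background from an accent color.
function stateBackground(accentHex: string, alpha: number): string {
  if (!/^#[0-9a-f]{6}$/i.test(accentHex)) {
    throw new Error(`expected a 6-digit hex color, got ${accentHex}`);
  }
  // Assumption: treat the 5–18% range as a hard limit.
  if (alpha < 0.05 || alpha > 0.18) {
    throw new RangeError(`alpha ${alpha} is outside the 0.05 to 0.18 range`);
  }
  const r = parseInt(accentHex.slice(1, 3), 16);
  const g = parseInt(accentHex.slice(3, 5), 16);
  const b = parseInt(accentHex.slice(5, 7), 16);
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}
```

For example, `stateBackground("#d97757", 0.18)` yields `rgba(217, 119, 87, 0.18)`, the strongest tint the rule allows for the coral accent.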
Color
| Token | Value | Use |
|---|---|---|
| --paper | #faf7f2 | warm cream background |
| --paper-2 | #f4ede1 | subtle bg accent (panels) |
| --paper-3 | #ece2cc | deeper warmth (return banner) |
| --paper-4 | #e5d9bf | |
| --rule | #e2d8c4 | hairlines |
| --rule-strong | #c9bca0 | readable borders |
| --ink | #1c1a17 | body |
| --ink-soft | #4a4639 | secondary |
| --ink-faint | #8a8474 | metadata |
| --coral | #d97757 | CONFIDENT (Claude warm accent) |
| --blue | #4a6da7 | PARTIAL |
| --slate | #6b7280 | HELD / I don’t know |
| --red | #b54545 | REFUSED / QUIET MODE |
| --green | #5a8a5e | working indicator |
Type ramp
Motion
| Motion | Transition |
|---|---|
| standard cursor follow | transform 180ms ease-out (lerp factor 0.32) |
| HELD cursor follow | transform 600ms cubic-bezier(0.2, 0.9, 0.3, 1) (lerp factor 0.055) |
| file state flash on update | background 1500ms ease-out (coral → white) |
| banner pulse | box-shadow 1400ms ease-out infinite (green dot) |
| cascade stagger | 600 / 1500 / 2400ms from Resume click |