Coconut — next steps

Open questions

Left unresolved on purpose; bring these to the design session. Merged from the v0.4 handoff and the original brief.

Is there already an internal name for the cross-surface cursor?

Cowork’s “co-worker” framing suggests a closely related concept may already exist. Worth checking before naming it ourselves.

HANDOFF §10.1 · brief §11.1

How does this compose with Artifacts / Visualizer — sibling or convergent surface?

If artifacts are a persistent shared canvas, is the cursor the navigation layer for them, or a separate concept entirely?

HANDOFF §10.2 · brief §11.2

OS-level primitive vs. add-in guest — does Cowork become the anchor?

The cursor wants to be OS-level, but Anthropic’s add-ins are guests in Microsoft apps. Cowork could be the anchor where the primitive lives in full form, with degraded versions in each add-in.

HANDOFF §10.3 · brief §11.3

What is the eval story for held-rate calibration?

Calibration is a model-side problem. If Claude is poorly calibrated, the held “I don’t know” state becomes either noise or a lie. The UI can only be as honest as the held-rate underneath it.

HANDOFF §10.4 · brief §11.4
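
One way to make the eval story concrete, as a sketch under stated assumptions: suppose each eval item carries a model confidence score and a ground-truth correctness flag, and that the cursor holds whenever confidence falls below a threshold. None of these names, nor the threshold mechanism itself, come from the handoff or the brief; they are hypothetical. Under that reading, "noise" would show up as holds on items the model would have answered correctly, and "a lie" as wrong answers that were not held.

```typescript
// Hypothetical eval record; the field names are assumptions, not from the handoff.
interface EvalItem {
  confidence: number; // model confidence in [0, 1], assumed to be available
  correct: boolean;   // whether the answer it would give matches the reference
}

interface HeldReport {
  heldRate: number;         // fraction of items where the cursor holds
  answeredAccuracy: number; // accuracy on items it chose to answer
  needlessHoldRate: number; // fraction of holds where it would have been right
}

// Summarise how a hold threshold behaves on a labelled eval set.
function heldReport(items: EvalItem[], holdBelow: number): HeldReport {
  const held = items.filter((i) => i.confidence < holdBelow);
  const answered = items.filter((i) => i.confidence >= holdBelow);
  const frac = (n: number, d: number): number => (d === 0 ? 0 : n / d);
  return {
    heldRate: frac(held.length, items.length),
    answeredAccuracy: frac(answered.filter((i) => i.correct).length, answered.length),
    needlessHoldRate: frac(held.filter((i) => i.correct).length, held.length),
  };
}
```

A well-calibrated model should push answeredAccuracy up and needlessHoldRate down as the threshold is tuned; if neither moves, the held state is not carrying information.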

What does the minimum viable memory model look like?

A cursor that knows everything is also surveillant — the “second brain” trap. Memory and quiet mode must ship together. What is the smallest memory model that serves the prototype without overreaching?

HANDOFF §10.5 · brief §11.5

How do skills compose with the cursor?

A skill is a stored verb; the cursor is the surface that carries verbs. They should compose — but the contract isn’t obvious. Worth a whiteboard.

HANDOFF §10.6 · brief §11.6

What is the smallest research prototype worth running?

Lane 1 (Cowork) is the demo we have. Is there a still-smaller user-study version: the held state in a single file, A/B tested against a chat-shaped equivalent? (See “The smallest research prototype worth running” below.)

HANDOFF §10.7 · brief §11.7

What to push on at hi-fi

The next fidelity step. These are the details where the difference between “prototype” and “real” actually lives.

Cursor motion fidelity.

The slowed Held motion is the entire concept in one gesture. Test at 144 Hz. Try an underdamped spring instead of a critically damped lerp.

HANDOFF §9.1
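
A minimal sketch of the two motion models, assuming a one-dimensional cursor coordinate and a fixed timestep. All names and the stiffness and damping values are illustrative and do not come from the prototype. The spring overshoots when its damping ratio is below 1; the lerp is written in exponential form so it behaves the same at 60 Hz and 144 Hz.

```typescript
// Hypothetical helper types and values; nothing here is taken from the prototype code.
interface SpringState {
  position: number;
  velocity: number;
}

// Damping ratio zeta = c / (2 * sqrt(k * m)). zeta < 1 is underdamped, zeta = 1 is critical.
function dampingRatio(stiffness: number, damping: number, mass = 1): number {
  return damping / (2 * Math.sqrt(stiffness * mass));
}

// One semi-implicit Euler step of a damped spring pulling toward `target`.
function stepSpring(
  state: SpringState,
  target: number,
  stiffness: number,
  damping: number,
  dt: number,
  mass = 1,
): SpringState {
  const force = -stiffness * (state.position - target) - damping * state.velocity;
  const velocity = state.velocity + (force / mass) * dt;
  return { position: state.position + velocity * dt, velocity };
}

// Exponential lerp: approaches the target without overshoot, independent of frame rate.
function stepLerp(position: number, target: number, rate: number, dt: number): number {
  return target + (position - target) * Math.exp(-rate * dt);
}

// Run the spring from rest at 0 and return the position after every frame.
function simulateSpring(
  target: number,
  stiffness: number,
  damping: number,
  hz: number,
  seconds: number,
): number[] {
  let state: SpringState = { position: 0, velocity: 0 };
  const positions: number[] = [];
  const steps = Math.round(hz * seconds);
  for (let i = 0; i < steps; i++) {
    state = stepSpring(state, target, stiffness, damping, 1 / hz);
    positions.push(state.position);
  }
  return positions;
}
```

For the 144 Hz test, call simulateSpring with hz set to 144 and compare the peak position against the target to see how much the Held motion overshoots before settling.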

The “?” badge.

Currently Inter 700 inside a slate disc. Try a serif glyph, a hand-drawn glyph, an animated wobble — should it breathe?

HANDOFF §9.2

The held viewer as a hole in the document.

Currently a dashed-bordered hatched card. Could it instead be an actual gap where the draft would be — a hole in the document? Worth a sketch.

HANDOFF §9.3

Banner hierarchy.

Three modes — return, working, complete. Each should feel distinct without a layout change.

HANDOFF §9.4

Session-level quiet mode: native vs. branded cursor.

Right now the palette dims and the cursor reverts. Should the whole UI dim? Should the cursor stay branded or go fully native? Probably native.

HANDOFF §9.5

Chip tone calibration.

Re-read every chip. Avoid “I can help you with…” / “Sorry, I can’t…” — go for active, posture-explicit phrasing.

HANDOFF §9.6

The session log as a dependency graph.

Cowork is spatial. The chat-style log might want to be a dependency graph of files instead.

HANDOFF §9.7

Markup colors wired to the picker.

The cursor widget exposes four user colors for markup (cyan, green, magenta, coral) that aren’t wired to actual markup yet. Either ship the picker or hide it.

HANDOFF §9.8

Voice waveform reacting to real amplitude.

Currently a six-bar CSS animation. It should respond to real audio amplitude when the mic is hot.

HANDOFF §9.9
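
A sketch of the amplitude mapping, assuming the prototype runs in a browser and can obtain a buffer of time-domain samples in the range -1 to 1 (for example from the Web Audio API's AnalyserNode.getFloatTimeDomainData). The function names, the gain parameter, and the choice to split the buffer into six equal time slices are assumptions for illustration; splitting by frequency band would be an equally reasonable design.

```typescript
// Root-mean-square level of samples[start, end). Returns 0 for an empty range.
function rms(samples: Float32Array, start: number, end: number): number {
  let sum = 0;
  for (let i = start; i < end; i++) {
    sum += samples[i] * samples[i];
  }
  return end > start ? Math.sqrt(sum / (end - start)) : 0;
}

// Map one audio buffer to `barCount` bar heights in [0, 1].
// `gain` is a hypothetical tuning knob so quiet speech still moves the bars.
function barHeights(samples: Float32Array, barCount = 6, gain = 1): number[] {
  const chunk = Math.floor(samples.length / barCount);
  const heights: number[] = [];
  for (let b = 0; b < barCount; b++) {
    const level = rms(samples, b * chunk, (b + 1) * chunk) * gain;
    heights.push(Math.min(1, level)); // clamp so loud input cannot exceed full height
  }
  return heights;
}
```

Each returned value could drive the height of one bar per animation frame, replacing the fixed CSS keyframes while the mic is live.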

Milestones

The Seeing States build ladder. Order matters — each milestone is shippable in isolation, with a single “done when” gate.

M1

Eyes on

Done when. Starting a share renders the halo within 1s and the pill cycles labels correctly.

M2

I see this

Done when. A user shares a doc and the model says “I see {title}.”

M3

Failures + min-dwell

Done when. Every entry in the failure table surfaces in a fixture test.

M4

Attention modes

Done when. A manual click on an element promotes to track; two low-confidence reads promote to stare.

M5

Mode topology + barge-in

Done when. The chaos-test (random user-pointer events while the agent is driving) never leaves the agent stuck holding the pointer.

M6

Macro record + replay

Done when. A user records a 5-step macro, replays it, and the replay halts at divergence point #2 when the UI changes.

M7

Guide artifact

Done when. A recorded macro can be approved and shared as a link.

M8

Privacy

Done when. A frame containing a credit card never reaches the model.

SPEC §14 (M1–M8)
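
The M8 gate lends itself to a fixture test. The sketch below assumes text has already been extracted from each frame (for example by OCR) and that frames are dropped whole rather than redacted; the Frame shape and function names are hypothetical and not taken from the spec. It uses the Luhn checksum over every 13 to 19 digit window, which deliberately errs toward false positives.

```typescript
// Luhn checksum: from the right, double every second digit and sum the results.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // ASCII '0' is 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// True if any 13-19 digit window inside a run of digits (allowing single
// spaces or hyphens between digits) passes the Luhn check.
function containsCardNumber(text: string): boolean {
  const runs = text.match(/\d(?:[ -]?\d)+/g) ?? [];
  for (const run of runs) {
    const digits = run.replace(/\D/g, "");
    for (let len = 13; len <= 19; len++) {
      for (let start = 0; start + len <= digits.length; start++) {
        if (passesLuhn(digits.slice(start, start + len))) return true;
      }
    }
  }
  return false;
}

// Hypothetical frame record: an id plus whatever text was extracted from it.
interface Frame {
  id: string;
  extractedText: string;
}

// Fail closed: any frame that might contain a card number is withheld.
function framesSafeToSend(frames: Frame[]): Frame[] {
  return frames.filter((f) => !containsCardNumber(f.extractedText));
}
```

Because every window is tested, long unrelated digit strings will sometimes be withheld by chance. For a privacy gate that trade seems acceptable, but it is a design choice worth confirming against the spec.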

The smallest research prototype worth running

Run the held state in a single file, A/B tested against a chat-shaped equivalent. Skip the five-scenario player and the cross-surface continuity. Isolate the one gesture the whole concept rests on, a cursor that visibly holds rather than generating plausible text, and measure whether users trust it more than a chat reply that fills the same gap. If the held state doesn’t win that test, nothing built on top of it will.

HANDOFF §9 (smallest prototype, §10.7) · brief §11.7