MEETING SELECT // DAY ONE OF THE TRACK

AIPOWERHOUSE

HACKATHON KICKOFF

One day. One real ticket from our own backlog. You run the whole lifecycle — SPECIFY → GENERATE → COMPREHEND — with an agent doing the typing.

RULE OF THE DAY: Short talks, long labs. If I speak for more than fifteen minutes, something has gone wrong.


Morning

09:30 · TALK · Frame — the sandwich, the day, the spotter's card

09:55 · TALK · Drive — the Claude Code driving lesson

10:05 · LAB · Lab 1 · First contact — calibrate on our own codebase

10:40 · TALK · Debrief — the agentic loop · coffee

10:55 · TALK · Specify — prompting, grilling, checkable done

11:10 · LAB · Lab 2 · Plan & grill — your ticket becomes a plan

11:55 · TALK · Debrief — the seam

12:10 · BREAK · Lunch

Afternoon

13:00 · TALK · Generate — let it run, when to step in

13:10 · LAB · Lab 3 · Build it — the agent types, you supervise

14:10 · BREAK · Debrief + coffee — who intervened, and why

14:30 · TALK · Comprehend — expectation-first review

14:40 · LAB · Lab 4 · Prove it, then review it — evidence, diff, cross-review

15:30 · LAB · Lab 5 · Teach the factory — CLAUDE.md, today's lessons kept

16:00 · LAB · Demo circle — show the catch, not just the ship

16:40 · TALK · Close — the factory · the road ahead

Timeboxes are strict; done-when beats done-everything. Whatever state your lab is in when time runs out, that's what we debrief.

The model K

Every piece of work is a sandwich. You are the bread on both ends; the AI is the filling — the only loop you follow today.

Specify — you say what you want and what the rules are. A clear spec means less guessing.
Generate — the agent does the work. It writes code, tests, and edits.
Comprehend — you read it back, ask questions, and decide. This is where quality is set.
Skip Comprehend and you did not save time. You just pushed the work to later.

Metaphor: Dan Shipper, Every's podcast “AI & I”, with Kieran Klaassen. The Specify→Generate→Comprehend framing is this engagement's own adaptation.

YOU · BREAD · SPECIFY: say what you want, and the rules

THE AGENT · FILLING · GENERATE: code, tests, edits — the typing

YOU · BREAD · COMPREHEND: read it back, question it, decide

What changes for you

Before: SPECIFY 15% · WRITING CODE 70% · REVIEW 15%

Toward: SPECIFY 40% · CODE 20% · COMPREHEND 40%

You write less and less code over time. The time does not vanish — it moves to the two human ends.

Comprehension debt

Code that ships but that nobody understands. The cost hits the first time it breaks.

Orchestration ceiling

One reviewer behind many agents. Past your limit, quality quietly drops.

BLOCK 3 / 6

Agree the what before anything builds.

The front of the sandwich. Vague in, vague out — judgment goes in here, while changing your mind is still cheap. Unclear requirements are this company's single biggest source of delay; this block is the antidote.

03

Block 2 · Drive — the driving lesson S

Your tool for the day: a session, your repo, and control over what it may do.

Start it in the repo root. Now it can see your real code.
Permissions: it asks before acting, until you choose to let it run.
Talk to it in plain language. Point at real files. No magic words.
Esc interrupts at any time. You are always the one in charge.

you@meetingselect : ~/platform

LAB 2 · HANDS ON · 45 MIN

Your ticket becomes a plan so clear someone else could build it.

[1]

Make it interview you: paste the grill prompt; answer at least five rounds

[2]

Get the plan + done-when list: 3–5 checks that can pass or fail

[3]

Swap plans with your pair: mark every spot where you'd have to guess

[4]

Fix the vague spots: until your pair signs off

Stretch: ask the agent for two competing approaches, then write one sentence on why you chose yours.

LAB TIMER

45:00


You're done when

Your pair says: “I could build this without asking you anything.”

⛔ HOLD POINT — nobody builds before lunch. That urge you feel right now is what today is about.

⏸ Debrief · what you just watched

It read, it ran, it looked at the result, and it went again. That loop is the difference between an agent and a chatbot.

Each step is a tool call. The result decides the next step.
It acts — it doesn't just answer. That's what makes it Wave 2.
The loop stops when the goal is met, or when it needs you.

PROMPT → TOOL CALL → RESULT → REPEAT ⟳

agent — live loop
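The loop, stripped to its shape: a Python sketch in which a scripted `decide` function stands in for the model and two hypothetical tools (`run`, `edit`) stand in for real ones. It is not how Claude Code is built. It only shows that each result feeds the next decision, and that the loop exits on exactly two conditions: the goal is met (`done`) or it needs you (`ask`).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    tool: str      # "run", "edit", ... or one of the two exits: "done" and "ask"
    arg: str = ""

def agent_loop(prompt: str,
               decide: Callable[[str, list[str]], Step],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 10) -> list[str]:
    """Prompt -> tool call -> result -> repeat, until the goal is met or it needs you."""
    events: list[str] = []
    results: list[str] = []
    for _ in range(max_steps):
        step = decide(prompt, results)        # the latest result decides the next step
        if step.tool == "done":
            events.append("done: goal met")
            return events
        if step.tool == "ask":
            events.append(f"ask: {step.arg}")  # hand control back to the human
            return events
        result = tools[step.tool](step.arg)   # it acts, it doesn't just answer
        results.append(result)
        events.append(f"{step.tool}({step.arg}) -> {result}")
    events.append("stopped: step budget used up")
    return events

# Hypothetical scripted stand-ins: the tests fail until one edit fixes them.
state = {"fixed": False}

def run_tests(_: str) -> str:
    return "PASS" if state["fixed"] else "FAIL"

def apply_edit(_: str) -> str:
    state["fixed"] = True
    return "edited"

def decide(prompt: str, results: list[str]) -> Step:
    # A real agent asks the model; this script only reacts to the last result.
    last = results[-1] if results else None
    if last in (None, "edited"):
        return Step("run", "tests")
    if last == "FAIL":
        return Step("edit", "fix")
    return Step("done")

log = agent_loop("make the tests pass", decide, {"run": run_tests, "edit": apply_edit})
for line in log:
    print(line)
```

The run reads exactly like the demo: run, fail, edit, run, pass, stop. Self-correction is just the loop doing its job.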

agent — session 02:41 · long run, many files


⏸ Intermezzo · taught when it happens

Long session, many files — and the answers went vague. The context window is nearly full. This is normal; now you know its name.

The agent's short-term memory has a hard limit. Old details fall off or blur.
The fix: compact the session, or start fresh, ideally after writing the state to a file.
Long runs are managed, not endured. This is half of what supervision means.
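Roughly what "compact the session" does: a Python sketch of a context with a hard budget that folds older messages into a summary once it overflows. Counting one token per word and the first-word "summary" are deliberate simplifications (assumptions, not how any real tool counts or summarizes); real tools tokenize differently and ask the model to write the summary.

```python
from dataclasses import dataclass, field

def count_tokens(text: str) -> int:
    # Assumption: one token per word. Real tokenizers count differently.
    return len(text.split())

@dataclass
class Context:
    budget: int                  # hard limit: the most the agent can hold at once
    keep_recent: int = 1         # newest messages kept word for word when compacting
    messages: list[str] = field(default_factory=list)

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self.used() > self.budget:
            self.compact()

    def compact(self) -> None:
        cut = len(self.messages) - self.keep_recent
        if cut <= 0:
            return               # nothing old enough to fold away: time to start fresh
        old, recent = self.messages[:cut], self.messages[cut:]
        # Stub summary: the first word of each old message. A real agent asks
        # the model to summarize, and that lossy step is where details blur.
        summary = "SUMMARY: " + " ".join(m.split()[0] for m in old if m.split())
        self.messages = [summary] + recent

ctx = Context(budget=12)
ctx.add("read booking service and list its endpoints")
ctx.add("ran tests two failures in cancel flow")
print(ctx.messages, ctx.used())
```

The lossy summary is the point of the sketch: it is why old details fall off, and why writing the state to a file before a long run is the safer move.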


The model, at scale

Many agents can work at the same time. But only one person reviews. That is the slow part.

Agents fan out: many tasks are generated at the same time.
Human review is the one slow step. It sets how fast the factory really goes.

Run more agents than you can read and you don't go faster — you just approve without checking.
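The same point as arithmetic: a Python sketch in which shipped work is capped by the slowest stage. The rates (2 tasks per agent per day, 6 properly read reviews per day for one person) are made-up numbers for illustration, not measurements.

```python
def shipped_per_day(agents: int, tasks_per_agent: float, reviews_per_day: float) -> float:
    """Work that really ships: capped by the slowest stage, the one human reviewer."""
    generated = agents * tasks_per_agent
    return min(generated, reviews_per_day)

def unread_per_day(agents: int, tasks_per_agent: float, reviews_per_day: float) -> float:
    """Generated work the reviewer cannot properly read: the pull toward rubber-stamping."""
    return max(0.0, agents * tasks_per_agent - reviews_per_day)

# Hypothetical rates: each agent finishes 2 tasks a day; one person can
# properly review 6 changes a day.
for agents in (1, 3, 6, 12):
    print(agents, shipped_per_day(agents, 2, 6), unread_per_day(agents, 2, 6))
```

With these numbers, 3 agents already saturate the reviewer. At 12 agents nothing more ships, but 18 changes a day pile up unread, and that pile is where the rubber stamp comes from.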

htop — the factory · one human on shift

AGENTS: 0 / 12 running

REVIEW: 1 human · 100% · SATURATED

Today was the whole sandwich, fast. The next sessions slow it down — each one takes a station you ran today and builds it properly, with homework on real tickets in between.

S1 / Planning & refining · The front of the sandwich, done right: codebase archaeology, codifying tacit knowledge, the grill, the seam.

S2 / Building the machine · Parallel agents, autonomous routines, self-reviewing pipelines, guardrails — the factory itself.

S3 / Reviewing code · Keeping your grip as volume rises: architectural review, agents in the browser, blocking the rubber stamp.

S4 / Orchestration & judgment · The capstone: your parallel ceiling, backpressure, and the judgment no agent replaces.

Everyone drove an agent through Specify → Generate → Comprehend on our own code — and the lessons you wrote into CLAUDE.md are still here tomorrow. That loop, repeated, is the factory.

The harder parts — many agents at once, autonomous routines, real review at volume — come in the next sessions. We are not learning a tool. We are building a factory the team owns. Today was day one.

BLOCK 1 / 6

25 minutes of theory — the only long talk of the day.

Six blocks. This is the only one where I talk for more than fifteen minutes. Everything after it, you build — on a real ticket from our own backlog.

01

If one of these is false for you, raise a hand now — the rescue corner fixes it while we frame the day.

[ OK ]

You're green: Claude Code is installed, you're logged in, and it has described your repo in three sentences. The screenshot is in the channel.

[ OK ]

You brought a ticket: small, real, yours, not urgent, plus a backup. That ticket is today's raw material.

[ OK ]

The baseline is in: your survey answers are the before-picture. In week 8, the same questions take the after-picture.

[ OK ]

You have a pair: assignments are on the board. Two people, one driver at a time. Find each other now.

WAVE 1

CHAT: you copy and paste. The model suggests; you still do all the work.

BEHIND US

WAVE 2

AGENTS: the model acts — it reads files, runs commands, and edits code in a loop.

◀ TODAY, ON OUR OWN CODE

WAVE 3

ROUTINES: agents you have trained, running repeatedly with light oversight.

LATER IN THE TRACK

First to call one out loud claims it; we tally at the demo circle. Noticing these moments is the actual skill this track teaches.

CONFIDENTLY WRONG: it states something false about our code — fluently, with total confidence.

THE GOOD QUESTION: it asks you something that genuinely sharpens the work.

THE DUMB ZONE: a long session starts getting vague, repetitive, or forgetful.

SELF-CORRECTION: it runs something, sees it fail, and fixes itself without you.

THE RUBBER-STAMP URGE: you catch yourself about to approve a diff you didn't really read.

SCOPE CREEP: it starts “improving” things nobody asked for.

BLOCK 2 / 6

First contact — calibrate before you trust.

You wouldn't take a new colleague's word for everything on day one. Same rule here. First contact is read-only: find out what it knows, where it bluffs, and how it works.

02

LAB 1 · HANDS ON · 35 MIN

Calibrate: what does it actually know about our codebase — and where does it bluff?

[1]

Ask what you know cold: a flow you could teach; grade its answer out loud

[2]

Ask what you don't know: then open the files it cites and check

[3]

Trace a flow end to end: UI → API → database, file by file

Stretch: make it draw the subsystem as a diagram — and grade the diagram too.

LAB TIMER

35:00


You're done when

You can name one thing it nailed and one thing it got wrong.

No code was changed.

10 minutes — compare notes with the other pairs.

Block 3 · what steers a model K

No tricks. Give the agent what a new teammate would need on their first day.

Say the goal and the rules — not every step.
Point to the real files, and to examples of how we already do it.
Say what “done” looks like, in checkable terms.
Let it ask. A good agent interviews you before it builds.

a good prompt — anatomy
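The four points above as a reusable structure: a Python sketch with a hypothetical `PromptSpec` dataclass. Its field names and section labels are our own invention, not any tool's prompt format; the ticket and file paths in the usage are made up. It shows that a good prompt is mostly information you already have (goal, rules, real files, examples, checks), plus an explicit invitation to ask questions.

```python
from dataclasses import dataclass, field

@dataclass
class PromptSpec:
    goal: str                                           # the what, not every step
    rules: list[str] = field(default_factory=list)      # what must not change
    files: list[str] = field(default_factory=list)      # real files to start from
    examples: list[str] = field(default_factory=list)   # how we already do it
    done_when: list[str] = field(default_factory=list)  # checks that pass or fail
    invite_questions: bool = True                       # let it interview you

    def render(self) -> str:
        parts = [f"Goal: {self.goal}"]
        sections = [("Rules", self.rules),
                    ("Relevant files", self.files),
                    ("Existing examples to follow", self.examples),
                    ("Done when", self.done_when)]
        for title, items in sections:
            if items:  # empty sections are left out rather than padded
                parts.append(title + ":\n" + "\n".join(f"- {item}" for item in items))
        if self.invite_questions:
            parts.append("Before you build anything, ask me questions "
                         "until the plan is unambiguous.")
        return "\n\n".join(parts)

# Hypothetical ticket and paths, for illustration only.
spec = PromptSpec(
    goal="Let venue managers export their booking list as CSV",
    rules=["Do not change the public API of the booking service"],
    files=["services/booking.py"],
    examples=["services/invoice_export.py"],
    done_when=["Given 3 bookings, when I export, then the CSV has a header and 3 rows"],
)
print(spec.render())
```

No magic words appear anywhere in it; every line is something a new teammate would ask for on day one.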

Block 3 · the discipline S

Don't let it build yet. Make it ask questions until you both mean the same thing.

The agent interviews you: edge cases, assumptions, what must not change.
Disagreements surface now, in planning — not later, in a pull request.
Only then does it produce a plan you can hand over with confidence.

the grill — live

Block 3 · the contract S

A feeling is not a finish line. Done is a list of checks that pass or fail.

3–5 concrete checks per ticket: given this, when that, then this.
The checks become tests. The tests give the agent a target.
“Looks right” is not done. A passing check is done.

done-when.txt
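What "the checks become tests" can look like: a Python sketch for a hypothetical ticket (a cancellation fee that depends on notice). The fee rules, `cancellation_fee`, and the test names are invented for illustration. Each given/when/then line maps to one test; the tests exist first, and the function is grown until they pass. The test names follow pytest's `test_` convention, and the `__main__` block runs them without pytest.

```python
from decimal import Decimal

# Hypothetical ticket: "The cancellation fee depends on how much notice the guest gives."
# done-when.txt, one check per line:
#   1. Given a 1000.00 booking, when cancelled 72h ahead, then the fee is 0.
#   2. Given a 1000.00 booking, when cancelled 30h ahead, then the fee is 500.00.
#   3. Given a 1000.00 booking, when cancelled 2h ahead, then the fee is 1000.00.

def cancellation_fee(total: Decimal, hours_before: float) -> Decimal:
    # Grown after the tests below existed and failed, until all three pass.
    if hours_before > 48:
        return Decimal("0")
    if hours_before >= 24:
        return total / 2
    return total

# One test per check: together they are the agent's target.
def test_free_with_more_than_48_hours_notice():
    assert cancellation_fee(Decimal("1000.00"), 72) == Decimal("0")

def test_half_fee_between_24_and_48_hours():
    assert cancellation_fee(Decimal("1000.00"), 30) == Decimal("500.00")

def test_full_fee_under_24_hours():
    assert cancellation_fee(Decimal("1000.00"), 2) == Decimal("1000.00")

if __name__ == "__main__":
    for check in (test_free_with_more_than_48_hours_notice,
                  test_half_fee_between_24_and_48_hours,
                  test_full_fee_under_24_hours):
        check()
        print("PASS", check.__name__)
```

Note the boundaries: at exactly 48 or exactly 24 hours this sketch picks an answer arbitrarily. Those are precisely the spots a grill session should surface before anyone writes the function.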

PRODUCT — SANDWICH 1 · IDEA → REQUIREMENT: their Comprehend signs off the requirement

| THE SEAM

ENGINEERING — SANDWICH 2 · REQUIREMENT → CODE: your Specify starts from their output

Every vague spot your pair marked in Lab 2 lives here. The seam is the most expensive place to be vague — and it is where our delays begin.

Back at 13:00. Agents off — hard stop.

BLOCK 4 / 6

The agent types — you supervise.

The skill of this block is knowing when to step in — and when to sit on your hands.

04

Block 4 · supervision, not typing S

Your hands leave the keyboard. Your attention doesn't.

Checks first: the done-when list from your plan becomes failing tests, then code.
Intervene when it asks, when it drifts off-plan, or when a spotter moment fires.
Don't grab the wheel at the first wobble — watch it try to recover first.
Read the tool calls as they happen. Narrate to your pair what it's doing and why.

intervention-policy.conf
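The policy in the bullets above, written down as a hypothetical config-in-code sketch. `Event`, the event kinds, and `RETRY_BUDGET = 3` are assumptions; how many wobbles you tolerate is a team choice, not a Claude Code setting. What it encodes: a question, drift, or a spotter moment brings you in at once, while a single failed tool call does not, until it keeps failing.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str         # "question", "off_plan", "spotter", "tool_failure", or "progress"
    detail: str = ""

RETRY_BUDGET = 3      # assumption: failed attempts in a row you allow before stepping in

def should_intervene(event: Event, consecutive_failures: int) -> tuple[bool, str]:
    if event.kind == "question":
        return True, "it asked: answer it"
    if event.kind == "off_plan":
        return True, "it drifted off the plan: steer it back"
    if event.kind == "spotter":
        return True, f"spotter moment ({event.detail}): call it out loud"
    if event.kind == "tool_failure" and consecutive_failures >= RETRY_BUDGET:
        return True, "still failing after several tries: step in"
    return False, "hands off: keep reading the tool calls"

def supervise(events: list[Event]) -> list[bool]:
    decisions = []
    failures = 0
    for e in events:
        # Count failures in a row; any other event resets the count.
        failures = failures + 1 if e.kind == "tool_failure" else 0
        step_in, why = should_intervene(e, failures)
        print(f"{e.kind:<12} {'STEP IN' if step_in else 'watch':<7} {why}")
        decisions.append(step_in)
    return decisions

run = [Event("tool_failure"), Event("progress"), Event("tool_failure"),
       Event("tool_failure"), Event("tool_failure"), Event("spotter", "scope creep")]
decisions = supervise(run)
```

The first wobble is recovered from without help; only the third failure in a row, and the scope creep, bring you in.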

LAB 3 · HANDS ON · 60 MIN

Hand over the plan from Lab 2. Your job changes: supervise.

[1]

New branch: named after your ticket

[2]

Checks first: the done-when list becomes failing tests, then code until green

[3]

Let it run: step in on questions, drift, or spotter moments, and call them out loud

[4]

Watch the loop: narrate the tool calls to your pair

Stretch: kick off a second, smaller task in a separate session — and feel what it does to your supervision of the first.

LAB TIMER

60:00


You're done when

The done-when checks pass.

You can explain every changed file in one sentence each.

You met all four this morning without their names. Now they have names — we go deeper on each in later sessions.

CONTEXT WINDOW: the short-term memory that filled up just now. Finite — and quality drops as it fills.

THE AGENTIC LOOP: prompt → tool call → result → repeat. What you watched in Lab 1.

MCP (Model Context Protocol): how the agent reaches beyond code, into a browser, a ticket system, a database.

REASONING EFFORT: how hard it thinks before acting. Tunable. More isn't always better.

Who intervened today — and who let it run too long? Back at 14:30.

BLOCK 5 / 6

Keep your grip on what was built.

The back of the sandwich — and the part that decides whether AI makes us faster or just busier. Code you can't explain isn't done.

05

Block 5 · the anti-rubber-stamp S W

Write down what you expect the diff to contain — before you open it.

Expectation first: your model of the change must exist before you see the agent's.
Every surprise is either something you misunderstood or something it overdid. Classify each one.
Ask your pair one question about their diff. If they can't answer, that's comprehension debt — found while it's cheap.
AI review augments human review. It never replaces it.

expectation-first
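A mechanical companion to the habit: a Python sketch that compares the files you expected to touch with the files the diff actually touched. The paths and the `main` base branch are hypothetical; `changed_files` wraps the real `git diff --name-only base...HEAD`. It only surfaces candidates, and file names are a coarser check than the two sentences you write down.

```python
import subprocess

def changed_files(base: str = "main") -> set[str]:
    """Files touched on this branch since it left `base` (via `git diff --name-only`)."""
    out = subprocess.run(["git", "diff", "--name-only", f"{base}...HEAD"],
                         capture_output=True, text=True, check=True).stdout
    return {line for line in out.splitlines() if line}

def classify_surprises(expected: set[str], changed: set[str]) -> dict[str, set[str]]:
    """Sort the diff against what you wrote down before opening it.

    Only flags candidates: whether a surprise means you misunderstood or the
    agent overdid it is still your call.
    """
    return {
        "as_expected": expected & changed,
        "unexpected": changed - expected,   # overdid it, or a gap in your model?
        "missing": expected - changed,      # skipped, or a gap in your model?
    }

# Hypothetical paths, written down *before* opening the diff.
expected = {"services/booking.py", "tests/test_booking.py"}
# In a real repo: changed = changed_files("main"). Hard-coded here for illustration.
changed = {"services/booking.py", "tests/test_booking.py", "services/pricing.py"}

for bucket, files in classify_surprises(expected, changed).items():
    print(bucket, sorted(files))
```

Here `services/pricing.py` lands in "unexpected": exactly the file to ask your pair about.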

LAB 4 · HANDS ON · 45 MIN

Evidence first. Then read the diff like a stranger wrote it.

[1]

Make it prove the work: tests run, app runs — evidence, not claims

[2]

Write your expectation: two sentences, before opening the diff

[3]

Read the diff against it: classify every surprise

[4]

Cross-review with your pair: ask one question they must be able to answer

[5]

Fresh agent reviews it too: compare what it caught with what you caught

Stretch: ask the review agent for security findings — injection, authorization, input validation.

LAB TIMER

45:00


You're done when

You can defend every file in your diff.

Your pair's question got answered.

⏸ Debrief · the one line that matters

Comprehension debt: the gap between code that exists and code anyone understands. It compounds quietly — and it is the failure mode this whole track is built to prevent.

Throughput without comprehension makes a team fast and fragile at the same time.
The unit of review is the unit of understanding. Keep changes small.
You just practiced the antidote: expectation-first review, and one honest question.

BLOCK 6 / 6

Today's lessons become tomorrow's factory.

Everything you explained twice today is a lesson the system should never need again. Write it down, prove it works, and the factory gets smarter. Then we demo, measure, and close.

06

Block 6 · the first artifact you own ARTIFACT

One file the agent reads every session. Today's friction becomes tomorrow's head start.

Rules, traps, where things live — written once, read every session.
It's the team's shared memory, kept in version control.
A team that writes lessons down owns a factory that improves weekly. A team that doesn't starts from zero every morning.

~/platform/CLAUDE.md

LAB 5 · HANDS ON · 30 MIN

Make tomorrow's agent smarter than today's. One rule, written and proven.

[1]

Find your friction: what did you explain to the agent twice today?

[2]

Write the rule in CLAUDE.md: short, imperative, specific; one to three rules

[3]

Prove it: in a fresh session, does it behave differently now?

[4]

Post your best rule: the keepers seed the team's shared config

Stretch: turn a workflow you repeated today — like the grill prompt — into a reusable skill stub.

LAB TIMER

30:00


You're done when

A fresh session behaves differently because of something you wrote.

EVERYONE · DEMO CIRCLE · 3 MIN PER PAIR

A comprehension check, not a victory lap.

[1]

The diff: what changed, in 30 seconds

[2]

One thing it nailed: something that would have taken you longer by hand

[3]

One place it went wrong, and how you caught it. This is the part we grade.

“It went perfectly” earns follow-up questions. Caught failures earn applause.

PER PAIR

03:00

Hard-timed.

We close when

Every pair has shown a catch, not just a ship.

The spotter's card is tallied.

If we only count speed, we end up approving work nobody read. So we count both — agreed now, before story points become the only number.

metrics.conf — the deal we make

THRUPUT: how much work ships

COMPRHN: whether we still understand what shipped

Baseline taken today. Same questions at week 8 — results shown, not claimed.