MEETING SELECT // DAY ONE OF THE TRACK

AI POWERHOUSE

HACKATHON KICKOFF

One day. One real ticket from our own backlog. You run the whole lifecycle — SPECIFY → GENERATE → COMPREHEND — with an agent doing the typing.

RULE OF THE DAY · Short talks, long labs. If I speak for more than fifteen minutes, something has gone wrong.


Morning
09:30  TALK   Frame: the sandwich, the day, the spotter's card
09:55  TALK   Drive: the Claude Code driving lesson
10:05  LAB    Lab 1 · First contact: calibrate on our own codebase
10:40  TALK   Debrief: the agentic loop · coffee
10:55  TALK   Specify: prompting, grilling, checkable done
11:10  LAB    Lab 2 · Plan & grill: your ticket becomes a plan
11:55  TALK   Debrief: the seam
12:10  BREAK  Lunch
Afternoon
13:00  TALK   Generate: let it run, when to step in
13:10  LAB    Lab 3 · Build it: the agent types, you supervise
14:10  BREAK  Debrief + coffee: who intervened, and why
14:30  TALK   Comprehend: expectation-first review
14:40  LAB    Lab 4 · Prove it, then review it: evidence, diff, cross-review
15:30  LAB    Lab 5 · Teach the factory: CLAUDE.md, today's lessons kept
16:00  LAB    Demo circle: show the catch, not just the ship
16:40  TALK   Close: the factory · the road ahead

Timeboxes are hard; done-when beats done-everything. Whatever state your lab is in when time runs out, that's what we debrief.

The model K

Every piece of work is a sandwich. You are the bread on both ends; the AI is the filling — the only loop you follow today.

  • Specify — you say what you want and what the rules are. A clear spec means less guessing.
  • Generate — the agent does the work. It writes code, tests, and edits.
  • Comprehend — you read it back, ask questions, and decide. This is where quality is set.
  • Skip Comprehend and you did not save time. You just pushed the work to later.

Metaphor: Dan Shipper, Every's podcast “AI & I”, with Kieran Klaassen. The Specify→Generate→Comprehend framing is this engagement's own adaptation.

YOU · BREAD · SPECIFY: say what you want, and the rules
THE AGENT · FILLING · GENERATE: code, tests, edits (the typing)
YOU · BREAD · COMPREHEND: read it back, question it, decide
What changes for you

Before:  SPECIFY 15% · WRITING CODE 70% · REVIEW 15%
Toward:  SPECIFY 40% · CODE 20% · COMPREHEND 40%

You write less and less code over time. The time does not vanish — it moves to the two human ends.

! Comprehension debt: code that ships, but nobody understands. The cost hits the first time it breaks.
! Orchestration ceiling: one reviewer behind many agents. Past your limit, quality quietly drops.
BLOCK 3 / 6

Agree the what before anything builds.

The front of the sandwich. Vague in, vague out — judgment goes in here, while changing your mind is still cheap. Unclear requirements are this company's single biggest source of delay; this block is the antidote.

Block 2 · Drive — the driving lesson S

Your tool for the day: a session, your repo, and control over what it may do.

  • Start it in the repo root. Now it can see your real code.
  • Permissions: it asks before acting, until you choose to let it run.
  • Talk to it in plain language. Point at real files. No magic words.
  • Esc interrupts at any time. You are always the one in charge.
you@meetingselect : ~/platform
LAB 2 · HANDS ON · 45 MIN

Your ticket becomes a plan so clear someone else could build it.

[1] Make it interview you: paste the grill prompt; answer at least five rounds
[2] Get the plan + done-when list: 3–5 checks that can pass or fail
[3] Swap plans with your pair: mark every spot where you'd have to guess
[4] Fix the vague spots: until your pair signs off

Stretch: two competing approaches from the agent; one sentence on why you chose yours.

You're done when
Your pair says: “I could build this without asking you anything.”
⛔ HOLD POINT — nobody builds before lunch. That urge you feel right now is what today is about.
⏸ Debrief · what you just watched

It read, it ran, it looked at the result, and it went again. That loop is the difference between an agent and a chatbot.

  • Each step is a tool call. The result decides the next step.
  • It acts — it doesn't just answer. That's what makes it wave 2.
  • The loop stops when the goal is met, or when it needs you.
PROMPT → TOOL CALL → RESULT → REPEAT ⟳
agent — live loop
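
A minimal sketch of that loop, for reading rather than running in production. Everything here is hypothetical: the two tools, the file name `booking.py`, and `scripted_model`, which stands in for a real model by picking the next step from the history. What it shows is the shape: each step is a tool call, the result feeds the next decision, and the loop ends when the goal is met or the step budget runs out and it hands back to you.

```python
from typing import Callable

# Hypothetical tools: names and behaviour are illustrative only.
def read_file(arg: str) -> str:
    return f"contents of {arg}"

def run_tests(arg: str) -> str:
    return "2 passed, 0 failed"

TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "run_tests": run_tests,
}

def scripted_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for a model: chooses the next action from what it has seen."""
    if not any(h.startswith("read_file") for h in history):
        return ("read_file", "booking.py")
    if not any(h.startswith("run_tests") for h in history):
        return ("run_tests", "tests/")
    return ("done", "goal met: tests pass")

def agent_loop(prompt: str, max_steps: int = 10) -> list[str]:
    history = [f"prompt: {prompt}"]
    for _ in range(max_steps):
        action, arg = scripted_model(history)   # decide the next step
        if action == "done":
            history.append(f"done: {arg}")
            break
        result = TOOLS[action](arg)             # the tool call
        history.append(f"{action}({arg}) -> {result}")  # the result shapes the next step
    else:
        # Step budget used up without reaching the goal: hand back to the human.
        history.append("stopped: needs a human")
    return history

for line in agent_loop("make the booking tests pass"):
    print(line)
```

A chatbot would stop after the first answer. The loop is what lets the agent read, run, look, and go again.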
agent — session 02:41 · long run, many files
CONTEXT 34%
⏸ Intermezzo · taught when it happens

Long session, many files — and the answers went vague. The context window is nearly full. This is normal; now you know its name.

  • The agent's short-term memory has a hard limit. Old details fall off or blur.
  • The fix: compact the session, start fresh, or write the state to a file first.
  • Long runs are managed, not endured. This is half of what supervision means.
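
One way to picture "managed, not endured" is a simple budget check. This is a sketch under assumptions: the `Session` class, the token numbers, and the 70% and 90% thresholds are invented for illustration and are not settings of any real tool.

```python
from dataclasses import dataclass

@dataclass
class Session:
    limit_tokens: int      # hypothetical hard limit of the context window
    used_tokens: int = 0

    def add(self, tokens: int) -> None:
        self.used_tokens += tokens

    @property
    def fill(self) -> float:
        """Fraction of the context window in use, from 0.0 to 1.0."""
        return self.used_tokens / self.limit_tokens

def next_move(session: Session, warn_at: float = 0.7, stop_at: float = 0.9) -> str:
    # Thresholds are assumptions; pick your own from experience.
    if session.fill >= stop_at:
        return "write the state to a file, then start fresh"
    if session.fill >= warn_at:
        return "compact the session"
    return "keep going"

session = Session(limit_tokens=1000)
session.add(340)
print(next_move(session))   # plenty of room left
session.add(400)
print(next_move(session))   # past the warning threshold
session.add(200)
print(next_move(session))   # nearly full
```

The point is the habit, not the numbers: check the fill level before the answers go vague, not after.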
The model, at scale

Many agents can work at the same time. But only one person reviews. That is the slow part.

  • Agents spread out. Many tasks generate at the same time.
  • Human review is the one slow step. It sets how fast the factory really goes.
Run more agents than you can read and you don't go faster — you just approve without checking.
[Demo panel: htop, the factory, one human on shift. AGENTS 0 / 12 running · REVIEW 1 human · 100% · SATURATED]
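
The arithmetic behind that claim, as a rough sketch. The numbers (12 agents, 2 tasks each, 6 careful reviews per day) are made up for illustration; the only idea taken from the slide is that reviewed output is capped by the slower of the two steps.

```python
def reviewed_per_day(agents: int, tasks_per_agent: float, reviews_per_day: float) -> float:
    """Work that ships with a real review: capped by the slower step."""
    generated = agents * tasks_per_agent
    return min(generated, reviews_per_day)

def unread_per_day(agents: int, tasks_per_agent: float, reviews_per_day: float) -> float:
    """Work generated beyond what one human can actually read."""
    generated = agents * tasks_per_agent
    return max(0.0, generated - reviews_per_day)

# Illustrative numbers only.
print(reviewed_per_day(3, 2, 6))    # 6: review keeps up
print(reviewed_per_day(12, 2, 6))   # 6: four times the agents, same reviewed output
print(unread_per_day(12, 2, 6))     # 18: these either wait or get rubber-stamped
```

Adding agents past the review limit raises the unread pile, not the reviewed output.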

Today was the whole sandwich, fast. The next sessions slow it down — each one takes a station you ran today and builds it properly, with homework on real tickets in between.

S1 / Planning & refining. The front of the sandwich, done right: codebase archaeology, codifying tacit knowledge, the grill, the seam.
S2 / Building the machine. Parallel agents, autonomous routines, self-reviewing pipelines, guardrails: the factory itself.
S3 / Reviewing code. Keeping your grip as volume rises: architectural review, agents in the browser, blocking the rubber stamp.
S4 / Orchestration & judgment. The capstone: your parallel ceiling, backpressure, and the judgment no agent replaces.

Everyone drove an agent through Specify → Generate → Comprehend on our own code — and the lessons you wrote into CLAUDE.md are still here tomorrow. That loop, repeated, is the factory.

The harder parts — many agents at once, autonomous routines, real review at volume — come in the next sessions. We are not learning a tool. We are building a factory the team owns. Today was day one.


      
BLOCK 1 / 6

25 minutes of theory — the only long talk of the day.

Six blocks. This is the only one where I talk for more than fifteen minutes. Everything after it, you build — on a real ticket from our own backlog.


If one of these is false for you, raise a hand now — the rescue corner fixes it while we frame the day.

[ OK ] You're green: Claude Code installed, logged in, and it answered three sentences about your repo. The screenshot is in the channel.
[ OK ] You brought a ticket: small, real, yours, not urgent, plus a backup. That ticket is today's raw material.
[ OK ] The baseline is in: your survey answers are the before-picture. Week 8 takes the after-picture with the same questions.
[ OK ] You have a pair: assignments are on the board. Two people, one driver at a time. Find each other now.

WAVE 1 · CHAT (BEHIND US): you copy and paste. The model suggests; you still do all the work.
WAVE 2 · AGENTS (◀ TODAY, ON OUR OWN CODE): the model acts. It reads files, runs commands, and edits code in a loop.
WAVE 3 · ROUTINES (LATER IN THE TRACK): agents you have trained, running again and again with light checking.

First to call one out loud claims it; we tally at the demo circle. Noticing these moments is the actual skill this track teaches.

CONFIDENTLY WRONG: it states something false about our code, fluently and with total confidence.
THE GOOD QUESTION: it asks you something that genuinely sharpens the work.
THE DUMB ZONE: a long session starts getting vague, repetitive, or forgetful.
SELF-CORRECTION: it runs something, sees it fail, and fixes itself without you.
THE RUBBER-STAMP URGE: you catch yourself about to approve a diff you didn't really read.
SCOPE CREEP: it starts 'improving' things nobody asked for.
BLOCK 2 / 6

First contact — calibrate before you trust.

You wouldn't take a new colleague's word for everything on day one. Same rule here. First contact is read-only: find out what it knows, where it bluffs, and how it works.

LAB 1 · HANDS ON · 35 MIN

Calibrate: what does it actually know about our codebase — and where does it bluff?

[1] Ask what you know cold: a flow you could teach; grade its answer out loud
[2] Ask what you don't know: then open the files it cites and check
[3] Trace a flow end to end: UI → API → database, file by file

Stretch: make it draw the subsystem as a diagram — and grade the diagram too.

You're done when
You can name one thing it nailed and one thing it got wrong.
No code was changed.

10 minutes — compare notes with the other pairs.
Block 3 · what steers a model K

No tricks. Give the agent what a new teammate would need on their first day.

  • Say the goal and the rules — not every step.
  • Point to the real files, and to examples of how we already do it.
  • Say what “done” looks like, in checkable terms.
  • Let it ask. A good agent interviews you before it builds.
a good prompt — anatomy
Block 3 · the discipline S

Don't let it build yet. Make it ask questions until you both mean the same thing.

  • The agent interviews you: edge cases, assumptions, what must not change.
  • Disagreements surface now, in planning — not later, in a pull request.
  • Only then does it produce a plan you can hand over with confidence.
the grill — live
Block 3 · the contract S

A feeling is not a finish line. Done is a list of checks that pass or fail.

  • 3–5 concrete checks per ticket: given this, when that, then this.
  • The checks become tests. The tests give the agent a target.
  • “Looks right” is not done. A passing check is done.
done-when.txt
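
A sketch of what "checks that pass or fail" can look like once they become code. The ticket is invented for illustration ("reject bookings with more than 20 attendees"), and `validate_booking`, `MAX_ATTENDEES`, and the check names are hypothetical, not taken from our codebase. Each check is one given/when/then line that returns true or false.

```python
MAX_ATTENDEES = 20  # assumption for this example, not a real business rule

def validate_booking(attendees: int) -> tuple[bool, str]:
    """Hypothetical function under test: returns (accepted, reason)."""
    if attendees < 1:
        return (False, "at least one attendee is required")
    if attendees > MAX_ATTENDEES:
        return (False, f"maximum is {MAX_ATTENDEES} attendees")
    return (True, "")

def check_accepts_booking_at_limit() -> bool:
    # given 20 attendees, when validated, then the booking is accepted
    return validate_booking(20) == (True, "")

def check_rejects_booking_over_limit() -> bool:
    # given 21 attendees, when validated, then it is rejected with the limit in the message
    return validate_booking(21) == (False, "maximum is 20 attendees")

def check_rejects_empty_booking() -> bool:
    # given 0 attendees, when validated, then it is rejected
    return validate_booking(0)[0] is False

DONE_WHEN = [
    check_accepts_booking_at_limit,
    check_rejects_booking_over_limit,
    check_rejects_empty_booking,
]

def run_checks() -> dict[str, bool]:
    """Run every done-when check and report pass/fail by name."""
    return {check.__name__: check() for check in DONE_WHEN}

for name, passed in run_checks().items():
    print("PASS" if passed else "FAIL", name)
```

In Lab 3 the order flips: the checks are written first and fail, and the agent's target is to turn them green.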

PRODUCT · SANDWICH 1 · IDEA → REQUIREMENT: their Comprehend signs off the requirement
| THE SEAM
ENGINEERING · SANDWICH 2 · REQUIREMENT → CODE: your Specify starts from their output

Every vague spot your pair marked in Lab 2 lives here. The seam is the most expensive place to be vague — and it is where our delays begin.

Back at 13:00. Agents off — hard stop.
BLOCK 4 / 6

The agent types — you supervise.

The skill of this block is knowing when to step in — and when to sit on your hands.

Block 4 · supervision, not typing S

Your hands leave the keyboard. Your attention doesn't.

  • Checks first: the done-when list from your plan becomes failing tests, then code.
  • Intervene when it asks, when it drifts off-plan, or when a spotter moment fires.
  • Don't grab the wheel at the first wobble — watch it try to recover first.
  • Read the tool calls as they happen. Narrate to your pair what it's doing and why.
intervention-policy.conf
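
The intervention rules above, written as a small decision function. This is a sketch, not a real config format: the `Event` names and the `patience` value of two recovery attempts are assumptions chosen to mirror the bullets.

```python
from enum import Enum, auto

class Event(Enum):
    AGENT_ASKS = auto()       # it asks you a question
    OFF_PLAN = auto()         # it drifts away from the agreed plan
    SPOTTER_MOMENT = auto()   # something from the spotter's card fires
    WOBBLE = auto()           # a failure it may still recover from
    ON_PLAN = auto()          # normal progress

def should_step_in(event: Event, failed_recoveries: int = 0, patience: int = 2) -> bool:
    """Return True when the supervisor should intervene."""
    if event in (Event.AGENT_ASKS, Event.OFF_PLAN, Event.SPOTTER_MOMENT):
        return True
    if event is Event.WOBBLE:
        # Watch it try to recover first; step in only once patience runs out.
        return failed_recoveries >= patience
    return False

print(should_step_in(Event.WOBBLE))                        # False: let it try
print(should_step_in(Event.WOBBLE, failed_recoveries=2))   # True: it is stuck
print(should_step_in(Event.OFF_PLAN))                      # True: drift is yours to stop
```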
LAB 3 · HANDS ON · 60 MIN

Hand over the plan from Lab 2. Your job changes: supervise.

[1] New branch: named after your ticket
[2] Checks first: the done-when list becomes failing tests, then code until green
[3] Let it run: step in on questions, drift, or spotter moments, and call them
[4] Watch the loop: narrate the tool calls to your pair

Stretch: kick off a second, smaller task in a separate session — and feel what it does to your supervision of the first.

You're done when
The done-when checks pass.
You can explain every changed file in one sentence each.

You met all four this morning without their names. Now they have names — we go deeper on each in later sessions.

CONTEXT WINDOW: the short-term memory that filled up just now. Finite, and quality drops as it fills.
THE AGENTIC LOOP: prompt → tool call → result → repeat. What you watched in Lab 1.
MCPs: how the agent reaches beyond code, to a browser, a ticket system, a database.
REASONING EFFORT: how hard it thinks before acting. Tunable. More isn't always better.

Who intervened today — and who let it run too long? Back at 14:30.
BLOCK 5 / 6

Keep your grip on what was built.

The back of the sandwich — and the part that decides whether AI makes us faster or just busier. Code you can't explain isn't done.

Block 5 · the anti-rubber-stamp S W

Write down what you expect the diff to contain — before you open it.

  • Expectation first: your model of the change must exist before the agent's.
  • Every surprise is either something you misunderstood or something it overdid. Classify each one.
  • Ask your pair one question about their diff. If they can't answer, that's comprehension debt — found while it's cheap.
  • AI review augments human review. It never replaces it.
expectation-first
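
A sketch of the comparison step, assuming you wrote your expectation as a list of files before opening the diff. The file names are hypothetical. The function only sorts files into buckets; classifying each surprise as "I misunderstood" or "it overdid it" stays a human call.

```python
def compare_expectation(expected: set[str], actual: set[str]) -> dict[str, list[str]]:
    """Sort changed files into what you predicted and what surprises you."""
    return {
        "as_expected": sorted(expected & actual),
        "surprise_in_diff": sorted(actual - expected),       # did it overdo it, or did you misunderstand?
        "expected_but_missing": sorted(expected - actual),   # is the work incomplete, or was your model wrong?
    }

# Hypothetical example: written down BEFORE opening the diff.
expected = {"booking/validate.py", "tests/test_validate.py"}
# What the agent actually changed.
actual = {"booking/validate.py", "tests/test_validate.py", "booking/utils.py"}

report = compare_expectation(expected, actual)
for bucket, files in report.items():
    print(bucket, files)
```

Here `booking/utils.py` is the surprise to classify, and a candidate for the scope-creep tally.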
LAB 4 · HANDS ON · 45 MIN

Evidence first. Then read the diff like a stranger wrote it.

[1] Make it prove the work: tests run, app runs; evidence, not claims
[2] Write your expectation: two sentences, before opening the diff
[3] Read the diff against it: classify every surprise
[4] Cross-review with your pair: ask one question they must be able to answer
[5] Fresh agent reviews it too: compare what it caught with what you caught

Stretch: ask the review agent for security findings — injection, authorization, input validation.

You're done when
You can defend every file in your diff.
Your pair's question got answered.
⏸ Debrief · the one line that matters

The gap between code that exists and code anyone understands. It compounds quietly — and it is the failure mode this whole track is built to prevent.

  • Throughput without comprehension makes a team fast and fragile at the same time.
  • The unit of review is the unit of understanding. Keep changes small.
  • You just practiced the antidote: expectation-first review, and one honest question.
BLOCK 6 / 6

Today's lessons become tomorrow's factory.

Everything you explained twice today is a lesson the system should never need again. Write it down, prove it works, and the factory gets smarter. Then we demo, measure, and close.

Block 6 · the first artifact you own ARTIFACT

One file the agent reads every session. Today's friction becomes tomorrow's head start.

  • Rules, traps, where things live — written once, read every session.
  • It's the team's shared memory, kept in version control.
  • A team that writes lessons down owns a factory that improves weekly. A team that doesn't starts from zero every morning.
~/platform/CLAUDE.md
LAB 5 · HANDS ON · 30 MIN

Make tomorrow's agent smarter than today's. One rule, written and proven.

[1] Find your friction: what did you explain to the agent twice today?
[2] Write the rule in CLAUDE.md: short, imperative, specific; one to three rules
[3] Prove it: fresh session. Does it behave differently now?
[4] Post your best rule: the keepers seed the team's shared config

Stretch: turn a workflow you repeated today — like the grill prompt — into a reusable skill stub.

You're done when
A fresh session behaves differently because of something you wrote.
EVERYONE · DEMO CIRCLE · 3 MIN PER PAIR

A comprehension check, not a victory lap.

[1] The diff: what changed, in 30 seconds
[2] One thing it nailed: that would have taken you longer by hand
[3] One place it went wrong: and how you caught it. This is the part we grade

“It went perfectly” earns follow-up questions. Caught failures earn applause.

PER PAIR · 03:00 · hard-timed
We close when
Every pair has shown a catch, not just a ship.
The spotter's card is tallied.

If we only count speed, we end up approving work nobody read. So we count both — agreed now, before story points become the only number.

metrics.conf — the deal we make
THRUPUT (throughput): how much work ships
COMPRHN (comprehension): whether we still understand what shipped

Baseline taken today. Same questions at week 8 — results shown, not claimed.
