Why I stopped asking candidates to whiteboard and started asking them to review pull requests
Whiteboard coding measures performance under artificial stress. PR reviews measure what the job actually requires.
For years I ran the standard interview loop. Phone screen, take-home project, whiteboard coding session, system design round. The whiteboard round was the one I trusted most. If a candidate could implement a balanced binary search tree under pressure while explaining their thought process, I figured they could handle production engineering.
I was wrong. The correlation between whiteboard performance and job performance was effectively zero. Some of our best engineers froze at the whiteboard. Some of our worst hires breezed through it. The signal was noise.
Most of my leadership posts in this phase are me compressing scar tissue into defaults, and this one builds on what I learned in “Two years in and I finally stopped rewriting the org chart every quarter.” I am less interested in performative management language now and more interested in the boring mechanisms that keep teams aligned when nobody is in a heroic mood.
What the Job Actually Requires
I started by listing the skills that matter most for a senior engineer at FinanceOps. Not the skills I wish mattered. The skills that actually predicted success or failure in the first six months.
- Reading unfamiliar code quickly and identifying the author’s intent
- Spotting subtle bugs, race conditions, and edge cases in existing code
- Communicating technical tradeoffs clearly and concisely
- Evaluating whether a proposed approach is appropriate for the constraints
- Giving feedback that is direct, specific, and constructive
Not one of these skills is measured by whiteboard coding. Every one of them is measured by reviewing a pull request.
The PR Review Interview
We now give candidates a real pull request from our codebase. Not a toy example. A real PR with real context, real tradeoffs, and real imperfections. We anonymize it, strip customer data, and provide enough context for a reviewer who is new to the codebase.
The PR is typically 200-400 lines. It includes:
- A feature implementation with a subtle bug that an experienced engineer should catch
- A design decision that is defensible but has a clear alternative worth discussing
- Error handling that works but misses an edge case
- Test coverage that is adequate but not comprehensive
- A migration or schema change with performance implications
The candidate gets 45 minutes to review the PR and write their comments, then 30 minutes to discuss their review with two interviewers. We are not looking for a perfect review. We are looking for how they think, what they prioritize, and how they communicate disagreement.
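To give a flavor of what “a subtle bug that an experienced engineer should catch” means in practice, here is a hypothetical sketch in the spirit of those PRs. This is not an excerpt from our codebase; the function names and figures are invented for illustration. The bug is a classic in financial code: float arithmetic plus truncation silently drops fractional cents.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def apply_fee(balance_cents: int, fee_rate: float) -> int:
    # Looks reasonable and passes happy-path tests, but is wrong:
    # float multiplication introduces representation error, and int()
    # truncates toward zero, so 1009 cents at a 10% fee yields 100
    # cents instead of the correctly rounded 101.
    return int(balance_cents * fee_rate)

def apply_fee_reviewed(balance_cents: int, fee_rate: str) -> int:
    # What a reviewer should push for: exact decimal arithmetic with
    # an explicit, documented rounding policy.
    fee = (Decimal(balance_cents) * Decimal(fee_rate)).quantize(
        Decimal("1"), rounding=ROUND_HALF_EVEN
    )
    return int(fee)
```

A candidate who flags the truncation and explains why it matters for money, rather than just writing “use Decimal,” is demonstrating exactly the review skills the format is designed to surface.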
What We Actually Evaluate
After running this format for six months and comparing it against the first-quarter performance of hires, we identified the signals that actually predict success:
- Do they read the PR description and understand the business context before diving into code?
- Do they distinguish between blocking issues and style preferences?
- Do they explain why something is a problem, not just that it is a problem?
- Do they suggest alternatives when they flag an issue, or just criticize?
- Do they catch the subtle bug? Not everyone does, and that is fine. But how they react when we point it out tells us a lot.
- Do they ask clarifying questions about the system context they lack?
The strongest signal is how they handle disagreement during the discussion. When an interviewer pushes back on their review comment, do they defend their position with evidence? Do they update their view when presented with new context? Do they get defensive? This 30-minute discussion tells us more about what it will be like to work with this person than any coding exercise ever could.
The Results
We have now hired eight engineers using this format. Compared to our previous whiteboard-based loop, the outcomes have been noticeably better. Two specific improvements stand out.
First, we stopped accidentally filtering out experienced engineers who do not perform well under artificial coding pressure. Two of our best hires told us directly that they would have struggled in a whiteboard interview and almost did not apply. The PR review format let them demonstrate the skills they actually use every day.
Second, the interview itself became a more honest preview of the job. Candidates leave the interview knowing what code review at FinanceOps actually looks like. They self-select more accurately. Two candidates withdrew after the interview, telling us the codebase was not the kind of work they wanted to do. That is a feature, not a failure.
By the time I wrote this, the lesson was bigger than any one tool or incident. The job had become setting defaults a team could trust, then proving those defaults in systems like lifeos and bisen-apps. That is leadership work, not just technical taste.
The best hiring signal is how a candidate does the actual job. At most software companies, the actual job is reading and reviewing code, not writing algorithms on a whiteboard.
I still do a system design round for senior and staff roles. Architecture thinking matters and is hard to evaluate in a PR review. But the whiteboard coding round is gone. Permanently. The PR review is a better filter, a better candidate experience, and a better predictor of on-the-job performance. The evidence is clear enough that I will not go back.
The PR review interview format revealed qualities that whiteboard interviews structurally cannot: how candidates communicate tradeoffs in writing, how they handle ambiguity in real code, and how they respond to feedback on their own work. Every strong hire we made in the following year came through this process. The candidates who excelled were not always the fastest coders. They were the ones who asked clarifying questions, explained their reasoning, and adapted when the review surfaced a better approach. Those are exactly the qualities that matter on a small team where every engineer owns critical systems.