Why I stopped asking candidates to whiteboard and started asking them to review pull requests
Whiteboard coding measures performance under artificial stress. PR reviews measure what the job actually requires.
For years I ran the standard interview loop. Phone screen, take-home project, whiteboard coding session, system design round. The whiteboard round was the one I trusted most. If a candidate could implement a balanced binary search tree under pressure while explaining their thought process, I figured they could handle production engineering.
I was wrong. The correlation between whiteboard performance and job performance was effectively zero. Some of our best engineers froze at the whiteboard. Some of our worst hires breezed through it. The signal was noise.
Most of my leadership posts in this phase are me compressing scar tissue into defaults, and this one builds on what I learned in “Two years in and I finally stopped rewriting the org chart every quarter.” I am less interested in performative management language now and more interested in the boring mechanisms that keep teams aligned when nobody is in a heroic mood.
What the Job Actually Requires
I started by listing the skills that matter most for a senior engineer at FinanceOps. Not the skills I wish mattered. The skills that actually predicted success or failure in the first six months.
- Reading unfamiliar code quickly and identifying the author’s intent
- Spotting subtle bugs, race conditions, and edge cases in existing code
- Communicating technical tradeoffs clearly and concisely
- Evaluating whether a proposed approach is appropriate for the constraints
- Giving feedback that is direct, specific, and constructive
Not one of these skills is measured by whiteboard coding. Every one of them is measured by reviewing a pull request.
The PR Review Interview
We now give candidates a real pull request from our codebase. Not a toy example. A real PR with real context, real tradeoffs, and real imperfections. We anonymize it, strip customer data, and provide enough context for a reviewer who is new to the codebase.
The PR is typically 200-400 lines. It includes:
- A feature implementation with a subtle bug that an experienced engineer should catch
- A design decision that is defensible but has a clear alternative worth discussing
- Error handling that works but misses an edge case
- Test coverage that is adequate but not comprehensive
- A migration or schema change with performance implications
The candidate gets 45 minutes to review the PR and write their comments, then 30 minutes to discuss their review with two interviewers. We are not looking for a perfect review. We are looking for how they think, what they prioritize, and how they communicate disagreement.
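To give a flavor of what “a subtle bug that an experienced engineer should catch” means in practice, here is a hypothetical sketch in the spirit of those PRs. This is not an excerpt from our codebase; the function names and figures are invented for illustration. The bug is a classic in financial code: float arithmetic plus truncation silently drops fractional cents.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def apply_fee(balance_cents: int, fee_rate: float) -> int:
    # Looks reasonable and passes happy-path tests, but is wrong:
    # float multiplication introduces representation error, and int()
    # truncates toward zero, so 1009 cents at a 10% fee yields 100
    # cents instead of the correctly rounded 101.
    return int(balance_cents * fee_rate)

def apply_fee_reviewed(balance_cents: int, fee_rate: str) -> int:
    # What a reviewer should push for: exact decimal arithmetic with
    # an explicit, documented rounding policy.
    fee = (Decimal(balance_cents) * Decimal(fee_rate)).quantize(
        Decimal("1"), rounding=ROUND_HALF_EVEN
    )
    return int(fee)
```

A candidate who flags the truncation and explains why it matters for money, rather than just writing “use Decimal,” is demonstrating exactly the review skills the format is designed to surface.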
What We Actually Evaluate
After running this format for six months and comparing it against the first-quarter performance of hires, we identified the signals that actually predict success:
- Do they read the PR description and understand the business context before diving into code?
- Do they distinguish between blocking issues and style preferences?
- Do they explain why something is a problem, not just that it is a problem?
- Do they suggest alternatives when they flag an issue, or just criticize?
- Do they catch the subtle bug? Not everyone does, and that is fine. But how they react when we point it out tells us a lot.
- Do they ask clarifying questions about the system context they lack?
The strongest signal is how they handle disagreement during the discussion. When an interviewer pushes back on their review comment, do they defend their position with evidence? Do they update their view when presented with new context? Do they get defensive? This 30-minute discussion tells us more about what it will be like to work with this person than any coding exercise ever could.
The Results
We have now hired eight engineers using this format. Compared to our previous whiteboard-based loop, the outcomes have been noticeably better. Two specific improvements stand out.
First, we stopped accidentally filtering out experienced engineers who do not perform well under artificial coding pressure. Two of our best hires told us directly that they would have struggled in a whiteboard interview and almost did not apply. The PR review format let them demonstrate the skills they actually use every day.
Second, the interview itself became a more honest preview of the job. Candidates leave the interview knowing what code review at FinanceOps actually looks like. They self-select more accurately. Two candidates withdrew after the interview, telling us the codebase was not the kind of work they wanted to do. That is a feature, not a failure.
By the time I wrote this, the lesson was bigger than any one tool or incident. The job had become setting defaults a team could trust, then proving those defaults in systems like lifeos and bisen-apps. That is leadership work, not just technical taste.
The best hiring signal is how a candidate does the actual job. At most software companies, the actual job is reading and reviewing code, not writing algorithms on a whiteboard.
I still do a system design round for senior and staff roles. Architecture thinking matters and is hard to evaluate in a PR review. But the whiteboard coding round is gone. Permanently. The PR review is a better filter, a better candidate experience, and a better predictor of on-the-job performance. The evidence is clear enough that I will not go back.
The PR review interview format revealed qualities that whiteboard interviews structurally cannot: how candidates communicate tradeoffs in writing, how they handle ambiguity in real code, and how they respond to feedback on their own work. Every strong hire we made in the following year came through this process. The candidates who excelled were not always the fastest coders. They were the ones who asked clarifying questions, explained their reasoning, and adapted when the review surfaced a better approach. Those are exactly the qualities that matter on a small team where every engineer owns critical systems.