Building a fintech reconciliation engine that handles ten million transactions a day
PostgreSQL partitioned tables, Node.js worker pools, idempotent batch processing, and a custom diffing algorithm. Sub-minute reconciliation at ten million daily transactions.
The FinanceOps reconciliation engine is the most critical system we operate. It matches financial records across banks, payment processors, and our internal ledger to verify that every penny is accounted for. At our current scale, that means processing roughly ten million transactions per day with sub-minute reconciliation windows for each batch.
Building a system at this scale with a small team required deliberate architectural choices at every layer. Here is how we did it.
By this point I cared less about sounding smart and more about making the tradeoff legible. It also builds on what I learned earlier in “S3 Vectors at re:Invent made me reconsider our entire RAG architecture.” The systems had enough history that every database or eventing opinion had receipts behind it. That is the same posture I now bring to longer-lived experiments like ftryos and pipeline-sdk: if the constraint is real, say it plainly and design around it.
PostgreSQL Partitioned Tables
Transaction data grows fast. Ten million records per day means roughly 300 million per month. Querying a single table with billions of rows is not viable within the latency requirements we operate under.
We partition the transactions table by date using PostgreSQL declarative partitioning. Each day gets its own partition. The reconciliation engine primarily queries the current day and the previous day, so the working set stays small relative to the total data volume.
- Daily partitions keep the working set manageable. Index scans on a 10-million-row partition are fast. Index scans on a 3-billion-row table are not.
- Partition pruning eliminates irrelevant data automatically. A query filtered by date only touches the relevant partitions.
- Old partitions can be archived to cold storage without affecting active query performance. We archive partitions older than 90 days to S3.
- VACUUM operations complete quickly on small partitions. A full VACUUM on a billion-row table can take hours and impact production queries.
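A daily-partition scheme like the one above can be sketched as a small DDL generator that the ingestion service runs ahead of each day. The table name `transactions`, its columns, and the partition-naming convention here are illustrative, not our production schema:

```typescript
// Sketch of daily partition management for a declaratively partitioned
// transactions table. Table and column names are assumptions.
//
// Parent table, created once:
//   CREATE TABLE transactions (
//     id bigint GENERATED ALWAYS AS IDENTITY,
//     source text NOT NULL,
//     amount_cents bigint NOT NULL,
//     currency char(3) NOT NULL,
//     occurred_at timestamptz NOT NULL
//   ) PARTITION BY RANGE (occurred_at);

/** Format a Date as YYYY-MM-DD in UTC. */
function isoDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

/** Generate the DDL that creates the partition covering one UTC day. */
export function dailyPartitionDdl(day: Date): string {
  const from = isoDate(day);
  const next = new Date(day.getTime() + 24 * 60 * 60 * 1000);
  const to = isoDate(next);
  const name = `transactions_${from.replace(/-/g, "_")}`;
  return (
    `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF transactions\n` +
    `  FOR VALUES FROM ('${from}') TO ('${to}');`
  );
}
```

Queries filtered on `occurred_at` then hit only the relevant daily partitions, which is what makes partition pruning automatic.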
The Matching Algorithm
Reconciliation is fundamentally a matching problem: given a transaction record from source A and a record from source B, determine whether they represent the same real-world event.
This is harder than it sounds because different sources represent the same transaction differently. The bank sends a settlement record with their reference number. The payment processor sends a webhook with their transaction ID. Our internal ledger has its own primary key. Amounts may differ slightly due to fees applied at different stages. Timestamps differ because systems process events at different times.
Our matching algorithm uses a tiered approach:
- Tier 1: Exact match on idempotency key. When all sources use a shared key, matching is trivial. This covers about 70% of transactions.
- Tier 2: Composite match on amount, currency, and time window. For sources without shared keys, we match on the combination of transaction amount, currency, and a configurable time window. This covers another 25%.
- Tier 3: Fuzzy match with manual review. The remaining 5% cannot be automatically matched due to fee discrepancies, timing issues, or data quality problems. These get flagged for human review with the candidate matches ranked by confidence score.
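The tiered fallthrough can be sketched as a single function. The record shape, the default window, and especially the Tier 3 confidence heuristic are illustrative stand-ins, not the production scoring model:

```typescript
// Sketch of the tiered matcher. Field names, the default time window,
// and the confidence weighting are assumptions for illustration.
interface TxnRecord {
  idempotencyKey?: string;
  amountCents: number;
  currency: string;
  occurredAt: number; // epoch millis
}

type MatchResult =
  | { tier: 1 | 2; matched: true }
  | { tier: 3; matched: false; confidence: number };

export function matchRecords(
  a: TxnRecord,
  b: TxnRecord,
  windowMs = 15 * 60 * 1000, // configurable Tier 2 time window
): MatchResult {
  // Tier 1: a shared idempotency key makes matching trivial.
  if (a.idempotencyKey && a.idempotencyKey === b.idempotencyKey) {
    return { tier: 1, matched: true };
  }
  // Tier 2: composite match on amount, currency, and time window.
  const inWindow = Math.abs(a.occurredAt - b.occurredAt) <= windowMs;
  if (a.amountCents === b.amountCents && a.currency === b.currency && inWindow) {
    return { tier: 2, matched: true };
  }
  // Tier 3: no automatic match; score the candidate so human review
  // sees the most plausible pairs first. Toy heuristic: closer amounts
  // in the same currency rank higher.
  const amountDelta = Math.abs(a.amountCents - b.amountCents);
  const confidence =
    (a.currency === b.currency ? 0.5 : 0) + 0.5 / (1 + amountDelta / 100);
  return { tier: 3, matched: false, confidence };
}
```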
Node.js Worker Pools
The reconciliation engine runs as a pool of Node.js workers processing batches in parallel. Each worker pulls a batch of unreconciled transactions, runs the matching algorithm, and writes the results back.
- Batch size is tuned to 5,000 transactions per batch. Large enough for efficient database queries, small enough for predictable memory usage.
- Worker count is set to the number of available CPU cores minus one. The remaining core handles the orchestrator process that assigns batches and monitors progress.
- Each worker is stateless. If a worker crashes, the batch it was processing gets reassigned automatically. No transaction is lost because the batch assignment uses a PostgreSQL advisory lock.
- Idempotent processing means any batch can be safely reprocessed. Workers write results with ON CONFLICT clauses that update rather than duplicate.
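The crash-safe batch assignment described above can be modeled in memory to show the invariant: a batch is either pending, assigned, or done, and a crashed worker's batch always returns to the queue. In production this state lives in PostgreSQL and the claim itself is an advisory lock; the class and method names here are hypothetical:

```typescript
// In-memory sketch of the orchestrator's batch-assignment bookkeeping.
// Production uses PostgreSQL advisory locks for the claim; this class
// only illustrates the reassignment invariant.
type BatchId = number;

export class BatchTracker {
  private pending: BatchId[] = [];
  private assigned = new Map<number /* workerId */, BatchId>();

  enqueue(batch: BatchId): void {
    this.pending.push(batch);
  }

  /** A worker claims the next batch, or null if none are pending. */
  claim(workerId: number): BatchId | null {
    const batch = this.pending.shift();
    if (batch === undefined) return null;
    this.assigned.set(workerId, batch);
    return batch;
  }

  /** Idempotent completion: safe to call more than once per worker. */
  complete(workerId: number): void {
    this.assigned.delete(workerId);
  }

  /** On crash, the worker's batch goes back to the front of the queue. */
  reassign(workerId: number): void {
    const batch = this.assigned.get(workerId);
    if (batch !== undefined) {
      this.assigned.delete(workerId);
      this.pending.unshift(batch);
    }
  }
}
```

Because workers are stateless and writes use ON CONFLICT upserts, replaying a reassigned batch produces the same result rows rather than duplicates.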
Sub-Minute Reconciliation Windows
The target is to reconcile each batch within 60 seconds of all source records arriving. Achieving this required eliminating every unnecessary step from the processing pipeline.
- Records are indexed on ingestion, not during reconciliation. By the time the matching algorithm runs, all relevant records are already indexed and queryable.
- The matching query uses a materialized view that pre-joins transaction records with their source metadata. The view refreshes every 30 seconds.
- Results are written in bulk using PostgreSQL COPY instead of individual INSERT statements. The throughput difference is roughly 10x for our batch sizes.
- Monitoring is real-time. A Grafana dashboard shows reconciliation lag by source, match rate by tier, and processing throughput per worker.
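The bulk-write step hinges on serializing result rows into COPY's text format before streaming them to the server (for example via a client library such as pg-copy-streams). A minimal serializer, assuming the COPY text conventions of tab-delimited fields, one row per line, `\N` for NULL, and backslash-escaped control characters:

```typescript
// Sketch of serializing result rows into PostgreSQL COPY text format.
// The resulting payload can be piped to `COPY results FROM STDIN`;
// the target table name and column order are assumptions.
type CopyValue = string | number | null;

function escapeCopyField(value: CopyValue): string {
  if (value === null) return "\\N"; // COPY's NULL marker
  return String(value)
    .replace(/\\/g, "\\\\") // escape backslashes first
    .replace(/\t/g, "\\t")
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

/** Render rows as one COPY text payload: tab-delimited, newline-terminated. */
export function toCopyPayload(rows: CopyValue[][]): string {
  return (
    rows.map((row) => row.map(escapeCopyField).join("\t")).join("\n") + "\n"
  );
}
```

Batching 5,000 rows into one COPY payload is what delivers the roughly 10x throughput gain over per-row INSERTs.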
What We Got Wrong Initially
The first version of the reconciliation engine tried to process transactions individually instead of in batches. It worked at 100,000 transactions per day. At 1 million, it fell over. The transition to batch processing was the single most impactful architectural change we made.
The second mistake was trying to reconcile in real time instead of in windows. Real-time reconciliation sounds ideal, but it means processing every transaction the moment it arrives, before all sources have reported. This creates false mismatches that generate noise for the operations team. Switching to reconciliation windows, where we wait until all sources are expected to have reported before matching, reduced false mismatch rates by 90%.
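The window discipline reduces to one readiness check: match only once every expected source has reported, or a hard deadline has passed so a slow source cannot stall reconciliation forever. A sketch, where the source names and deadline are illustrative:

```typescript
// Sketch of the window-readiness check. Source names, the field layout,
// and the deadline policy are assumptions for illustration.
interface WindowState {
  expectedSources: string[];
  reportedSources: Set<string>;
  opensAt: number; // epoch millis when the window opened
  deadlineMs: number; // max time to wait before matching anyway
}

export function isReadyToReconcile(w: WindowState, now: number): boolean {
  const allReported = w.expectedSources.every((s) => w.reportedSources.has(s));
  const deadlinePassed = now - w.opensAt >= w.deadlineMs;
  return allReported || deadlinePassed;
}
```

Holding matching until this returns true is what cut the false-mismatch noise: records that would have looked unmatched at arrival time are simply not evaluated until their counterparts are expected to exist.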
This is the phase where individual scars finally turned into repeatable operating principles. The goal stopped being sounding clever and became leaving behind a system that stays sane without me in the room. That is how I build ftryos and pipeline-sdk too.
Reconciliation at scale is a batch processing problem, not a real-time processing problem. The instinct to process immediately is wrong. The discipline to wait for complete data and process in bulk is what makes it work.
The system processes ten million transactions daily. The infrastructure cost is roughly $800 per month in database and compute. The reconciliation accuracy rate is 99.7% before human review and 100% after. It is the system I am most proud of at FinanceOps, not because it is clever, but because it is simple and it works.