Anshul Bisen

PostgreSQL 17 just dropped and COPY performance alone makes the upgrade worth it

PostgreSQL 17 shipped with 2x COPY performance, JSON_TABLE, and MERGE with RETURNING. The COPY improvement cut our nightly import from 12 minutes to under 6.

PostgreSQL 17 shipped on September 26th and I upgraded our staging environment the same day. Not because I enjoy living on the edge, but because the release notes mentioned a 2x improvement in COPY performance and our nightly bank statement import is a COPY FROM operation that processes 800,000 rows. If the benchmark held up in our environment, that alone justified the upgrade.

I spent a week benchmarking the three features that matter most for our fintech workload: COPY throughput, JSON_TABLE for webhook payload processing, and MERGE with RETURNING for our reconciliation upsert pattern. Two of the three exceeded my expectations. The third came with a surprise.

Where the data model or query started fighting back.

In that first stretch at FinanceOps, I was still learning how to wear the Head of Engineering title without hiding behind it. It also builds on what I learned earlier in “Why every API endpoint at FinanceOps returns the same error shape.” The only credibility that mattered was whether the decision survived contact with real money, ugly edge cases, and the next person I would eventually hire. That same bias toward strict boundaries later shaped how I approached ftryos and pipeline-sdk: make correctness boring before you make the API clever.

The operational artifact behind the argument.

COPY Performance: The Headline Is Real

Our nightly import pulls bank statement data from a partner API and loads it into PostgreSQL via COPY FROM. The data arrives as CSV with 800,000 rows and 12 columns. On PostgreSQL 16, this operation took 12 minutes and 18 seconds. On PostgreSQL 17 with identical hardware, data, and configuration: 5 minutes and 42 seconds.

PostgreSQL 16.3:
COPY 800000 rows — 12m 18s
Throughput: 1,084 rows/sec
PostgreSQL 17.0:
COPY 800000 rows — 5m 42s
Throughput: 2,339 rows/sec
Improvement: 2.16x faster

The improvement comes from internal changes to how COPY handles WAL (write-ahead log) writes. PostgreSQL 17 batches WAL records more aggressively during COPY operations, reducing the number of I/O system calls. For our workload, this translated to a 2.16x throughput improvement with zero configuration changes.

This matters for our operations because the nightly import runs during a maintenance window. A 12-minute import that blocks other maintenance tasks is annoying. A 6-minute import leaves room for VACUUM, ANALYZE, and backup operations in the same window.
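For context, the load itself is a plain COPY FROM; the speedup required no changes to the statement or to postgresql.conf. A minimal sketch (the file path and the three-column list are illustrative, not our actual 12-column schema):

```sql
-- Illustrative sketch of the nightly load. Path and column subset are
-- assumptions; the real import has 12 columns. Timing this with \timing
-- in psql is enough to reproduce the 16-vs-17 comparison.
COPY bank_statements (reference_id, amount_cents, statement_date)
FROM '/var/imports/statements.csv'
WITH (FORMAT csv, HEADER true);
```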

JSON_TABLE: Finally a Clean Way to Shred Webhooks

Payment processor webhooks arrive as nested JSON payloads that we need to decompose into relational rows. Before PostgreSQL 17, this required a chain of jsonb_extract_path, jsonb_array_elements, and lateral joins that was ugly and hard to optimize.

-- BEFORE PostgreSQL 17: lateral join + jsonb decomposition
SELECT
    w.id,
    item->>'amount'   AS amount,
    item->>'currency' AS currency,
    item->>'status'   AS status
FROM webhooks w,
     jsonb_array_elements(w.payload->'line_items') AS item
WHERE w.processed = false;

-- AFTER PostgreSQL 17: JSON_TABLE
SELECT w.id, jt.*
FROM webhooks w,
     JSON_TABLE(
         w.payload, '$.line_items[*]'
         COLUMNS (
             amount   NUMERIC PATH '$.amount',
             currency TEXT    PATH '$.currency',
             status   TEXT    PATH '$.status'
         )
     ) AS jt
WHERE w.processed = false;

JSON_TABLE is not just syntactic sugar. It gives the planner better information about the expected output shape, which can lead to better join strategies. In our benchmark, the JSON_TABLE version ran about 15% faster than the lateral join equivalent on a dataset of 50,000 webhook payloads with an average of 3 line items each.

More importantly, the JSON_TABLE syntax is dramatically more readable. The COLUMNS clause makes it obvious what fields are being extracted and what types they map to. New team members can understand the query without tracing through nested jsonb function calls.
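JSON_TABLE also supports per-column ON EMPTY and ON ERROR clauses, which the lateral-join version had no equivalent for. A sketch of how that could look for the same query (the 'USD' default is an assumption for illustration, not our actual policy):

```sql
-- Sketch: same extraction as above, with per-column error handling.
-- The DEFAULT value is illustrative, not a real business rule.
SELECT w.id, jt.*
FROM webhooks w,
     JSON_TABLE(
         w.payload, '$.line_items[*]'
         COLUMNS (
             amount   NUMERIC PATH '$.amount'   ERROR ON ERROR,        -- fail loudly on malformed amounts
             currency TEXT    PATH '$.currency' DEFAULT 'USD' ON EMPTY, -- assumed fallback
             status   TEXT    PATH '$.status'
         )
     ) AS jt
WHERE w.processed = false;
```

For financial fields, ERROR ON ERROR turns a silently-NULL amount into a hard failure, which is usually what you want in a reconciliation pipeline.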

MERGE with RETURNING: The Reconciliation Win

Our reconciliation engine uses an upsert pattern: if a bank statement row matches an existing record, update it. If not, insert it. PostgreSQL 15 introduced MERGE but without RETURNING, which meant we could not get back the affected rows in a single statement. We had to run MERGE and then a separate SELECT to find what changed.

MERGE INTO bank_statements AS target
USING staging_import AS source
ON target.reference_id = source.reference_id
WHEN MATCHED AND target.amount_cents != source.amount_cents THEN
    UPDATE SET
        amount_cents = source.amount_cents,
        updated_at = NOW()
WHEN NOT MATCHED THEN
    INSERT (reference_id, amount_cents, statement_date)
    VALUES (source.reference_id, source.amount_cents, source.statement_date)
RETURNING
    merge_action() AS action,  -- 'INSERT' or 'UPDATE'; new in PostgreSQL 17
    target.id,
    target.reference_id;

MERGE with RETURNING eliminates the follow-up SELECT. The RETURNING clause tells us exactly which rows were inserted and which were updated, along with the final row values. This cut our reconciliation batch processing time by about 20% because we eliminated 800,000 individual lookups that were previously needed to determine what the MERGE did.
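PostgreSQL 17 also allows MERGE inside a data-modifying CTE, so the per-action counts can come out of the same statement. A sketch using the tables from the example above:

```sql
-- Sketch: wrap the MERGE in a CTE and aggregate its RETURNING output
-- in one round trip. Table and column names match the example above.
WITH merged AS (
    MERGE INTO bank_statements AS target
    USING staging_import AS source
    ON target.reference_id = source.reference_id
    WHEN MATCHED AND target.amount_cents != source.amount_cents THEN
        UPDATE SET amount_cents = source.amount_cents, updated_at = NOW()
    WHEN NOT MATCHED THEN
        INSERT (reference_id, amount_cents, statement_date)
        VALUES (source.reference_id, source.amount_cents, source.statement_date)
    RETURNING merge_action() AS action, target.id
)
SELECT action, count(*) AS rows
FROM merged
GROUP BY action;
```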

The VACUUM Behavior Change That Bit Us

The upgrade was not entirely smooth. PostgreSQL 17 changed the default behavior of the autovacuum launcher to be more aggressive about processing tables with significant bloat. This sounds like a good thing, and it is, but it caught us off guard in staging when autovacuum kicked in during peak hours and competed with our application queries for I/O bandwidth.

  • Symptom: P99 query latency doubled during a 20-minute window every afternoon.
  • Cause: Autovacuum processing our largest table (payments, 2 million rows) with the new more-aggressive thresholds.
  • Fix: Tuned autovacuum_vacuum_cost_delay from the new default to a slightly higher value for the payments table. This slowed down vacuuming but eliminated the I/O contention during peak hours.
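The per-table override is a storage parameter on the table itself. A sketch (the 10 ms value is illustrative, not our production setting; the right number depends on your I/O budget):

```sql
-- Per-table autovacuum throttling. The delay value is an assumption for
-- illustration; the server-wide default (2 ms on recent releases) is untouched.
ALTER TABLE payments SET (autovacuum_vacuum_cost_delay = 10);
```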

This is the kind of upgrade surprise that does not show up in benchmarks. The VACUUM behavior change is a net positive for most workloads, but if you have specific I/O budgets or latency SLAs, test the autovacuum behavior in staging before promoting to production.

The system after the boring-but-correct fix.

That was the pattern of my first months at FinanceOps: I did not have management scar tissue yet, so I earned trust by making technical decisions that stayed boring under pressure. The same bias toward strict defaults still shows up in ftryos and pipeline-sdk today.

PostgreSQL 17 is a strong upgrade for fintech workloads. The COPY performance alone justifies it if you do any bulk data loading. Test the VACUUM changes in staging first, tune autovacuum settings if needed, and enjoy the faster imports.