
S3 Vectors at re:Invent made me reconsider our entire RAG architecture

2 billion vectors per index at 90% cost reduction. Our pgvector pipeline still makes sense, but the calculus for startups without millisecond latency needs is changing.

AWS re:Invent 2025 announced a lot. Aurora DSQL went GA. Lambda Durable Functions landed. But the announcement that made me stop and rethink was S3 Vectors: vector storage and similarity search built directly into S3. Two billion vectors per index. Up to 90% cost reduction compared to specialized vector databases. Pay-per-query pricing.

We had just finished building a pgvector-based RAG pipeline for our internal knowledge base. The timing was either terrible or perfect, depending on how you look at it.

Useful AI looks more like leverage than magic.

I write these AI posts from the far side of the honeymoon phase. This one builds on what I learned earlier in “Opus 4.5 is the first AI model I trust to refactor production code unsupervised.” The interesting question is no longer whether the models are impressive. It is where they meaningfully improve decision quality across real systems like portfolio search, aigw, jarvis, and the review loops around everyday engineering work.

The workflow, not the hype.

Our Current Architecture

The FinanceOps internal RAG pipeline uses pgvector in PostgreSQL. We embed documentation, runbooks, incident reports, and architecture decision records into 768-dimensional vectors using Gemini text-embedding-004. Engineers query it through a Slack bot that retrieves relevant context and answers questions about our systems.
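The chunking step of that pipeline can be sketched as a sliding word window with overlap. This is illustrative only: the 200-word window and 40-word overlap are assumptions, not our production values, and the real service also handles markdown structure before embedding each chunk with Gemini.

```typescript
// Illustrative chunker: splits a document into overlapping word-window
// chunks before embedding. Window and overlap sizes are assumptions,
// not the production configuration.
function chunkText(text: string, windowWords = 200, overlapWords = 40): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = windowWords - overlapWords;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + windowWords).join(" "));
    // Stop once the window has consumed the tail of the document.
    if (start + windowWords >= words.length) break;
  }
  return chunks;
}
```

The overlap matters for retrieval quality: a sentence that straddles a chunk boundary still appears whole in at least one chunk.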

The architecture is simple by design:

  • PostgreSQL with pgvector extension, running on the same database cluster as our application
  • Gemini text-embedding-004 for embedding generation
  • A Node.js service that handles chunking, embedding, and retrieval
  • Cosine similarity search with an HNSW index for approximate nearest neighbor
  • Currently storing about 50,000 vectors, growing at roughly 2,000 per month

It works well. Query latency is 50-100ms. The HNSW index keeps search fast. The operational overhead is minimal because pgvector runs inside our existing PostgreSQL cluster. We are not managing a separate vector database.
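For readers unfamiliar with the math underneath: pgvector's `<=>` operator returns cosine distance, which is 1 minus the cosine similarity shown below. This helper is a sketch of what the index computes, not our service code.

```typescript
// Cosine similarity between two equal-length vectors. pgvector's `<=>`
// operator returns the cosine *distance*, i.e. 1 - this value, and the
// HNSW index approximates the same ranking without scanning every row.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```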

What S3 Vectors Changes

S3 Vectors is compelling for a specific profile: teams that need large-scale vector storage without millisecond latency requirements. The pricing model is fundamentally different from self-hosted pgvector or dedicated vector databases.

  • No server to manage. Vectors are stored in S3, one of the most battle-tested storage services in existence.
  • Pay per query, not per server hour. For workloads with bursty query patterns, this is dramatically cheaper.
  • Scales to 2 billion vectors without capacity planning. Our 50,000 vectors do not need this, but a team building a customer-facing search product might.
  • Integrates natively with the AWS ecosystem. Lambda, Step Functions, and Bedrock can read directly from S3 Vectors without a separate service.

The tradeoff is latency. S3 Vectors has higher query latency than an in-memory HNSW index. For our Slack bot use case, where a 500ms response is indistinguishable from a 100ms response to the user, this might not matter. For a real-time search feature in a customer-facing product, it might.
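The cost side of that tradeoff is a back-of-envelope comparison: a fixed monthly infrastructure cost against pay-per-query pricing. The function below sketches the break-even logic; the prices in the usage note are hypothetical placeholders, not actual AWS pricing.

```typescript
// Which pricing model is cheaper at a given query volume?
// Pure arithmetic: pay-per-query wins while total query spend stays
// below the fixed monthly cost of a provisioned instance.
function cheaperOption(
  queriesPerMonth: number,
  fixedMonthlyCost: number, // e.g. the database slice pgvector occupies
  costPerQuery: number,     // hypothetical pay-per-query price
): "fixed" | "per-query" {
  return queriesPerMonth * costPerQuery < fixedMonthlyCost
    ? "per-query"
    : "fixed";
}
```

With made-up numbers, say $50/month of fixed cost and $0.001 per query, pay-per-query wins below 50,000 queries a month. An internal Slack bot sits far below that line; a customer-facing search product may sit far above it.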

Should We Migrate?

After a week of evaluating S3 Vectors, our answer is no. Not yet. The reasons are pragmatic:

  • Our pgvector setup works. Migrating a working system to save marginal cost is the definition of premature optimization.
  • Our vector count is small. The cost advantage of S3 Vectors is most compelling at millions or billions of vectors. At 50,000, the infrastructure cost of pgvector is negligible.
  • We value the tight integration with our application database. Queries that combine vector similarity with relational filters in a single PostgreSQL query are powerful. Moving vectors to S3 means we lose that join capability.
  • S3 Vectors is new. The service launched at re:Invent. We do not adopt new AWS services in production until they have at least six months of GA stability.
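The join capability mentioned above is worth making concrete. A single PostgreSQL statement can combine relational filters with vector ranking; with vectors in S3, this becomes two round trips plus application-side merging. The schema names here are illustrative, not our actual tables.

```typescript
// A hybrid query: relational filters plus vector ranking in one
// statement. Table and column names are illustrative. $1 is the query
// embedding, $2 the team name; `<=>` is pgvector's cosine-distance
// operator, so 1 - distance recovers similarity.
const hybridSql = `
  SELECT d.id, d.title, 1 - (d.embedding <=> $1::vector) AS similarity
  FROM documents d
  JOIN teams t ON t.id = d.team_id
  WHERE t.name = $2
    AND d.updated_at > now() - interval '90 days'
  ORDER BY d.embedding <=> $1::vector
  LIMIT 10
`;
```

With S3 Vectors, the equivalent is a vector query followed by a metadata filter (or vice versa), which either over-fetches or under-recalls depending on which side runs first.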

When S3 Vectors Would Be the Right Choice

For a different team at a different stage, S3 Vectors would be compelling:

  • A startup building a customer-facing semantic search product that needs to scale to millions of documents without managing vector database infrastructure.
  • A team that needs vector search as an occasional capability, not a core feature. Pay-per-query means you are not paying for idle capacity.
  • An organization already deep in the AWS ecosystem with Lambda-based architectures. S3 Vectors fits naturally into serverless patterns.
  • Any use case where query latency of 200-500ms is acceptable and cost is more important than speed.

The Broader Trend

The point is judgment, not novelty.

By the time I wrote this, the lesson was bigger than the tool or incident. The job had become setting defaults a team could trust, then proving those defaults in systems like jarvis, alfred, and the portfolio RAG stack. That is leadership work, not just technical taste.

Vector search is being commoditized. Two years ago you needed a specialized database. Now PostgreSQL, S3, and every major cloud provider offer vector capabilities. The competitive advantage is no longer in the vector infrastructure. It is in the embedding quality and retrieval strategy.

S3 Vectors is another signal that the vector database category is collapsing into existing infrastructure. Pinecone, Weaviate, and Qdrant built valuable products for the 2023-2024 window when vector search required specialized tooling. That window is closing as the capability gets absorbed into general-purpose infrastructure. For teams making architecture decisions today, the question is not “which vector database” but “which existing infrastructure can handle vectors well enough.”

For FinanceOps, that answer is still PostgreSQL with pgvector. For many teams starting fresh in 2026, the answer might be S3 Vectors or Aurora with pgvector. The specific tool matters less than the principle: do not adopt specialized infrastructure for a capability that your existing infrastructure handles adequately.

The re:Invent announcement forced us to reconsider assumptions we had baked into our RAG architecture six months earlier. S3-native vector support means a team could eliminate a dedicated vector database from its infrastructure, reducing both cost and operational complexity. But a migration would not be straightforward for us: our embedding pipeline assumes a database-centric query model, and adapting it to S3 semantics would mean rethinking how we index and retrieve content. The lesson is that infrastructure announcements create opportunities, but capturing those opportunities requires a willingness to revisit architectural decisions that feel settled.