Building AI-Powered Apps with RAG and PgVector: A Practical Guide

A hands-on walkthrough of Retrieval-Augmented Generation (RAG) using PgVector and Node.js — from chunking strategy to production deployment.

AIRAGNode.jsTypeScriptPostgreSQLPgVector

Retrieval-Augmented Generation (RAG) is one of the most impactful AI patterns you can implement today — and it doesn’t require a PhD or a team of ML engineers. In this guide, I’ll walk you through a production-grade RAG pipeline using Node.js, TypeScript, and PgVector (PostgreSQL’s native vector extension).

What Is RAG and Why Does It Matter?

LLMs like GPT-4 are powerful but have a critical limitation: their knowledge is frozen at training time. RAG solves this by retrieving relevant context from your own data and injecting it into the prompt at inference time.

The result: an AI that answers questions about your documents, database, or knowledge base — accurately and without hallucination.

The Architecture

User Query → Embed Query → Vector Search (PgVector) → Inject Top-K Chunks → LLM → Answer

This pipeline has three moving parts:

  1. Ingestion — chunk documents, embed them, store in Postgres with PgVector
  2. Retrieval — embed the user query, do a cosine similarity search
  3. Generation — inject retrieved chunks into the LLM prompt

Setting Up PgVector

PgVector is a Postgres extension that adds a native vector type and efficient similarity search operators. Enable it in your database:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE document_chunks (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536),  -- OpenAI text-embedding-3-small dimensions
  source TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- IVFFlat index for approximate nearest-neighbor search at scale
CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

Embedding and Ingestion (TypeScript)

import OpenAI from 'openai';
import { Pool } from 'pg';

const openai = new OpenAI();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function embedAndStore(content: string, source: string): Promise<void> {
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: content,
  });

  const embedding = data[0].embedding;

  await pool.query(
    'INSERT INTO document_chunks (content, embedding, source) VALUES ($1, $2, $3)',
    [content, JSON.stringify(embedding), source]
  );
}

Retrieval

async function retrieveTopK(query: string, k = 5): Promise<string[]> {
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });

  const embedding = data[0].embedding;

  const { rows } = await pool.query<{ content: string }>(
    `SELECT content
     FROM document_chunks
     ORDER BY embedding <=> $1
     LIMIT $2`,
    [JSON.stringify(embedding), k]
  );

  return rows.map((r) => r.content);
}

Generation

async function answerWithContext(question: string): Promise<string> {
  const chunks = await retrieveTopK(question);
  const context = chunks.join('\n\n---\n\n');

  const { choices } = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `You are a helpful assistant. Answer based only on the context provided.
                  If the answer is not in the context, say so.`,
      },
      {
        role: 'user',
        content: `Context:\n${context}\n\nQuestion: ${question}`,
      },
    ],
  });

  return choices[0].message.content ?? '';
}

Production Considerations

  • Chunking strategy matters. Overlapping chunks (e.g. 512 tokens with 64-token overlap) dramatically improve retrieval quality.
  • Hybrid search. Combine vector search with full-text search (tsvector) for best-of-both results.
  • Metadata filtering. Store document source, date, and category alongside embeddings so you can scope retrieval.
  • Caching. Cache embeddings for repeated queries to cut costs.

Conclusion

RAG is the right tool when you need an LLM to reason over private, domain-specific data. With PgVector in Postgres, you already have the infrastructure if you’re running a relational database — no separate vector database required.

If you’re building an AI feature and need a production-grade implementation, get in touch.