100 Embeddings & Vector Search resources for developers
This resource provides a technical map for implementing embeddings and vector search, focusing on model selection, indexing strategies, and retrieval optimization. It targets developers moving beyond basic prototypes into production-grade semantic search and recommendation systems.
Embedding Models and Inference
- 1. OpenAI text-embedding-3-small (beginner, high): Recommended starting point for general-purpose embeddings. Produces 1536-dimension vectors and accepts a 'dimensions' parameter to shorten them without retraining.
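For models trained with Matryoshka-style objectives, the 'dimensions' parameter effectively truncates the embedding and re-normalizes it. A minimal post-hoc sketch of the same operation, assuming you already hold a full-size, unit-normalized vector:

```python
import numpy as np

def shorten_embedding(vec, dims):
    # Truncate to the first `dims` entries, then re-normalize to unit length.
    # Assumption: `vec` is a full-size, unit-normalized embedding.
    short = np.asarray(vec, dtype=np.float32)[:dims]
    return short / np.linalg.norm(short)

rng = np.random.default_rng(0)
full = rng.normal(size=1536)
full /= np.linalg.norm(full)          # stand-in for an API-returned embedding
small = shorten_embedding(full, 256)  # smaller index footprint, same vector space
```

Shorter vectors cut storage and speed up distance computations, at some cost in retrieval quality.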
- 2. Voyage AI voyage-2 (intermediate, high): High-performance model optimized for retrieval tasks; ranks near the top of the MTEB retrieval leaderboard for document search.
- 3. Cohere Embed v3 (intermediate, high): Supports int8 and binary output via the 'embedding_types' parameter, significantly reducing storage costs in vector databases.
- 4. Hugging Face sentence-transformers (intermediate, standard): Standard Python library for running open-source models such as BGE-large or all-MiniLM-L6-v2 locally or on private infrastructure.
- 5. Jina Embeddings v2 (intermediate, medium): Supports an 8k-token context window, making it suitable for long-form document embedding where chunking might lose context.
- 6. MTEB Benchmark (beginner, standard): The Massive Text Embedding Benchmark. Use it to compare model performance across specific tasks such as clustering, retrieval, and summarization.
- 7. Local Inference with ONNX Runtime (advanced, medium): Convert Hugging Face models to ONNX format to run embeddings in Node.js or C# environments with low latency.
- 8. BAAI BGE-M3 (intermediate, medium): A versatile model supporting multilingual, multi-functional (dense and sparse), and multi-granularity retrieval within a single model.
- 9. Infinity Embedding Server (advanced, standard): A high-throughput, MIT-licensed inference server for deploying open-source embedding models behind a REST API.
- 10. Google Vertex AI Embeddings (beginner, medium): Enterprise-grade embedding models integrated with GCP, offering high rate limits and managed infrastructure.
Vector Databases and Storage
- 1. pgvector for PostgreSQL (beginner, high): Extension that adds vector similarity search to Postgres. Best for teams already using RDS or Supabase who want to avoid adding new infrastructure.
- 2. Qdrant (intermediate, high): Rust-based vector database offering advanced payload filtering and high-performance HNSW indexing at production scale.
- 3. Pinecone Serverless (beginner, high): Managed vector database that scales with usage. Ideal for applications with unpredictable traffic and large datasets.
- 4. Weaviate (intermediate, medium): Open-source vector DB with a GraphQL interface and native support for multi-modal data (images, video, text).
- 5. ChromaDB (beginner, standard): Lightweight, developer-centric database often used for local RAG development and small-scale deployments.
- 6. Milvus (advanced, medium): Cloud-native vector database designed for billion-scale vector search with decoupled storage and compute.
- 7. LanceDB (intermediate, high): Serverless, disk-based vector database built on the Lance columnar format, optimized for random access and large-scale AI data.
- 8. FAISS (Facebook AI Similarity Search) (advanced, standard): Library for efficient similarity search. Essential for building custom indexing pipelines or running in-memory searches.
- 9. Elasticsearch Vector Search (intermediate, medium): Provides native kNN search over dense_vector fields. Best for organizations already invested in the ELK stack for logging and search.
- 10. RedisVL (intermediate, medium): The Redis Vector Library, enabling low-latency vector indexing and search directly within a Redis instance.
Retrieval and Optimization Patterns
- 1. Hybrid Search (BM25 + Vector) (intermediate, high): Combines traditional keyword search with semantic vector search, merging the two ranked lists with Reciprocal Rank Fusion (RRF) for better accuracy.
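A minimal sketch of RRF fusion, assuming you already have two ranked lists of document ids (the ids below are made up):

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).
    # `rankings` is a list of ranked result lists (best result first).
    # k=60 is the constant suggested in the original RRF paper.
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["d3", "d1", "d7"]  # keyword ranking (hypothetical ids)
vector_hits = ["d1", "d9", "d3"]  # semantic ranking
fused = rrf_fuse([bm25_hits, vector_hits])
```

Because RRF only uses ranks, it sidesteps the problem of BM25 and cosine scores living on incompatible scales.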
- 2. Cohere Rerank (beginner, high): A cross-encoder model used as a second stage to re-score the top results from a vector search for higher precision.
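The two-stage pattern looks like this; `overlap_score` below is a toy stand-in for the real cross-encoder call (in production, the Cohere API would supply the actual scores):

```python
def rerank(query, candidates, score_fn, top_n=3):
    # Second stage: score each (query, doc) pair and keep the best.
    # `score_fn` is any callable (query, doc) -> float; a real pipeline
    # would plug in a cross-encoder such as Cohere Rerank here.
    scored = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return scored[:top_n]

def overlap_score(query, doc):
    # Toy stand-in scorer (token overlap); NOT a real cross-encoder.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = ["reset your password in settings",
        "billing and invoices",
        "password security tips"]
top = rerank("how do I reset my password", docs, overlap_score, top_n=2)
```

The usual economics: the first stage retrieves a generous candidate set cheaply, and the expensive pairwise scorer only sees those few candidates.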
- 3. HNSW (Hierarchical Navigable Small World) (advanced, standard): The industry-standard algorithm for approximate nearest neighbor (ANN) search, balancing query speed against recall.
- 4. Product Quantization (PQ) (advanced, medium): A compression technique that splits vectors into sub-vectors and quantizes each against a small codebook, cutting memory usage by up to 90% at a slight cost to accuracy.
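A simplified sketch of PQ training and encoding, using plain k-means per subspace (real libraries such as FAISS implement this far more efficiently):

```python
import numpy as np

def train_codebooks(data, m=4, k=16, iters=10, seed=0):
    # One k-means codebook per sub-vector space (simplified PQ training).
    rng = np.random.default_rng(seed)
    n, d = data.shape
    ds = d // m                      # sub-vector dimensionality
    books = []
    for j in range(m):
        sub = data[:, j * ds:(j + 1) * ds]
        cent = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):       # plain Lloyd iterations
            assign = ((sub[:, None] - cent[None]) ** 2).sum(-1).argmin(1)
            for c in range(k):
                pts = sub[assign == c]
                if len(pts):
                    cent[c] = pts.mean(0)
        books.append(cent)
    return books

def pq_encode(vec, books):
    # Compress a vector to m uint8 codes (one codebook index per sub-vector).
    ds = len(vec) // len(books)
    return np.array([((b - vec[j * ds:(j + 1) * ds]) ** 2).sum(1).argmin()
                     for j, b in enumerate(books)], dtype=np.uint8)

def pq_decode(codes, books):
    # Approximate reconstruction from the stored codes.
    return np.concatenate([b[c] for c, b in zip(codes, books)])

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 32)).astype(np.float32)
books = train_codebooks(data, m=4, k=16)
codes = pq_encode(data[0], books)    # 32 floats compressed to 4 bytes
approx = pq_decode(codes, books)
```

Here 32 float32 values (128 bytes) become 4 codebook indices (4 bytes), which is where the large memory savings come from.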
- 5. Maximal Marginal Relevance (MMR) (intermediate, medium): A retrieval strategy that re-ranks results to reduce redundancy and increase the diversity of the returned items.
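A sketch of the greedy MMR loop over unit-normalized vectors (toy 3-d vectors stand in for real embeddings):

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=3, lam=0.7):
    # Greedily pick docs, trading relevance to the query (weight lam)
    # against similarity to already-selected docs (weight 1 - lam).
    # Vectors are assumed unit-normalized, so dot product == cosine.
    rel = doc_vecs @ query_vec
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((doc_vecs[i] @ doc_vecs[j] for j in selected),
                             default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

docs = np.array([[0.95, 0.31, 0.00],   # highly relevant
                 [0.94, 0.33, 0.08],   # near-duplicate of the first
                 [0.60, 0.00, 0.80]])  # less relevant but diverse
docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
picked = mmr(np.array([1.0, 0.0, 0.0]), docs, k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate is skipped in favor of the diverse document; raising `lam` toward 1 recovers plain relevance ranking.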
- 6. Parent-Document Retrieval (intermediate, high): Embed small chunks for search precision, but return the full parent document to the LLM for better context.
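The pattern reduces to keeping a chunk-to-parent mapping alongside the chunk index; a minimal sketch with made-up documents:

```python
# Parent documents stored whole; only the small chunks get embedded.
parents = {
    "doc1": "Full refund policy: items may be returned within 30 days for a full refund.",
    "doc2": "Full shipping policy: orders ship within 2-3 business days via standard carriers.",
}
chunks = [
    {"id": "c1", "parent": "doc1", "text": "refunds within 30 days"},
    {"id": "c2", "parent": "doc1", "text": "store credit options"},
    {"id": "c3", "parent": "doc2", "text": "ships in 2-3 business days"},
]

def retrieve_parents(matched_chunk_ids):
    # Map chunk-level hits back to deduplicated parent documents.
    by_id = {c["id"]: c for c in chunks}
    seen, out = set(), []
    for cid in matched_chunk_ids:
        pid = by_id[cid]["parent"]
        if pid not in seen:
            seen.add(pid)
            out.append(parents[pid])
    return out

# Pretend the vector search matched c1 and c2 (both from doc1) plus c3:
context = retrieve_parents(["c1", "c2", "c3"])  # two parents, not three chunks
```

Deduplication matters: multiple chunk hits from the same parent should yield one copy of that parent in the LLM context.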
- 7. Query Expansion via LLM (intermediate, medium): Use an LLM to generate multiple rephrasings of a user's query, retrieving against each to catch more relevant vectors.
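A sketch of the retrieve-and-merge loop; `expand_query` here returns canned paraphrases where a real implementation would prompt an LLM:

```python
def expand_query(query):
    # Stand-in for an LLM call that rewrites the query several ways.
    # (In practice, prompt a model for paraphrases; these are canned.)
    canned = {
        "reset password": ["how to change my password",
                           "recover account credentials",
                           "reset password"],
    }
    return canned.get(query, [query])

def multi_query_retrieve(query, search_fn, top_k=5):
    # Run retrieval once per variant and merge hits, keeping first-seen order.
    seen, merged = set(), []
    for variant in expand_query(query):
        for doc_id in search_fn(variant):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:top_k]

# Toy lookup keyed by phrasing (a real search_fn would hit the vector index):
fake_index = {
    "how to change my password": ["kb42"],
    "recover account credentials": ["kb17", "kb42"],
    "reset password": ["kb42", "kb99"],
}
hits = multi_query_retrieve("reset password", lambda q: fake_index.get(q, []))
```

Each rephrasing lands in a slightly different region of embedding space, so the merged set covers documents any single phrasing would miss.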
- 8. Metadata Pre-filtering (beginner, standard): Apply hard filters (e.g., date > 2023) before performing the vector similarity search to narrow the search space.
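A minimal sketch: filter on metadata first, then rank only the survivors (toy 2-d vectors stand in for real embeddings):

```python
import numpy as np

docs = [
    {"vec": np.array([0.9, 0.1]), "year": 2022, "text": "old guide"},
    {"vec": np.array([0.8, 0.2]), "year": 2024, "text": "updated guide"},
    {"vec": np.array([0.1, 0.9]), "year": 2024, "text": "release notes"},
]

def search(query_vec, min_year, top_k=1):
    # Hard metadata filter first, then cosine ranking over the survivors.
    survivors = [d for d in docs if d["year"] >= min_year]
    def cosine(d):
        return float(query_vec @ d["vec"]) / (
            np.linalg.norm(query_vec) * np.linalg.norm(d["vec"]))
    return sorted(survivors, key=cosine, reverse=True)[:top_k]

best = search(np.array([1.0, 0.0]), min_year=2023)
```

Note that the 2022 document is excluded even though it is the closest vector; most vector databases push this filter down into the index rather than scanning as done here.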
- 9. Contextual Compression (intermediate, medium): Filter out irrelevant parts of retrieved documents before passing them to the LLM to save on token costs.
- 10. Multi-Vector Indexing (advanced, medium): Assign multiple vectors to a single document (e.g., a summary vector plus a full-text vector) to improve retrieval hits.
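One common aggregation is max-over-vectors per document: a document matches if any of its vectors matches the query. A minimal sketch with toy vectors:

```python
import numpy as np

# Two vectors per document: one for a summary, one for the full text
# (toy 2-d vectors; real embeddings would come from a model).
index = [
    ("docA", "summary",  np.array([1.0, 0.0])),
    ("docA", "fulltext", np.array([0.6, 0.8])),
    ("docB", "summary",  np.array([0.0, 1.0])),
    ("docB", "fulltext", np.array([0.2, 0.98])),
]

def search_multivector(query_vec, top_k=1):
    # Score every stored vector, then aggregate per document by max score.
    best = {}
    for doc_id, _, vec in index:
        sim = float(query_vec @ vec) / (
            np.linalg.norm(query_vec) * np.linalg.norm(vec))
        best[doc_id] = max(best.get(doc_id, -1.0), sim)
    return sorted(best, key=best.get, reverse=True)[:top_k]

top = search_multivector(np.array([0.7, 0.7]))
```

The max aggregation lets a document surface whenever its summary or its full text is the better match, at the cost of storing several vectors per document.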