100 Embeddings & Vector Search resources for developers
This resource provides a technical map for implementing embeddings and vector search, focusing on model selection, indexing strategies, and retrieval optimization. It targets developers moving beyond basic prototypes into production-grade semantic search and recommendation systems.
Embedding Models and Inference
- 1. OpenAI text-embedding-3-small (beginner, high): Recommended starting point for general-purpose embeddings. Produces 1536-dimension vectors and accepts a 'dimensions' parameter to shorten them without retraining.
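For models trained with Matryoshka-style objectives, the 'dimensions' parameter effectively truncates the embedding and re-normalizes it. A minimal post-hoc sketch of the same operation, assuming you already hold a full-size, unit-normalized vector:

```python
import numpy as np

def shorten_embedding(vec, dims):
    # Truncate to the first `dims` entries, then re-normalize to unit length.
    # Assumption: `vec` is a full-size, unit-normalized embedding.
    short = np.asarray(vec, dtype=np.float32)[:dims]
    return short / np.linalg.norm(short)

rng = np.random.default_rng(0)
full = rng.normal(size=1536)
full /= np.linalg.norm(full)          # stand-in for an API-returned embedding
small = shorten_embedding(full, 256)  # smaller index footprint, same vector space
```

Shorter vectors cut storage and speed up distance computations, at some cost in retrieval quality.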
- 2. Voyage AI voyage-2 (intermediate, high): High-performance model optimized for retrieval tasks; ranks near the top of the MTEB retrieval leaderboard for document search.
- 3. Cohere Embed v3 (intermediate, high): Supports int8 and binary output via the 'embedding_types' parameter, significantly reducing storage costs in vector databases.
- 4. Hugging Face sentence-transformers (intermediate, standard): Standard Python library for running open-source models such as BGE-large or all-MiniLM-L6-v2 locally or on private infrastructure.
- 5. Jina Embeddings v2 (intermediate, medium): Supports an 8k-token context window, making it suitable for long-form document embedding where chunking might lose context.
- 6. MTEB Benchmark (beginner, standard): The Massive Text Embedding Benchmark. Use it to compare model performance across specific tasks such as clustering, retrieval, and summarization.
- 7. Local Inference with ONNX Runtime (advanced, medium): Convert Hugging Face models to ONNX format to run embeddings in Node.js or C# environments with low latency.
- 8. BAAI BGE-M3 (intermediate, medium): A versatile model supporting multilingual, multi-functional (dense and sparse), and multi-granularity retrieval within a single model.
- 9. Infinity Embedding Server (advanced, standard): A high-throughput, MIT-licensed inference server for deploying open-source embedding models behind a REST API.
- 10. Google Vertex AI Embeddings (beginner, medium): Enterprise-grade embedding models integrated with GCP, offering high rate limits and managed infrastructure.
Vector Databases and Storage
- 1. pgvector for PostgreSQL (beginner, high): Extension that adds vector similarity search to Postgres. Best for teams already using RDS or Supabase who want to avoid adding new infrastructure.
- 2. Qdrant (intermediate, high): Rust-based vector database offering advanced payload filtering and high-performance HNSW indexing at production scale.
- 3. Pinecone Serverless (beginner, high): Managed vector database that scales with usage. Ideal for applications with unpredictable traffic and large datasets.
- 4. Weaviate (intermediate, medium): Open-source vector DB with a GraphQL interface and native support for multi-modal data (images, video, text).
- 5. ChromaDB (beginner, standard): Lightweight, developer-centric database often used for local RAG development and small-scale deployments.
- 6. Milvus (advanced, medium): Cloud-native vector database designed for billion-scale vector search with decoupled storage and compute.
- 7. LanceDB (intermediate, high): Serverless, disk-based vector database built on the Lance columnar format, optimized for random access and large-scale AI data.
- 8. FAISS (Facebook AI Similarity Search) (advanced, standard): Library for efficient similarity search. Essential for building custom indexing pipelines or running in-memory searches.
- 9. Elasticsearch Vector Search (intermediate, medium): Provides native kNN search over dense_vector fields. Best for organizations already invested in the ELK stack for logging and search.
- 10. RedisVL (intermediate, medium): The Redis Vector Library, enabling low-latency vector indexing and search directly within a Redis instance.
Retrieval and Optimization Patterns
- 1. Hybrid Search (BM25 + Vector) (intermediate, high): Combines traditional keyword search with semantic vector search, merging the two ranked lists with Reciprocal Rank Fusion (RRF) for better accuracy.
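A minimal sketch of RRF fusion, assuming you already have two ranked lists of document ids (the ids below are made up):

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).
    # `rankings` is a list of ranked result lists (best result first).
    # k=60 is the constant suggested in the original RRF paper.
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["d3", "d1", "d7"]  # keyword ranking (hypothetical ids)
vector_hits = ["d1", "d9", "d3"]  # semantic ranking
fused = rrf_fuse([bm25_hits, vector_hits])
```

Because RRF only uses ranks, it sidesteps the problem of BM25 and cosine scores living on incompatible scales.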
- 2. Cohere Rerank (beginner, high): A cross-encoder model used as a second stage to re-score the top results from a vector search for higher precision.
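The two-stage pattern looks like this; `overlap_score` below is a toy stand-in for the real cross-encoder call (in production, the Cohere API would supply the actual scores):

```python
def rerank(query, candidates, score_fn, top_n=3):
    # Second stage: score each (query, doc) pair and keep the best.
    # `score_fn` is any callable (query, doc) -> float; a real pipeline
    # would plug in a cross-encoder such as Cohere Rerank here.
    scored = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return scored[:top_n]

def overlap_score(query, doc):
    # Toy stand-in scorer (token overlap); NOT a real cross-encoder.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = ["reset your password in settings",
        "billing and invoices",
        "password security tips"]
top = rerank("how do I reset my password", docs, overlap_score, top_n=2)
```

The usual economics: the first stage retrieves a generous candidate set cheaply, and the expensive pairwise scorer only sees those few candidates.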
- 3. HNSW (Hierarchical Navigable Small World) (advanced, standard): The industry-standard algorithm for approximate nearest neighbor (ANN) search, balancing query speed against recall.
- 4. Product Quantization (PQ) (advanced, medium): A compression technique that splits vectors into sub-vectors and quantizes each against a small codebook, cutting memory usage by up to 90% at a slight cost to accuracy.
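A simplified sketch of PQ training and encoding, using plain k-means per subspace (real libraries such as FAISS implement this far more efficiently):

```python
import numpy as np

def train_codebooks(data, m=4, k=16, iters=10, seed=0):
    # One k-means codebook per sub-vector space (simplified PQ training).
    rng = np.random.default_rng(seed)
    n, d = data.shape
    ds = d // m                      # sub-vector dimensionality
    books = []
    for j in range(m):
        sub = data[:, j * ds:(j + 1) * ds]
        cent = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(iters):       # plain Lloyd iterations
            assign = ((sub[:, None] - cent[None]) ** 2).sum(-1).argmin(1)
            for c in range(k):
                pts = sub[assign == c]
                if len(pts):
                    cent[c] = pts.mean(0)
        books.append(cent)
    return books

def pq_encode(vec, books):
    # Compress a vector to m uint8 codes (one codebook index per sub-vector).
    ds = len(vec) // len(books)
    return np.array([((b - vec[j * ds:(j + 1) * ds]) ** 2).sum(1).argmin()
                     for j, b in enumerate(books)], dtype=np.uint8)

def pq_decode(codes, books):
    # Approximate reconstruction from the stored codes.
    return np.concatenate([b[c] for c, b in zip(codes, books)])

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 32)).astype(np.float32)
books = train_codebooks(data, m=4, k=16)
codes = pq_encode(data[0], books)    # 32 floats compressed to 4 bytes
approx = pq_decode(codes, books)
```

Here 32 float32 values (128 bytes) become 4 codebook indices (4 bytes), which is where the large memory savings come from.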
- 5. Maximal Marginal Relevance (MMR) (intermediate, medium): A retrieval strategy that re-ranks results to reduce redundancy and increase the diversity of the returned items.
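A sketch of the greedy MMR loop over unit-normalized vectors (toy 3-d vectors stand in for real embeddings):

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=3, lam=0.7):
    # Greedily pick docs, trading relevance to the query (weight lam)
    # against similarity to already-selected docs (weight 1 - lam).
    # Vectors are assumed unit-normalized, so dot product == cosine.
    rel = doc_vecs @ query_vec
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((doc_vecs[i] @ doc_vecs[j] for j in selected),
                             default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

docs = np.array([[0.95, 0.31, 0.00],   # highly relevant
                 [0.94, 0.33, 0.08],   # near-duplicate of the first
                 [0.60, 0.00, 0.80]])  # less relevant but diverse
docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
picked = mmr(np.array([1.0, 0.0, 0.0]), docs, k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate is skipped in favor of the diverse document; raising `lam` toward 1 recovers plain relevance ranking.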
- 6. Parent-Document Retrieval (intermediate, high): Embed small chunks for search precision, but return the full parent document to the LLM for better context.
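The pattern reduces to keeping a chunk-to-parent mapping alongside the chunk index; a minimal sketch with made-up documents:

```python
# Parent documents stored whole; only the small chunks get embedded.
parents = {
    "doc1": "Full refund policy: items may be returned within 30 days for a full refund.",
    "doc2": "Full shipping policy: orders ship within 2-3 business days via standard carriers.",
}
chunks = [
    {"id": "c1", "parent": "doc1", "text": "refunds within 30 days"},
    {"id": "c2", "parent": "doc1", "text": "store credit options"},
    {"id": "c3", "parent": "doc2", "text": "ships in 2-3 business days"},
]

def retrieve_parents(matched_chunk_ids):
    # Map chunk-level hits back to deduplicated parent documents.
    by_id = {c["id"]: c for c in chunks}
    seen, out = set(), []
    for cid in matched_chunk_ids:
        pid = by_id[cid]["parent"]
        if pid not in seen:
            seen.add(pid)
            out.append(parents[pid])
    return out

# Pretend the vector search matched c1 and c2 (both from doc1) plus c3:
context = retrieve_parents(["c1", "c2", "c3"])  # two parents, not three chunks
```

Deduplication matters: multiple chunk hits from the same parent should yield one copy of that parent in the LLM context.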
- 7. Query Expansion via LLM (intermediate, medium): Use an LLM to generate multiple rephrasings of a user's query, retrieving against each to catch more relevant vectors.
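A sketch of the retrieve-and-merge loop; `expand_query` here returns canned paraphrases where a real implementation would prompt an LLM:

```python
def expand_query(query):
    # Stand-in for an LLM call that rewrites the query several ways.
    # (In practice, prompt a model for paraphrases; these are canned.)
    canned = {
        "reset password": ["how to change my password",
                           "recover account credentials",
                           "reset password"],
    }
    return canned.get(query, [query])

def multi_query_retrieve(query, search_fn, top_k=5):
    # Run retrieval once per variant and merge hits, keeping first-seen order.
    seen, merged = set(), []
    for variant in expand_query(query):
        for doc_id in search_fn(variant):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:top_k]

# Toy lookup keyed by phrasing (a real search_fn would hit the vector index):
fake_index = {
    "how to change my password": ["kb42"],
    "recover account credentials": ["kb17", "kb42"],
    "reset password": ["kb42", "kb99"],
}
hits = multi_query_retrieve("reset password", lambda q: fake_index.get(q, []))
```

Each rephrasing lands in a slightly different region of embedding space, so the merged set covers documents any single phrasing would miss.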
- 8. Metadata Pre-filtering (beginner, standard): Apply hard filters (e.g., date > 2023) before performing the vector similarity search to narrow the search space.
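A minimal sketch: filter on metadata first, then rank only the survivors (toy 2-d vectors stand in for real embeddings):

```python
import numpy as np

docs = [
    {"vec": np.array([0.9, 0.1]), "year": 2022, "text": "old guide"},
    {"vec": np.array([0.8, 0.2]), "year": 2024, "text": "updated guide"},
    {"vec": np.array([0.1, 0.9]), "year": 2024, "text": "release notes"},
]

def search(query_vec, min_year, top_k=1):
    # Hard metadata filter first, then cosine ranking over the survivors.
    survivors = [d for d in docs if d["year"] >= min_year]
    def cosine(d):
        return float(query_vec @ d["vec"]) / (
            np.linalg.norm(query_vec) * np.linalg.norm(d["vec"]))
    return sorted(survivors, key=cosine, reverse=True)[:top_k]

best = search(np.array([1.0, 0.0]), min_year=2023)
```

Note that the 2022 document is excluded even though it is the closest vector; most vector databases push this filter down into the index rather than scanning as done here.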
- 9. Contextual Compression (intermediate, medium): Filter out irrelevant parts of retrieved documents before passing them to the LLM to save on token costs.
- 10. Multi-Vector Indexing (advanced, medium): Assign multiple vectors to a single document (e.g., a summary vector plus a full-text vector) to improve retrieval hits.
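One common aggregation is max-over-vectors per document: a document matches if any of its vectors matches the query. A minimal sketch with toy vectors:

```python
import numpy as np

# Two vectors per document: one for a summary, one for the full text
# (toy 2-d vectors; real embeddings would come from a model).
index = [
    ("docA", "summary",  np.array([1.0, 0.0])),
    ("docA", "fulltext", np.array([0.6, 0.8])),
    ("docB", "summary",  np.array([0.0, 1.0])),
    ("docB", "fulltext", np.array([0.2, 0.98])),
]

def search_multivector(query_vec, top_k=1):
    # Score every stored vector, then aggregate per document by max score.
    best = {}
    for doc_id, _, vec in index:
        sim = float(query_vec @ vec) / (
            np.linalg.norm(query_vec) * np.linalg.norm(vec))
        best[doc_id] = max(best.get(doc_id, -1.0), sim)
    return sorted(best, key=best.get, reverse=True)[:top_k]

top = search_multivector(np.array([0.7, 0.7]))
```

The max aggregation lets a document surface whenever its summary or its full text is the better match, at the cost of storing several vectors per document.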