Turbopuffer is a serverless vector and full-text search database built from first principles on object storage. In this Turbopuffer review, we evaluate a platform that separates compute and storage to deliver sub-10ms query latency while keeping costs up to 10x lower than traditional vector databases. With 2.5 trillion documents and 10 million writes per second handled in production, Turbopuffer has earned the trust of companies like Cursor, Anthropic, Notion, Linear, Atlassian, Ramp, Grammarly, and Superhuman.
Overview
Turbopuffer takes a fundamentally different approach to vector search. Instead of storing all vectors on expensive SSDs, it uses object storage (S3, GCS, or Azure Blob) as the source of truth and layers an intelligent caching system on top. Hot data gets promoted to NVMe SSDs and RAM based on access patterns, while cold data stays on cheap object storage at roughly $20 per TB per month.
The company calls this the "pufferfish effect" — data inflates from object storage to NVMe to RAM as query frequency increases, then deflates back to cheaper tiers when access drops. This tiered architecture means teams pay for the storage tier their data actually occupies, not the most expensive tier available.
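The mechanics are easy to picture. The sketch below is a toy model of the idea, not Turbopuffer's implementation: each namespace tracks its recent query rate, and a background policy promotes it toward faster tiers or demotes it toward object storage. Tier names and thresholds here are illustrative assumptions.

```python
import time

# Toy model of frequency-based tier placement -- illustrative only,
# not Turbopuffer's actual code. Tiers ordered cheap/slow -> costly/fast.
TIERS = ["object_storage", "nvme", "ram"]
PROMOTE_QPS = 1.0    # hypothetical threshold: promote above 1 query/sec
DEMOTE_QPS = 0.01    # hypothetical threshold: demote below 1 query/100s

class Namespace:
    def __init__(self, name):
        self.name = name
        self.tier = 0                  # start cold, on object storage
        self.hits = 0
        self.window_start = time.monotonic()

    def record_query(self):
        self.hits += 1

    def rebalance(self):
        """Promote or demote based on the query rate in the last window."""
        elapsed = time.monotonic() - self.window_start
        qps = self.hits / max(elapsed, 1e-9)
        if qps > PROMOTE_QPS and self.tier < len(TIERS) - 1:
            self.tier += 1             # "inflate" toward RAM
        elif qps < DEMOTE_QPS and self.tier > 0:
            self.tier -= 1             # "deflate" toward object storage
        self.hits, self.window_start = 0, time.monotonic()

ns = Namespace("tenant-docs")
for _ in range(5):
    ns.record_query()
ns.rebalance()
print(TIERS[ns.tier])  # a burst of queries promotes the namespace
```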
In production, Turbopuffer processes over 10,000 queries per second globally and supports write throughput exceeding 10 million writes per second (32 GB/s). Individual namespaces can hold up to 500 million documents or roughly 2 TB of data, and the platform places no global limit on total documents or namespaces.
Key Features and Architecture
Turbopuffer's architecture centers on the separation of compute and storage, with object storage as the durable layer and NVMe/RAM as the acceleration layer. Key capabilities include:
- Vector Search: Sub-10ms p50 latency on warm namespaces with support for billions of vectors. The SPFresh centroid-based index minimizes roundtrips to storage by identifying relevant clusters before fetching data (see the sketch after this list).
- Full-Text Search: BM25-style keyword search, with p50 latency around 343ms and p99 around 554ms on a benchmark of 1 million documents with 768-dimensional vectors.
- Hybrid Search: Combines vector similarity with full-text search and metadata filtering in a single query, enabling precise results for complex AI retrieval tasks.
- Metadata Filtering: Filter queries by arbitrary attributes without sacrificing search performance.
- Automatic Scaling: Serverless architecture scales compute independently from storage. No capacity planning or cluster management required.
- Multi-Tenancy: Built-in namespace isolation supports millions of tenants. Over 100 million namespaces have been observed in production.
- Multi-Vector Columns: Support for multiple vector columns per namespace, with filterable attributes billed once per vector column.
- Namespace Pinning: Pin frequently accessed namespaces for predictable performance, billed in GB-hours instead of per-query pricing.
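To see why a centroid index saves roundtrips, here is a minimal IVF-style sketch in the spirit of the vector search description above. SPFresh itself is more sophisticated; everything here is a simplification for illustration. The query is compared against a small set of cluster centroids held in memory, and only the nearest clusters are fetched from storage.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_clusters, per_cluster = 768, 64, 1000

# Centroids are small enough to keep in RAM; full vectors live in
# cheap storage, grouped by cluster (simulated here as a dict).
centroids = rng.normal(size=(n_clusters, dim)).astype(np.float32)
clusters = {i: rng.normal(loc=centroids[i], scale=0.1,
                          size=(per_cluster, dim)).astype(np.float32)
            for i in range(n_clusters)}

def search(query, nprobe=4, top_k=10):
    """IVF-style search: rank centroids first, fetch only nprobe clusters."""
    # Step 1: cheap in-memory scan of centroids (no storage roundtrip).
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))
    # Step 2: fetch only the nearest clusters -- few storage roundtrips.
    candidates = np.concatenate([clusters[i] for i in order[:nprobe]])
    dists = np.linalg.norm(candidates - query, axis=1)
    return candidates[np.argsort(dists)[:top_k]]

results = search(rng.normal(size=dim).astype(np.float32))
print(results.shape)  # (10, 768)
```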
The write path sends data to a write-ahead log on object storage first, then asynchronously indexes it. Write latency sits around 285ms p50 with throughput exceeding 10,000 vectors per second per namespace.
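The pattern is the classic write-ahead log: acknowledge the write once it is durably appended, and index it in the background. A minimal generic sketch, not Turbopuffer's code:

```python
import json, queue, threading

log = []                # stand-in for the WAL on object storage
index = {}              # stand-in for the searchable index
pending = queue.Queue()

def write(doc):
    """Ack after the durable append; indexing happens later."""
    log.append(json.dumps(doc))   # durable append (object storage in reality)
    pending.put(doc)              # hand off to the async indexer
    return "ack"                  # caller returns here (~hundreds of ms in practice)

def indexer():
    while True:
        doc = pending.get()
        index[doc["id"]] = doc    # build/update index structures
        pending.task_done()

threading.Thread(target=indexer, daemon=True).start()
write({"id": "doc-1", "text": "hello"})
pending.join()                    # readers see the doc only after indexing
print(index["doc-1"]["text"])     # hello
```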
Ideal Use Cases
Turbopuffer excels in scenarios where data access patterns are bursty and a large portion of vectors sit cold most of the time:
- Code Search and IDE Integration: Cursor uses Turbopuffer to index millions of developer codebases. Most embeddings sit idle between coding sessions, making the tiered storage model dramatically cheaper than always-on SSD databases. First queries in a session take around 300ms as data loads from object storage, with subsequent queries hitting cache at sub-10ms.
- Multi-Tenant RAG Systems: Applications serving thousands of tenants benefit from Turbopuffer's namespace isolation and per-namespace billing. Each tenant gets an isolated namespace without the overhead of managing separate database instances (see the sketch after this list).
- Semantic Search at Scale: Notion, Linear, and Grammarly use Turbopuffer for production semantic search across large document corpora where cost efficiency matters at scale.
- AI Application Backends: Any system connecting LLMs with large amounts of fresh data — recommendation engines, document retrieval, knowledge bases — where query patterns are uneven and cold storage costs would otherwise dominate the budget.
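The usual multi-tenant pattern is one namespace per tenant, derived deterministically from the tenant ID. The helper below is a hypothetical illustration of that routing convention; the naming scheme is an assumption, not a Turbopuffer API.

```python
import hashlib

def tenant_namespace(tenant_id: str) -> str:
    """Map a tenant to its own isolated namespace.

    Hypothetical convention: a stable prefix plus a short hash suffix
    keeps names valid and uniform even if tenant IDs contain odd characters.
    """
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()[:12]
    return f"tenant-{digest}"

# Each tenant's data is written to and queried from its own namespace,
# so isolation comes from routing rather than from separate databases.
print(tenant_namespace("acme-corp"))  # stable name for this tenant
```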
Turbopuffer is less suited for workloads that require guaranteed sub-10ms latency on every query regardless of access pattern, or applications that need immediate write-then-read consistency with sub-second write latency.
Pricing and Licensing
Turbopuffer offers three pricing tiers:
- Launch — $64 per month: Includes all database features, multi-tenancy, SOC2 report, GDPR-ready DPA, community Slack and email support.
- Scale — $256 per month: Everything in Launch plus HIPAA-ready BAA, Single Sign-On (SSO), a private Slack channel, and support during business hours (8am to 5pm).
- Enterprise — Contact sales: Adds single-tenancy, BYOC (bring your own cloud), CMEK per namespace, private networking, 24/7 support with SLA, and 99.95% uptime SLA.
Usage-based billing applies on top of the tier minimums. Storage costs roughly $0.02 per GB per month on object storage. Query pricing is charged per GB queried plus per GB returned, with significant volume discounts: an 80% marginal discount on queried data between 32 GB and 128 GB, and a 96% discount above 128 GB. The base queried-data rate is $1 per PB, following a February 2026 reduction from $5 per PB. Write costs are charged per GB written, with batch discounts of up to 50%.
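Under one reading of that schedule (marginal, bracketed discounts off the $1-per-PB base rate quoted above), per-query cost works out as follows. This is a sketch of the article's figures, not an official calculator:

```python
BASE_RATE = 1.0 / 1_000_000   # $1 per PB => $0.000001 per GB (article's figure)

def queried_cost_usd(gb: float) -> float:
    """Marginal-bracket pricing: full rate up to 32 GB, 20% of it
    from 32-128 GB (80% discount), 4% above 128 GB (96% discount)."""
    tier1 = min(gb, 32)
    tier2 = min(max(gb - 32, 0), 96)   # the 32-128 GB bracket
    tier3 = max(gb - 128, 0)
    return BASE_RATE * (tier1 + 0.2 * tier2 + 0.04 * tier3)

# Example: a query that scans a 256 GB namespace.
print(f"${queried_cost_usd(256):.8f} per query")
print(f"${queried_cost_usd(256) * 1_000_000:,.2f} per million queries")
```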
For comparison, Pinecone Serverless charges $0.33 per GB for storage versus Turbopuffer's $0.02 per GB. Cursor reported a 95% cost reduction after migrating to Turbopuffer. At 100 million to 1 billion vectors with 1536 dimensions, Turbopuffer typically runs $500 to $2,000 per month versus $5,000 to $20,000 per month on Pinecone Serverless.
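The storage side of that gap is simple arithmetic. Assuming float32 embeddings (4 bytes per dimension) and the per-GB-per-month rates quoted above:

```python
vectors = 100_000_000
dims = 1536
bytes_per_float = 4

gb = vectors * dims * bytes_per_float / 1e9   # ~614 GB of raw vectors
turbopuffer = gb * 0.02                       # $0.02/GB/month on object storage
pinecone = gb * 0.33                          # $0.33/GB/month, SSD-first

print(f"{gb:,.0f} GB raw -> ${turbopuffer:,.0f}/mo vs ${pinecone:,.0f}/mo")
# Indexes and attributes add overhead on both sides, so real bills run
# higher than this raw-vector floor; the ~16x per-GB ratio is the point.
```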
Pros and Cons
Pros:
- Dramatic cost savings: 10x cheaper than traditional vector databases for most workloads. Object storage at $0.02 per GB versus $0.33 per GB on competitors.
- Massive proven scale: 2.5 trillion documents and 10 million writes per second in production, trusted by Cursor, Anthropic, Notion, and Atlassian.
- Low warm-state latency: Sub-10ms p50 query latency on cached data, with p99 around 35ms for vector search.
- True serverless: No clusters to manage, no capacity planning. Scales automatically from zero to billions of vectors.
- Strong compliance: SOC2, GDPR-ready DPA across all tiers, HIPAA-ready BAA on Scale and Enterprise plans.
- Hybrid search: Combines vector, full-text, and metadata filtering in a single query.
Cons:
- Cold query latency: First queries against uncached namespaces can take 300ms to 4 seconds at p99. Applications requiring consistent sub-10ms latency on every query may struggle.
- Minimum monthly commitment: All plans require at least $64 per month. No free tier is available for prototyping.
- Write latency tradeoff: Write-ahead log architecture means 285ms p50 write latency with asynchronous indexing, slower than databases offering near-real-time indexing.
- Query billing complexity: Queried bytes are billed based on namespace size, not the data a query logically touches. Large namespaces with frequent queries can produce higher-than-expected bills.
Alternatives and How It Compares
The vector database space offers several alternatives, each with different tradeoffs:
- Pinecone: The most established managed vector database. SSD-first architecture delivers consistently low latency regardless of access patterns, but storage costs roughly $0.33 per GB versus Turbopuffer's $0.02 per GB. Better for hot workloads; significantly more expensive for cold data. Offers a free tier for prototyping.
- Qdrant: Open-source vector search engine written in Rust with a freemium cloud offering. Strong performance and a self-hosted option for teams that want full control. Good choice for teams with ops capacity who want to avoid vendor lock-in.
- Milvus: Open-source vector database built for GenAI applications, scaling to tens of billions of vectors. Enterprise pricing through Zilliz Cloud, which offers dedicated clusters with predictable billing and sub-100ms cold start latency.
- ChromaDB: AI-native open-source embedding database focused on LLM applications. Usage-based cloud pricing starting free. Best suited for smaller-scale prototyping and applications that prioritize developer experience over raw scale.
- Marqo: Enterprise vector search optimized for search conversion using clickstream and event data. Contact-based pricing aimed at e-commerce and personalization use cases.
Turbopuffer's primary differentiation is economic: for workloads where most data is cold, the object storage architecture saves an order of magnitude on costs. Teams running hot workloads with strict latency SLAs may find Pinecone or Milvus (via Zilliz Cloud) more predictable. Teams wanting self-hosted flexibility should evaluate Qdrant or Milvus directly.
Frequently Asked Questions
Is Turbopuffer free?
No. Turbopuffer has no free tier; the Launch plan starts at $64 per month, with usage-based billing for storage, writes, and queries on top.
How fast is Turbopuffer?
Turbopuffer achieves sub-10ms p50 query latency on cached (warm) data. Cold queries that load data from object storage typically take around 300ms and can reach several seconds at p99. The caching layer automatically adapts to access patterns, warming frequently accessed namespaces for consistently low latency.
Is Turbopuffer open source?
No, Turbopuffer is a proprietary managed service. There is no self-hosted option. For open-source alternatives, consider Milvus, Qdrant, or pgvector.
How does Turbopuffer compare to Pinecone?
Both are managed serverless vector databases. Turbopuffer is significantly cheaper due to its object-storage architecture ($0.02 per GB for storage versus Pinecone's roughly $0.33 per GB). Pinecone is more mature, with more features, wider adoption, and better documentation. Choose Turbopuffer for cost-sensitive workloads; choose Pinecone when production-grade features and ecosystem maturity matter most.
What regions does Turbopuffer support?
Turbopuffer is available on AWS and GCP with support for multiple US and EU regions; Azure support is on the roadmap. For workloads that must run on Azure today, consider Pinecone or Zilliz Cloud, which support AWS, GCP, and Azure.
