Milvus is the stronger choice for most teams building GenAI and similarity search applications. It offers broader index support, hybrid search, flexible deployment from laptop to distributed cluster, and a managed cloud option. Vald is a better fit for Kubernetes-native teams that value asynchronous indexing and a lightweight, focused vector search engine without the overhead of a full database layer.
| Feature | Vald | Milvus |
|---|---|---|
| ANN Algorithm | NGT (Neighborhood Graph and Tree) by Yahoo Japan Research | Multiple: IVF_FLAT, HNSW, DiskANN, SCANN, IVF_SQ8 |
| Architecture | Kubernetes-native with distributed agents; each agent holds an index shard | Disaggregated storage/compute with fully stateless query, data, and index nodes |
| Deployment Flexibility | Kubernetes required for all deployments; Helm chart provided | Milvus Lite (pip install), Standalone (Docker), Distributed (K8s), Zilliz Cloud (managed) |
| Query Features | Pure vector similarity search with custom gRPC ingress/egress filters | Hybrid search, metadata filtering, multi-vector search, boolean expressions |
| Indexing Strategy | Asynchronous auto-indexing with zero stop-the-world pauses | Segment-based with background compaction; multiple index algorithm choices |
| Best For | Kubernetes-native teams needing a lightweight vector search engine with zero-downtime indexing | Teams building GenAI apps that need hybrid search, flexible deployment, and a mature ecosystem |
| Feature | Vald | Milvus |
|---|---|---|
| **Core Architecture** | | |
| Primary ANN algorithm | NGT (Neighborhood Graph and Tree) | Multiple: IVF_FLAT, IVF_SQ8, HNSW, SCANN, DiskANN |
| Storage-compute separation | No; agents hold both index and compute | Yes; fully disaggregated since Milvus 2.0 |
| Stateless components | Partially; gateway and filter components are stateless | Fully stateless query, data, and index nodes |
| Coordination service | Kubernetes-native service discovery | etcd for metadata and service coordination |
| **Indexing & Search** | | |
| Async auto-indexing | Built-in; no stop-the-world pauses during index builds | Segment-based with background compaction |
| Index replication | Automatic multi-agent replication with rebalancing | Replica groups with configurable replication factor |
| Metadata filtering | Custom ingress/egress gRPC filters only | Native boolean expressions on scalar fields |
| Hybrid search | Not natively supported | Combines vector similarity with scalar filtering |
| Multi-vector search | Not supported | Supported across multiple vector fields per collection |
| **Deployment & Operations** | | |
| Kubernetes-native deployment | Required; Helm chart provided | Supported via Helm and Milvus Operator |
| Lightweight local mode | Not available; requires full Kubernetes cluster | Milvus Lite runs in-process via pip install |
| Managed cloud offering | No managed service available | Zilliz Cloud (serverless and dedicated clusters) |
| Index backup and recovery | Auto-backup to object storage or persistent volumes | Snapshots via MinIO/S3 object storage |
| **Developer Experience** | | |
| SDK languages | Go, Java, Node.js, Python | Python, Java, Go, Node.js, C# (plus a RESTful API) |
| API protocol | gRPC only | gRPC and RESTful HTTP |
| Community size | Smaller niche community backed by Yahoo Japan (vdaas) | Large community with 30K+ GitHub stars and active contributors |
Choose Vald if:
Your infrastructure is already Kubernetes-native and you need a focused, distributed vector search engine with zero-downtime asynchronous indexing. Vald works well for teams that want fine-grained control over index sharding and replication without the complexity of a full-featured database system.
Choose Milvus if:
You need a production-grade vector database with hybrid search, metadata filtering, multiple index algorithms, and a path from local prototyping (Milvus Lite) to distributed deployment or managed cloud (Zilliz Cloud). Milvus is the better choice for GenAI applications, RAG pipelines, and teams that want a mature ecosystem with extensive tooling.
This verdict is based on general use cases. Your specific requirements, existing tech stack, and team expertise should guide your final decision.
Milvus is the stronger choice for RAG (Retrieval Augmented Generation) applications. It provides native metadata filtering that lets you scope vector searches by document source, timestamp, or any scalar attribute, which is essential for grounding LLM responses in relevant context. Milvus also supports hybrid search combining dense vectors with sparse vectors or keyword matching, improving retrieval precision. Vald handles the vector search component effectively but lacks built-in metadata filtering and hybrid search, meaning you would need to build those layers yourself on top of Vald's gRPC filter system.
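To make the filtering pattern concrete, here is a minimal, framework-free sketch of metadata-scoped vector search in plain Python (brute-force cosine similarity; all field names and data are illustrative, not the Milvus API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy corpus: each entry pairs an embedding with scalar metadata.
docs = [
    {"id": 1, "vector": [0.9, 0.1], "source": "wiki", "year": 2023},
    {"id": 2, "vector": [0.8, 0.2], "source": "blog", "year": 2021},
    {"id": 3, "vector": [0.1, 0.9], "source": "wiki", "year": 2024},
]

def filtered_search(query, docs, predicate, top_k=2):
    """Scope the candidate set with a metadata predicate, then rank by similarity."""
    candidates = [d for d in docs if predicate(d)]
    return sorted(candidates, key=lambda d: cosine(query, d["vector"]), reverse=True)[:top_k]

# Equivalent in spirit to a Milvus filter expression like: source == "wiki" and year >= 2023
hits = filtered_search([1.0, 0.0], docs, lambda d: d["source"] == "wiki" and d["year"] >= 2023)
print([h["id"] for h in hits])  # → [1, 3]
```

Milvus evaluates this kind of predicate natively on indexed scalar fields; with Vald you would implement the predicate yourself inside an ingress/egress gRPC filter.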
Yes, both engines are designed to scale to billions of vectors. Vald achieves this by distributing index shards across multiple Kubernetes agents, where each agent holds a different portion of the index. Milvus scales through its disaggregated architecture, where query nodes, data nodes, and index nodes can be scaled independently. In practice, Milvus has been benchmarked more extensively at this scale and offers more tuning options through its multiple index types (IVF, HNSW, DiskANN). Vald's NGT-based approach is highly performant but provides fewer knobs for optimizing recall-vs-latency tradeoffs at extreme scale.
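Vald's shard distribution can be pictured with a simple hash-based routing sketch (illustrative only; Vald's actual agent placement and rebalancing logic is more sophisticated than a static hash):

```python
import hashlib

NUM_AGENTS = 4  # illustrative agent count

def agent_for(vector_id: str, num_agents: int = NUM_AGENTS) -> int:
    """Route a vector ID to an agent by hashing; each agent indexes only its shard."""
    digest = hashlib.sha256(vector_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_agents

# At query time, a gateway fans the search out to all agents and merges their top-k results.
shards = {}
for vid in (f"vec-{i}" for i in range(1000)):
    shards.setdefault(agent_for(vid), []).append(vid)

print({agent: len(ids) for agent, ids in sorted(shards.items())})
```

The same fan-out-and-merge idea underlies Milvus's query nodes, except Milvus routes by segment rather than by per-vector hash, and each node type scales independently.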
Vald requires Kubernetes for all deployments. It is fundamentally designed around Kubernetes primitives for service discovery, scaling, and agent management, and there is no standalone binary or Docker Compose option. Milvus is more flexible: Milvus Lite runs in-process with a simple pip install for prototyping, Milvus Standalone runs as a single Docker container for small-scale production, and Milvus Distributed uses Kubernetes for full horizontal scaling. If you do not have a Kubernetes cluster, Milvus is the only viable option of the two.
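The three Milvus deployment paths look roughly like this (commands paraphrased from typical Milvus installation guides; verify image tags, compose files, and chart names against the current docs for your version):

```shell
# Prototyping: Milvus Lite runs in-process, installed with the Python SDK
pip install -U pymilvus

# Small-scale production: Milvus Standalone via Docker Compose
# (fetch the compose file for your target version from milvus.io first)
docker compose up -d

# Full horizontal scaling: Milvus Distributed on Kubernetes via Helm
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm install my-milvus milvus/milvus
```

Vald, by contrast, offers only the Helm-on-Kubernetes path.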
Vald uses Yahoo Japan's NGT algorithm exclusively and performs asynchronous auto-indexing. This means new vectors are searchable without requiring a stop-the-world pause to rebuild the index graph. The distributed index is spread across agents, and rebalancing happens automatically when agents join or leave the cluster. Milvus takes a segment-based approach where data is first buffered in a growing segment, then sealed and indexed in the background. Milvus supports multiple index algorithms including IVF_FLAT for brute-force accuracy, HNSW for low-latency graph search, and DiskANN for cost-efficient billion-scale deployments using SSD storage.
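As a sketch of what choosing among Milvus's index types looks like in practice, here are index-parameter shapes in the style of the Milvus Python SDK's `create_index()` call (field names follow common Milvus usage; values are illustrative starting points, not tuned recommendations, so check the SDK docs for your version):

```python
# Index-parameter dictionaries in the style of Milvus's create_index() call.

hnsw_index = {
    "index_type": "HNSW",        # low-latency in-memory graph search
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200},
}

ivf_flat_index = {
    "index_type": "IVF_FLAT",    # cluster-then-scan; exact distances within probed lists
    "metric_type": "L2",
    "params": {"nlist": 1024},
}

diskann_index = {
    "index_type": "DISKANN",     # SSD-resident graph for billion-scale collections
    "metric_type": "IP",
}

for cfg in (hnsw_index, ivf_flat_index, diskann_index):
    print(cfg["index_type"], cfg["metric_type"])
```

Vald exposes no equivalent choice: every agent builds an NGT graph, and tuning happens through NGT's own parameters (such as edge counts and search epsilon) in the Helm values.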
Milvus has a significantly larger community and ecosystem. It is backed by Zilliz, has over 30,000 GitHub stars, and integrates with popular AI frameworks including LangChain, LlamaIndex, Haystack, and Semantic Kernel. Milvus also has dedicated tools like Attu (GUI management), Birdwatcher (diagnostics), and VDBBench (benchmarking). Vald is maintained by the vdaas organization (Yahoo Japan's research division) and has a smaller but focused community. Vald's ecosystem is more minimal, with fewer third-party integrations and tools. For teams that value extensive documentation, community support, and framework integrations, Milvus has a clear advantage.