Overview
Kedro was created by QuantumBlack (McKinsey's AI division) in 2019 and open-sourced under the Apache 2.0 license. It joined the Linux Foundation's LF AI & Data in 2021, ensuring vendor-neutral governance. Kedro has 10K+ GitHub stars and is used by organizations including McKinsey, NASA, Rolls-Royce, and numerous data science teams. The framework enforces software engineering best practices in data science projects: modular code structure, configuration-driven data access, reproducible pipelines, and automated documentation. Kedro provides a CLI for project scaffolding (kedro new), a data catalog for abstracting data sources, a pipeline API for defining workflows, and Kedro-Viz for interactive pipeline visualization. The framework integrates with deployment targets including Airflow, Kubeflow, Vertex AI, and AWS Step Functions via plugins.
Key Features and Architecture
Data Catalog
The data catalog is a YAML-based registry of all data sources in your project. Define datasets once in catalog.yml — CSV files, Parquet, SQL databases, S3 buckets, API endpoints — and access them by name in your code. This eliminates hardcoded paths, centralizes configuration, and makes it trivial to swap data sources between environments (development vs production). The catalog supports 100+ dataset types via built-in connectors and custom implementations.
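A catalog entry maps a name to a dataset type and location. The sketch below is illustrative only: the dataset names and paths are made up, and the exact type strings depend on your `kedro-datasets` version (older releases spell them `CSVDataSet` / `ParquetDataSet`).

```yaml
# conf/base/catalog.yml -- hypothetical entries for illustration
companies:
  type: pandas.CSVDataset        # CSVDataSet in older kedro-datasets releases
  filepath: data/01_raw/companies.csv

reviews:
  type: pandas.ParquetDataset
  filepath: s3://my-bucket/reviews.parquet   # cloud storage works the same way
```

Node code then refers to `companies` and `reviews` by name; switching the development CSV to a production database means editing the catalog entry, not the code.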
Pipeline API
Define ML pipelines as a series of Python functions connected by their inputs and outputs. Kedro resolves the execution order automatically based on data dependencies — no explicit DAG definition needed. Pipelines are modular: compose small pipelines into larger ones, and reuse pipeline components across projects. The kedro run command executes pipelines with automatic dependency resolution.
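To make the idea concrete, here is a simplified, self-contained sketch of dependency-based execution ordering. This is not Kedro's implementation and the node names are invented; it only illustrates how an execution order can be derived from declared inputs and outputs rather than an explicit DAG.

```python
# Simplified sketch of dependency-based execution ordering, in the spirit of
# Kedro's pipeline resolution. NOT Kedro's actual code; names are illustrative.

def resolve_order(nodes):
    """Order nodes so each runs after the nodes that produce its inputs.

    `nodes` maps a node name to (function, input_names, output_name).
    """
    produced_by = {spec[2]: name for name, spec in nodes.items()}
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for inp in nodes[name][1]:
            if inp in produced_by:        # input comes from another node,
                visit(produced_by[inp])   # so schedule the producer first
        ordered.append(name)

    for name in nodes:
        visit(name)
    return ordered

def run(nodes, data):
    """Execute nodes in resolved order, threading outputs to inputs."""
    for name in resolve_order(nodes):
        func, inputs, output = nodes[name]
        data[output] = func(*(data[i] for i in inputs))
    return data

# Three toy nodes declared out of order; the resolver sorts them.
nodes = {
    "train":     (lambda feats: f"model({feats})", ["features"], "model"),
    "clean":     (lambda raw: raw.strip(),         ["raw"],      "cleaned"),
    "featurize": (lambda c: c.upper(),             ["cleaned"],  "features"),
}
result = run(nodes, {"raw": "  iris  "})
print(result["model"])  # model(IRIS)
```

In a real project you would declare the same dependencies with Kedro's `node(...)` and `pipeline(...)` API and let `kedro run` do the resolution.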
Project Template
kedro new scaffolds a standardized project structure with separate directories for data, notebooks, source code, configuration, and tests. This structure enforces separation of concerns and makes projects navigable by any team member. The template includes configuration management for different environments (base, local, production).
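The generated layout looks roughly like the tree below; exact folder names vary slightly between template versions.

```text
my-project/
├── conf/        # configuration environments (base/, local/, ...)
├── data/        # layered data folders (raw, intermediate, model output, ...)
├── notebooks/   # exploratory notebooks
├── src/         # node and pipeline source code
└── tests/       # unit tests for nodes and pipelines
```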
Kedro-Viz
An interactive web-based visualization tool that renders your pipeline as a flowchart. Kedro-Viz shows data dependencies, node execution status, and dataset metadata. It's useful for understanding complex pipelines and communicating workflow structure to stakeholders.
Deployment Plugins
Kedro pipelines can be deployed to production orchestrators via plugins: kedro-airflow generates Airflow DAGs, kedro-kubeflow creates Kubeflow pipelines, kedro-vertexai deploys to Google Vertex AI, and kedro-docker containerizes projects. This means you develop locally with Kedro and deploy to your production orchestrator without rewriting pipeline code.
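As a sketch of the workflow, a session with the Airflow plugin might look like the following (run inside an existing Kedro project; the generated file name depends on your project):

```shell
$ pip install kedro-airflow
$ kedro airflow create   # generates an Airflow DAG from your Kedro pipeline
```

The generated DAG file can then be placed in your Airflow `dags/` folder like any other DAG.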
Ideal Use Cases
Team Data Science Projects
Teams of 3+ data scientists working on the same ML project who need consistent code structure and practices. Kedro's standardized template and data catalog ensure everyone follows the same patterns, making code reviews easier and onboarding faster. McKinsey uses Kedro across hundreds of client engagements for this reason.
Production ML Pipelines
Organizations that need to move data science code from notebooks to production with minimal refactoring. Kedro's modular pipeline structure and configuration management make the transition from development to production straightforward. Deployment plugins handle the translation to production orchestrators.
Regulated Industries
Organizations in healthcare, finance, or government that need auditable, reproducible data pipelines. Kedro's data catalog provides a single source of truth for data lineage, and the pipeline structure ensures reproducibility. The standardized project template makes compliance audits easier.
Data Engineering Workflows
Teams building data transformation pipelines that need modularity and testability. Kedro's node-based architecture makes each transformation step independently testable, and the data catalog abstracts storage details from business logic.
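Because a Kedro node is just a plain Python function, it can be unit-tested with no Kedro machinery at all. The transformation below is a hypothetical cleaning step invented for illustration:

```python
# A node is an ordinary function, so it is testable in isolation --
# no catalog, no pipeline, no I/O. This cleaning step is hypothetical.

def drop_null_rows(rows):
    """Remove records where any field is None (a typical cleaning node)."""
    return [r for r in rows if all(v is not None for v in r.values())]

raw = [{"id": 1, "price": 9.5}, {"id": 2, "price": None}]
print(drop_null_rows(raw))  # [{'id': 1, 'price': 9.5}]
```

The same function would be registered as a node in a pipeline, with the catalog supplying its real input dataset.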
Pricing and Licensing
Kedro is open-source and free to use; there are no subscription fees or paid tiers. When estimating total cost of ownership, the relevant factors are the infrastructure your pipelines run on (local machines, cloud compute, production orchestrators), implementation time, and ongoing maintenance, all of which scale with deployment size.
| Option | Cost | Details |
|---|---|---|
| Kedro Open Source | $0 | Apache 2.0 license, full framework |
| Kedro-Viz | $0 | Open-source pipeline visualization |
| QuantumBlack Consulting | Custom | McKinsey's AI consulting using Kedro |
| Community Plugins | $0 | Open-source deployment plugins (Airflow, Kubeflow, etc.) |
Kedro is completely free and open-source with no paid tiers or commercial versions. The only costs are the infrastructure to run your pipelines (local machines, cloud compute, or production orchestrators). For comparison, Metaflow is also free but Outerbounds charges ~$500/month for managed features. Kedro's zero-cost model makes it one of the most accessible ML frameworks — install with pip install kedro and start building. QuantumBlack (McKinsey) offers consulting services that use Kedro, but the framework itself has no commercial licensing.
Pros and Cons
Pros
- Data catalog — YAML-based data source registry eliminates hardcoded paths; 100+ built-in dataset connectors
- Standardized structure — `kedro new` scaffolds a consistent project layout; makes code navigable and maintainable
- Pipeline visualization — Kedro-Viz renders interactive flowcharts of your pipeline for understanding and communication
- Deployment flexibility — plugins for Airflow, Kubeflow, Vertex AI, and Docker; develop locally, deploy anywhere
- 10K+ GitHub stars — active community, Linux Foundation governance, extensive documentation
- Completely free — no paid tiers, no commercial version; Apache 2.0 license
Cons
- Opinionated structure — the standardized template can feel restrictive for quick experiments or small projects
- Learning curve — understanding the data catalog, pipeline API, and project structure takes time
- No experiment tracking — Kedro doesn't track metrics or hyperparameters; need MLflow or W&B for that
- No model serving — pipeline framework only; need BentoML or Seldon for model deployment
- Overhead for small projects — the project structure and configuration add overhead that isn't justified for simple scripts
Getting Started
Getting started takes under 10 minutes. Install the framework with `pip install kedro`, scaffold a project with `kedro new`, and execute the default pipeline with `kedro run`. The official documentation includes a step-by-step tutorial covering the data catalog, nodes, and pipelines, and most users are productive within their first session. Documentation, GitHub discussions, and the community Slack are available to help with setup and advanced configuration.
Alternatives and How It Compares
Kedro competes with both open-source and commercial pipeline tools. When comparing alternatives, focus on integration depth with your existing stack, infrastructure costs at your expected scale, and the quality of documentation and community support. Each tool makes different trade-offs between ease of use, flexibility, and enterprise features.
Metaflow
Metaflow (Netflix) provides Python-native ML workflows with automatic versioning and cloud scaling. Choose Metaflow for seamless laptop-to-cloud scaling; choose Kedro for enforcing code quality and project structure. Metaflow is more infrastructure-focused; Kedro is more code-quality-focused.
MLflow
MLflow provides experiment tracking and model registry. Kedro provides pipeline structure and data catalog. They are complementary — use Kedro for pipeline organization and MLflow for experiment tracking. The kedro-mlflow plugin integrates both.
Prefect
Prefect provides modern workflow orchestration with a Python API. Choose Prefect for production workflow scheduling and monitoring; choose Kedro for data science project structure and development. Kedro pipelines can be deployed to Prefect via plugins.
Cookiecutter Data Science
Cookiecutter Data Science provides a project template for data science. Kedro goes further with a data catalog, pipeline API, and visualization. Kedro is a framework; Cookiecutter is just a template.
Frequently Asked Questions
Is Kedro free?
Yes, Kedro is completely free and open-source under the Apache 2.0 license. There are no paid tiers or commercial versions.
Who created Kedro?
Kedro was created by QuantumBlack, McKinsey's AI division. It is now part of the Linux Foundation's LF AI & Data, ensuring vendor-neutral governance.
Can Kedro deploy to Airflow?
Yes, the `kedro-airflow` plugin converts Kedro pipelines into Airflow DAGs. Similar plugins exist for Kubeflow, Vertex AI, AWS Step Functions, and Prefect. This means you develop locally with Kedro and deploy to your production orchestrator without rewriting pipeline code.
