Pricing last verified: April 2026. Plans and pricing may change -- check the vendor site for current details.
Pricing Overview
Fireworks AI uses a usage-based, pay-per-token pricing model for serverless inference. There are no fixed monthly subscriptions or seat-based fees -- you pay only for the tokens you consume. New accounts receive $1 in free credits to start experimenting immediately. Pricing scales by model size, with smaller models costing significantly less than larger ones, making Fireworks AI accessible for both prototyping and production workloads.
Beyond serverless inference, Fireworks AI offers on-demand GPU rentals for custom workloads, fine-tuning services for LoRA SFT, batch inference at discounted rates, and image generation endpoints. Cached input tokens receive a 50% discount, rewarding applications that reuse context windows.
Plan Comparison
Fireworks AI prices serverless inference by model size tier rather than through traditional subscription plans:
| Model Size Tier | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Notes |
|---|---|---|---|
| Small (<4B parameters) | $0.10 | $0.10 | Lightweight models for simple tasks |
| Medium (4B-16B parameters) | $0.20 | $0.20 | Balanced performance and cost |
| Large (>16B parameters) | $0.90 | $0.90 | High-capability models |
| MoE (0-56B parameters) | $0.50 | $0.50 | Mixture-of-experts architectures |
| DeepSeek V3 | $0.56 | $1.68 | Asymmetric input/output pricing |
| Embeddings | From $0.008 | -- | Text embedding models |
All tiers benefit from a 50% discount on cached input tokens and a 50% discount on batch inference jobs.
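The interaction of base rates, the cached-input discount, and the batch discount is easy to get wrong in a spreadsheet. A back-of-the-envelope sketch, using the per-1M-token rates from the table above (the function name and parameters are illustrative, not part of any Fireworks AI SDK):

```python
def serverless_cost(input_m, output_m, in_rate, out_rate,
                    cache_hit_rate=0.0, batch=False):
    """Estimate serverless inference cost in USD.

    input_m / output_m: token counts in millions.
    in_rate / out_rate: table prices per 1M tokens.
    cache_hit_rate:     fraction of input tokens served from cache (50% off).
    batch:              apply the 50% batch inference discount to the total.
    """
    cached = input_m * cache_hit_rate
    cost = (input_m - cached) * in_rate      # full-price input tokens
    cost += cached * in_rate * 0.5           # cached input tokens at 50% off
    cost += output_m * out_rate              # output tokens (no cache discount)
    return cost * 0.5 if batch else cost

# DeepSeek V3: 100M input + 50M output at $0.56 / $1.68 per 1M tokens
print(serverless_cost(100, 50, 0.56, 1.68))  # ~140.0 USD
```

With a 100% cache hit rate on a small model, the same helper returns $0.05 per 1M input tokens, which is the effective floor discussed in the comparison section below.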
Hidden Costs and Considerations
Fine-tuning costs: LoRA SFT fine-tuning ranges from $0.50 to $10.00 per 1M training tokens depending on model size. Larger base models cost substantially more to fine-tune, so budget accordingly when planning custom model training.
On-demand GPU rental: For workloads that need dedicated hardware, H100 GPUs cost $6.00/hr and B200 GPUs cost $9.00/hr. Running an H100 continuously for a month would cost approximately $4,320 -- a significant commitment compared to serverless inference.
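The monthly figure above is simply the hourly rate times hours in a 30-day month, which is worth sanity-checking before committing to dedicated hardware (rates as quoted above; helper name is illustrative):

```python
H100_RATE = 6.00  # USD per hour, per the rental pricing above
B200_RATE = 9.00

def monthly_gpu_cost(rate_per_hr, hours=24 * 30):
    """Cost of running one GPU for a given number of hours (default: 30 days)."""
    return rate_per_hr * hours

print(monthly_gpu_cost(H100_RATE))  # 4320.0 USD for a continuously running H100
```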
Image generation: FLUX.1 Kontext Pro image generation costs $0.04 per image. At scale, a pipeline generating 10,000 images would cost $400.
No free tier beyond credits: The $1 in free credits for new accounts covers initial testing but will not sustain ongoing production usage. There is no permanent free tier.
Cost Estimates by Team Size
Solo developer: Running a small (<4B) model for prototyping at roughly 5M tokens/month costs $0.50/month at $0.10/1M tokens. The $1 free credit covers the first two months.
Small team (5 engineers): A team using a medium-tier model (4B-16B) for 50M tokens/month spends $10/month on inference. Adding occasional fine-tuning (one LoRA SFT job at $0.50-$2.00/1M training tokens) and batch processing with the 50% discount keeps monthly costs between $15 and $50.
Mid-size team (20 engineers): Heavy usage of large models (>16B) at 200M tokens/month costs $180/month. With DeepSeek V3 usage at 100M input tokens ($56) and 50M output tokens ($84), plus an on-demand H100 GPU for 40 hours/month ($240), total monthly spend reaches $500-$700.
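The mid-size estimate can be reproduced line by line from the rates quoted earlier, which makes it easy to adjust for your own token volumes (a sketch; all figures are the assumptions stated in the estimate above):

```python
# Mid-size team monthly spend, itemized (USD)
large_inference = 200 * 0.90   # 200M tokens on a >16B model
ds_v3_input     = 100 * 0.56   # 100M DeepSeek V3 input tokens
ds_v3_output    = 50 * 1.68    # 50M DeepSeek V3 output tokens
h100_rental     = 40 * 6.00    # 40 hours of on-demand H100

total = large_inference + ds_v3_input + ds_v3_output + h100_rental
print(total)  # ~560.0, within the $500-$700 range quoted above
```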
How Fireworks AI Pricing Compares
Fireworks AI sits competitively in the serverless LLM inference market. For small models, Groq offers Llama 8B at $0.05/$0.08 per 1M input/output tokens -- roughly half the cost of Fireworks AI's $0.10/1M tier for comparable model sizes. However, Fireworks AI's model selection is broader.
Together AI prices range from $0.10 to $2.50 per 1M tokens depending on model size, closely matching Fireworks AI's tiered structure. Mistral's Small model costs $0.10/$0.30 per 1M input/output tokens, competitive with Fireworks AI's medium tier.
Fireworks AI's strongest value proposition is the 50% cached input discount and 50% batch inference discount, which neither Groq nor Together AI match at the same level. For workloads with high cache hit rates or tolerance for batch latency, effective per-token costs drop to $0.05/1M for small models -- matching Groq's base rate.
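The effective input rate under the cache discount is a simple linear blend of full-price and half-price tokens, which is how the $0.05/1M figure above falls out (a sketch; the function name is illustrative):

```python
def effective_input_rate(base_rate, cache_hit_rate):
    """Blended input price per 1M tokens given a cache hit fraction.

    Cached tokens cost 50% of base_rate; the rest pay full price.
    """
    return base_rate * (1 - 0.5 * cache_hit_rate)

print(effective_input_rate(0.10, 1.0))   # 0.05 -- fully cached small-model rate
print(effective_input_rate(0.10, 0.5))   # 0.075 at a 50% cache hit rate
```

In practice, cache hit rates sit somewhere between these extremes, so the blended rate, not the headline rate, is the right number to compare against Groq's or Together AI's base pricing.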