Pricing last verified: April 2026. Plans and pricing may change — check the vendor site for current details.
Pricing Overview
Groq offers usage-based, pay-per-token pricing for LLM inference running on proprietary LPU (Language Processing Unit) hardware. There are no subscriptions, seat licenses, or minimum commitments. You pay only for the tokens you consume across the models available on the platform.
Pricing varies by model size and capability. Smaller models like Llama 3.1 8B cost $0.05 per 1M input tokens and $0.08 per 1M output tokens, while larger models like Llama 3.3 70B cost $0.59/$0.79 per 1M input/output tokens. Groq also supports audio transcription via Whisper v3, priced between $0.04 and $0.111 per hour of audio. This token-based structure makes costs predictable and ties them directly to actual usage volume.
Plan Comparison
Groq does not offer tiered subscription plans. Instead, pricing is set per model. Below is a breakdown of current per-token rates:
| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Notes |
|---|---|---|---|
| Llama 3.1 8B | $0.05 | $0.08 | Lowest cost, best for lightweight tasks |
| Llama 3.3 70B | $0.59 | $0.79 | Higher capability for complex reasoning |
| Llama 4 Scout | $0.11 | $0.34 | Mid-range option with strong performance |
| Qwen3 32B | $0.29 | $0.59 | Balanced cost-to-capability ratio |
| Whisper v3 | $0.04-$0.111/hour | N/A | Audio transcription, priced per hour |
Output tokens cost more than input tokens across all text models, which is standard in the LLM inference market. The Llama 3.1 8B model stands out as the most affordable option for high-volume, latency-sensitive workloads.
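To see how per-token billing works in practice, here is a minimal sketch using the rates from the table above. The dictionary keys and the `request_cost` helper are illustrative names, not part of Groq's API:

```python
# Per-1M-token rates (input, output) in USD, taken from the table above.
RATES = {
    "llama-3.1-8b": (0.05, 0.08),
    "llama-3.3-70b": (0.59, 0.79),
    "llama-4-scout": (0.11, 0.34),
    "qwen3-32b": (0.29, 0.59),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at list rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-token prompt with a 500-token completion on Llama 3.1 8B:
print(round(request_cost("llama-3.1-8b", 2_000, 500), 6))  # 0.00014
```

At these rates, even millions of small requests on the 8B model stay in single-digit dollars, which is why it suits high-volume workloads.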
Hidden Costs and Considerations
Groq provides two significant cost-reduction mechanisms. The Batch API offers a 50% discount on standard per-token rates for non-real-time workloads. Prompt caching delivers 50% savings on cached input tokens, which is valuable for applications that reuse long system prompts.
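The impact of those two discounts can be sketched as a blended input rate. Note the assumption here: this example stacks the caching and batch discounts multiplicatively, which Groq's documentation may or may not do, so treat it as an estimate:

```python
def effective_input_rate(base_rate: float, cached_fraction: float,
                         batch: bool = False) -> float:
    """Blended per-1M input-token rate with prompt caching (50% off the
    cached fraction) and the optional Batch API discount (50% off overall).
    ASSUMPTION: the two discounts stack multiplicatively; verify against
    Groq's billing docs before budgeting."""
    rate = base_rate * (1 - 0.5 * cached_fraction)
    if batch:
        rate *= 0.5
    return rate

# Llama 3.3 70B input at $0.59/1M, with 80% of input tokens served from cache:
print(round(effective_input_rate(0.59, 0.8), 4))  # 0.354
```

For an application that reuses a long system prompt on every call, a high cached fraction can cut effective input spend nearly in half on its own.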
Built-in search tools add costs outside of token pricing. Basic Search runs $5 per 1,000 requests, and Advanced Search costs $8 per 1,000 requests. These fees can add up quickly for retrieval-augmented generation (RAG) workflows. Factor these into your budget if you rely on grounded search results.
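Because search fees are billed per request rather than per token, a quick back-of-the-envelope check is worth doing. The helper below is illustrative (the constants come from the fees quoted above):

```python
# Search-tool fees from above, in USD per 1,000 requests.
BASIC_SEARCH_PER_1K = 5.0
ADVANCED_SEARCH_PER_1K = 8.0

def monthly_search_cost(requests: int, advanced: bool = False) -> float:
    """Monthly search-tool spend, billed separately from token usage."""
    per_1k = ADVANCED_SEARCH_PER_1K if advanced else BASIC_SEARCH_PER_1K
    return requests / 1_000 * per_1k

# A RAG app issuing 100,000 grounded requests per month with Advanced Search:
print(monthly_search_cost(100_000, advanced=True))  # 800.0
```

At that volume, search fees dwarf the token bill for most of the models in the table, which is why they deserve a separate line in any budget.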
Cost Estimates by Team Size
Rough monthly estimates based on Llama 3.3 70B pricing ($0.59/$0.79 per 1M input/output tokens):
- Solo developer (roughly 5M input + 2M output tokens/month): approximately $4.53/month. Groq is extremely affordable for individual use.
- Small team of 5 (roughly 25M input + 10M output tokens/month): approximately $22.65/month. Well within budget for most startups.
- Mid-size team of 20 (roughly 100M input + 40M output tokens/month): approximately $90.60/month. Costs remain low even at moderate scale, though adding search tools could push the total higher.
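The estimates above follow directly from the per-token rates; a short sketch reproduces them (function name is illustrative):

```python
IN_RATE, OUT_RATE = 0.59, 0.79  # Llama 3.3 70B, USD per 1M tokens

def monthly_cost(input_millions: float, output_millions: float) -> float:
    """Monthly spend given token volumes in millions of tokens."""
    return input_millions * IN_RATE + output_millions * OUT_RATE

for label, in_m, out_m in [("solo", 5, 2), ("team of 5", 25, 10),
                           ("team of 20", 100, 40)]:
    print(f"{label}: ${monthly_cost(in_m, out_m):.2f}")
# solo: $4.53
# team of 5: $22.65
# team of 20: $90.60
```

Swapping in the 8B rates ($0.05/$0.08) shows the same volumes costing roughly a tenth as much, which is worth modeling before committing to a larger model.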
How Groq Pricing Compares
Groq competes directly on price with other inference providers. For comparable 70B-class models, Together AI charges from $0.10 to $2.50 per 1M tokens depending on model size. Fireworks AI prices models over 16B parameters at $0.90 per 1M tokens, while smaller models under 4B start at $0.10 per 1M tokens. Mistral AI charges $0.10/$0.30 per 1M input/output tokens for Mistral Small and $2.00/$6.00 per 1M tokens for Mistral Large.
Groq's Llama 3.3 70B at $0.59/$0.79 per 1M tokens is competitively priced against these alternatives, and its LPU hardware delivers significantly faster inference speeds. However, Groq's model selection is narrower than OpenAI or Together AI, so teams needing specific proprietary models may find the catalog limiting.