Pricing last verified: April 2026. Plans and pricing may change — check the vendor site for current details.
Pricing Overview
Groq offers usage-based, pay-per-token pricing for LLM inference running on proprietary LPU (Language Processing Unit) hardware. There are no subscriptions, seat licenses, or minimum commitments. You pay only for the tokens you consume across the models available on the platform.
Pricing varies by model size and capability. Smaller models like Llama 3.1 8B cost $0.05 per 1M input tokens and $0.08 per 1M output tokens, while larger models like Llama 3.3 70B cost $0.59/$0.79 per 1M input/output tokens. Groq also supports audio transcription via Whisper v3, priced between $0.04 and $0.111 per hour of audio. This token-based structure makes costs predictable and ties them directly to actual usage volume.
Plan Comparison
Groq does not offer tiered subscription plans. Instead, pricing is set per model. Below is a breakdown of current per-token rates:
| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Notes |
|---|---|---|---|
| Llama 3.1 8B | $0.05 | $0.08 | Lowest cost, best for lightweight tasks |
| Llama 3.3 70B | $0.59 | $0.79 | Higher capability for complex reasoning |
| Llama 4 Scout | $0.11 | $0.34 | Mid-range option with strong performance |
| Qwen3 32B | $0.29 | $0.59 | Balanced cost-to-capability ratio |
| Whisper v3 | $0.04-$0.111/hour | N/A | Audio transcription, priced per hour |
Output tokens cost more than input tokens across all text models, which is standard in the LLM inference market. The Llama 3.1 8B model stands out as the most affordable option for high-volume, latency-sensitive workloads.
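To see how per-token billing works in practice, here is a minimal sketch using the rates from the table above. The dictionary keys and the `request_cost` helper are illustrative names, not part of Groq's API:

```python
# Per-1M-token rates (input, output) in USD, taken from the table above.
RATES = {
    "llama-3.1-8b": (0.05, 0.08),
    "llama-3.3-70b": (0.59, 0.79),
    "llama-4-scout": (0.11, 0.34),
    "qwen3-32b": (0.29, 0.59),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at list rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-token prompt with a 500-token completion on Llama 3.1 8B:
print(round(request_cost("llama-3.1-8b", 2_000, 500), 6))  # 0.00014
```

At these rates, even millions of small requests on the 8B model stay in single-digit dollars, which is why it suits high-volume workloads.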
Hidden Costs and Considerations
Groq provides two significant cost-reduction mechanisms. The Batch API offers a 50% discount on standard per-token rates for non-real-time workloads. Prompt caching delivers 50% savings on cached input tokens, which is valuable for applications that reuse long system prompts.
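The impact of those two discounts can be sketched as a blended input rate. Note the assumption here: this example stacks the caching and batch discounts multiplicatively, which Groq's documentation may or may not do, so treat it as an estimate:

```python
def effective_input_rate(base_rate: float, cached_fraction: float,
                         batch: bool = False) -> float:
    """Blended per-1M input-token rate with prompt caching (50% off the
    cached fraction) and the optional Batch API discount (50% off overall).
    ASSUMPTION: the two discounts stack multiplicatively; verify against
    Groq's billing docs before budgeting."""
    rate = base_rate * (1 - 0.5 * cached_fraction)
    if batch:
        rate *= 0.5
    return rate

# Llama 3.3 70B input at $0.59/1M, with 80% of input tokens served from cache:
print(round(effective_input_rate(0.59, 0.8), 4))  # 0.354
```

For an application that reuses a long system prompt on every call, a high cached fraction can cut effective input spend nearly in half on its own.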
Built-in search tools add costs outside of token pricing. Basic Search runs $5 per 1,000 requests, and Advanced Search costs $8 per 1,000 requests. These fees can add up quickly for retrieval-augmented generation (RAG) workflows. Factor these into your budget if you rely on grounded search results.
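Because search fees are billed per request rather than per token, a quick back-of-the-envelope check is worth doing. The helper below is illustrative (the constants come from the fees quoted above):

```python
# Search-tool fees from above, in USD per 1,000 requests.
BASIC_SEARCH_PER_1K = 5.0
ADVANCED_SEARCH_PER_1K = 8.0

def monthly_search_cost(requests: int, advanced: bool = False) -> float:
    """Monthly search-tool spend, billed separately from token usage."""
    per_1k = ADVANCED_SEARCH_PER_1K if advanced else BASIC_SEARCH_PER_1K
    return requests / 1_000 * per_1k

# A RAG app issuing 100,000 grounded requests per month with Advanced Search:
print(monthly_search_cost(100_000, advanced=True))  # 800.0
```

At that volume, search fees dwarf the token bill for most of the models in the table, which is why they deserve a separate line in any budget.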
Cost Estimates by Team Size
Rough monthly estimates based on Llama 3.3 70B pricing ($0.59/$0.79 per 1M input/output tokens):
- Solo developer (roughly 5M input + 2M output tokens/month): approximately $4.53/month. Groq is extremely affordable for individual use.
- Small team of 5 (roughly 25M input + 10M output tokens/month): approximately $22.65/month. Well within budget for most startups.
- Mid-size team of 20 (roughly 100M input + 40M output tokens/month): approximately $90.60/month. Costs remain low even at moderate scale, though adding search tools could push the total higher.
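The estimates above follow directly from the per-token rates; a short sketch reproduces them (function name is illustrative):

```python
IN_RATE, OUT_RATE = 0.59, 0.79  # Llama 3.3 70B, USD per 1M tokens

def monthly_cost(input_millions: float, output_millions: float) -> float:
    """Monthly spend given token volumes in millions of tokens."""
    return input_millions * IN_RATE + output_millions * OUT_RATE

for label, in_m, out_m in [("solo", 5, 2), ("team of 5", 25, 10),
                           ("team of 20", 100, 40)]:
    print(f"{label}: ${monthly_cost(in_m, out_m):.2f}")
# solo: $4.53
# team of 5: $22.65
# team of 20: $90.60
```

Swapping in the 8B rates ($0.05/$0.08) shows the same volumes costing roughly a tenth as much, which is worth modeling before committing to a larger model.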
How Groq Pricing Compares
Groq competes directly on price with other inference providers. For comparable 70B-class models, Together AI charges from $0.10 to $2.50 per 1M tokens depending on model size. Fireworks AI prices models over 16B parameters at $0.90 per 1M tokens, while smaller models under 4B start at $0.10 per 1M tokens. Mistral AI charges $0.10/$0.30 per 1M input/output tokens for Mistral Small and $2.00/$6.00 per 1M tokens for Mistral Large.
Groq's Llama 3.3 70B at $0.59/$0.79 per 1M tokens is competitively priced against these alternatives, and its LPU hardware delivers significantly faster inference speeds. However, Groq's model selection is narrower than OpenAI or Together AI, so teams needing specific proprietary models may find the catalog limiting.