Groq’s Language Processing Units (LPUs) deliver some of the fastest inference available. Llama 3.3 70B runs at hundreds of tokens per second, making it well suited to real-time chat and low-latency agentic workflows.

Supported Models

| Model | ID | Context | Max Output | Tools | Input $/1M | Output $/1M |
|---|---|---|---|---|---|---|
| Llama 3.3 70B | llama-3.3-70b-versatile | 128K | 32K | Yes | $0.59 | $0.79 |
| Llama 3.1 8B Instant | llama-3.1-8b-instant | 128K | 8K | Yes | $0.05 | $0.08 |
| Mixtral 8x7B | mixtral-8x7b-32768 | 32K | 8K | Yes | $0.24 | $0.24 |
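
The per-token prices above make cost estimates a one-liner. As a hypothetical example (the token counts are illustrative, not from the source), a request with 10K input tokens and 2K output tokens on Llama 3.3 70B:

```shell
# Estimated cost of a 10K-input / 2K-output call to Llama 3.3 70B,
# using the table prices: $0.59/1M input, $0.79/1M output
awk 'BEGIN { printf "$%.5f\n", (10000 / 1e6) * 0.59 + (2000 / 1e6) * 0.79 }'
# prints $0.00748
```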

Setup

1. Get an API key

   Sign up at console.groq.com. Free tier available.

2. Set the environment variable

   export GROQ_API_KEY=gsk_...

3. Verify

   profclaw doctor --provider groq

Environment Variables

GROQ_API_KEY (string, required): Your Groq API key. Format: gsk_...

Configuration Example

GROQ_API_KEY=gsk_...

Model Aliases

| Alias | Model |
|---|---|
| groq | llama-3.3-70b-versatile |
| groq-fast | llama-3.1-8b-instant |
| groq-mixtral | mixtral-8x7b-32768 |

Usage Examples

# Fast general purpose
profclaw chat --model groq "Explain this error message"

# Fastest (8B model)
profclaw chat --model groq-fast "One-line summary of this PR"

Notes

  • Groq is ranked 5th in auto-selection priority after Anthropic, OpenAI, Azure, and Google.
  • llama-3.1-8b-instant is one of the cheapest available models at $0.05/1M input tokens.
  • Groq’s free tier is generous but enforces daily rate limits.
  • The API is OpenAI-compatible; base URL: https://api.groq.com/openai/v1
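
Because the API is OpenAI-compatible, any OpenAI-style client can be pointed at the base URL above. A minimal sketch with curl, assuming the standard OpenAI chat-completions path (the model and prompt are illustrative):

```shell
# Direct request to Groq's OpenAI-compatible chat completions endpoint.
# Requires GROQ_API_KEY to be set in the environment.
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```

This bypasses profclaw entirely, which can be useful when isolating whether an issue lies in the CLI or in the provider.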