Cerebras uses custom wafer-scale processors to achieve inference speeds that far exceed those of GPU-based providers. Llama 3.1 70B runs at over 2,000 tokens/second, roughly 20x faster than typical cloud GPU inference.
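To put those figures in context, a quick back-of-envelope check (both rates are taken from the claim above; the ~100 tokens/second GPU baseline is the value implied by the "20x" comparison, not a measurement):

```python
# Back-of-envelope check on the quoted throughput figures.
# Both rates are assumptions from the surrounding text, not benchmarks.
cerebras_rate = 2000  # tokens/second, Llama 3.1 70B on Cerebras
gpu_rate = 100        # tokens/second, implied typical cloud GPU baseline

speedup = cerebras_rate / gpu_rate
full_output_seconds = 8192 / cerebras_rate  # time to emit the 8K max output

print(speedup)                         # → 20.0
print(round(full_output_seconds, 1))   # → 4.1
```

At these rates, even a maximum-length 8K completion streams out in about four seconds.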

Supported Models

| Model | ID | Context | Max Output | Tools | Notes |
| --- | --- | --- | --- | --- | --- |
| Llama 3.1 70B | llama3.1-70b | 128K | 8K | Yes | Fastest 70B available |
| Llama 3.1 8B | llama3.1-8b | 128K | 8K | Yes | Extreme speed |

Setup

1. Get API access

   Sign up at inference.cerebras.ai. Access is currently limited.

2. Set the environment variable

   export CEREBRAS_API_KEY=csk-...

3. Verify

   profclaw doctor --provider cerebras

Environment Variables

CEREBRAS_API_KEY (string, required)
Your Cerebras API key.

Configuration Example

CEREBRAS_API_KEY=csk-...

Model Aliases

| Alias | Model |
| --- | --- |
| cerebras | llama3.1-70b |

Usage Examples

# Ultra-fast streaming response
profclaw chat --model cerebras "Stream this long document analysis"
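Because the endpoint is OpenAI-compatible, streamed responses arrive as Server-Sent Events, one JSON chunk per `data:` line. A minimal sketch of consuming such a stream (the `collect_stream` helper and the sample chunks are illustrative, not part of profclaw or the Cerebras API):

```python
import json

def collect_stream(lines):
    """Concatenate content deltas from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separator lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# Illustrative chunks in the OpenAI streaming shape (not captured output).
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # → Hello
```

The same parsing applies to any OpenAI-compatible streaming backend; only the base URL and API key change.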

Notes

  • API endpoint: https://api.cerebras.ai/v1 (OpenAI-compatible)
  • Status: Experimental. Availability is hardware-specific and capacity may be constrained.
  • Best use case: real-time streaming, bulk generation tasks, low-latency chat.
  • Cerebras does not support vision or image inputs.
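As a sketch of what a request to this OpenAI-compatible endpoint looks like, the snippet below builds (but does not send) the headers and JSON body. The `build_request` helper is hypothetical; only the URL, bearer-token header scheme, and body shape follow the OpenAI-compatible convention noted above:

```python
import json
import os

# OpenAI-compatible chat completions path on the Cerebras endpoint.
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_request(prompt, model="llama3.1-70b", stream=True, api_key=None):
    """Return (headers, body) for a chat completion request. Nothing is sent."""
    key = api_key or os.environ.get("CEREBRAS_API_KEY", "csk-...")
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    })
    return headers, body

headers, body = build_request("Summarize this document")
```

Any OpenAI-compatible client can be pointed at `https://api.cerebras.ai/v1` with `CEREBRAS_API_KEY` as the key.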