## Overview
profClaw supports local LLM inference through Ollama and LM Studio. Run AI agents entirely on your own hardware with no API keys or cloud dependencies.

## Ollama Setup
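The Ollama steps mirror the LM Studio ones below. A minimal sketch, assuming the standard Ollama CLI and its default port of 11434 (the model name is just an example):

```shell
# Install Ollama (macOS/Linux; a Windows installer is available at ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model from the Ollama library (example: Llama 3.2 3B)
ollama pull llama3.2

# Start the server (listens on http://localhost:11434 by default)
ollama serve
```

On macOS and Windows the desktop app starts the server automatically, so `ollama serve` is only needed when running Ollama standalone.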
## LM Studio Setup
### Install LM Studio
Download from lmstudio.ai. Available for macOS, Windows, and Linux.
### Download a Model
Open LM Studio, browse the model catalog, and download a model (e.g., Llama 3.2, Mistral, Phi-3).
### Start the Server
In LM Studio, go to the Local Server tab and click Start Server. Default port is 1234.
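LM Studio's local server exposes an OpenAI-compatible API. A quick smoke test with `curl`, assuming the default port 1234 and that a model is loaded (the model identifier shown is a placeholder; use the id reported by `/v1/models`):

```shell
# List the models the server is exposing
curl http://localhost:1234/v1/models

# Send a chat completion request (model id must match a loaded model)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```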
## Recommended Models
| Model | Size | Best For | VRAM Needed |
|---|---|---|---|
| Llama 3.2 3B | 2GB | Quick tasks, chat | 4GB |
| Llama 3.1 8B | 4.7GB | General purpose | 8GB |
| CodeLlama 13B | 7.4GB | Code generation | 16GB |
| Mistral 7B | 4.1GB | Balanced performance | 8GB |
| Phi-3 Mini | 2.2GB | Edge devices | 4GB |
| DeepSeek Coder V2 | 8.9GB | Code tasks | 16GB |
## Hybrid Setup
Use local models for simple tasks and cloud providers for complex ones:

## Docker with Ollama
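As a sketch of one common routing pattern, the local provider handles everyday tasks while a cloud provider takes the heavy ones. The variable names below are hypothetical illustrations, not confirmed profClaw settings; check the profClaw configuration reference for the real names:

```shell
# Hypothetical routing config -- variable names are illustrative only
export PROFCLAW_DEFAULT_PROVIDER=ollama      # simple tasks stay local
export PROFCLAW_DEFAULT_MODEL=llama3.2
export PROFCLAW_FALLBACK_PROVIDER=anthropic  # complex tasks go to the cloud
export PROFCLAW_FALLBACK_MODEL=<cloud-model-id>
```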
Run both profClaw and Ollama in Docker:

## Performance Tips
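A minimal sketch using plain `docker run`. The official `ollama/ollama` image and its port 11434 are real; the profClaw image name is a placeholder:

```shell
# Shared network so the containers can reach each other by name
docker network create llm-net

# Start Ollama with a persistent volume for downloaded models
docker run -d --name ollama --network llm-net \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Pull a model inside the running container
docker exec ollama ollama pull llama3.2

# Start profClaw pointed at the Ollama container
# (image name and env var are placeholders -- check the profClaw docs)
docker run -d --name profclaw --network llm-net \
  -e OLLAMA_HOST=http://ollama:11434 \
  profclaw/profclaw
```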
### GPU Acceleration
Ollama uses the GPU automatically when one is available. Check with `ollama ps`: models loaded on the GPU show it in the PROCESSOR column, and GPU-accelerated models reach noticeably higher tokens/sec.
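To confirm GPU use in practice (assuming a model is already pulled):

```shell
# Show running models and whether they are loaded on CPU or GPU
ollama ps

# Print throughput stats (eval rate in tokens/s) after the response
ollama run llama3.2 --verbose "Say hello"
```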
### Context Length
Local models have smaller context windows than cloud models. Set `POOL_TIMEOUT_MS` higher for larger contexts.
### Quantization
Use quantized models (Q4_K_M, Q5_K_M) for better speed with minimal quality loss:
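For example, the Ollama library publishes quantized tags for most models; the exact tag names below assume the current library naming, so verify them with the model's tag list on ollama.com:

```shell
# Pull a 4-bit quantized build (good speed/quality tradeoff)
ollama pull llama3.1:8b-instruct-q4_K_M

# Or a 5-bit build for slightly better quality at higher VRAM use
ollama pull llama3.1:8b-instruct-q5_K_M
```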