Overview

profClaw supports local LLM inference through Ollama and LM Studio. Run AI agents entirely on your own hardware with no API keys or cloud dependencies.

Ollama Setup

1. Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

2. Pull a Model

# General purpose
ollama pull llama3.2

# Coding focused
ollama pull codellama:13b

# Small and fast
ollama pull phi3:mini

3. Configure profClaw

export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_MODEL=llama3.2
Or in settings.yml:
providers:
  default: ollama
  ollama:
    baseUrl: http://localhost:11434
    model: llama3.2

4. Start and Test

ollama serve &
profclaw serve
profclaw chat
> Hello, are you running locally?
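Under the hood, profClaw talks to Ollama's HTTP API. You can sanity-check the endpoint yourself; the sketch below builds a request for Ollama's /api/generate endpoint (the exact fields profClaw sends are an assumption, but model, prompt, and stream are the documented basics):

```python
import json
import urllib.request

# JSON body for Ollama's /api/generate endpoint.
# "model" and "prompt" are required; stream=False returns one JSON reply
# instead of a stream of partial chunks.
payload = {
    "model": "llama3.2",
    "prompt": "Hello, are you running locally?",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once `ollama serve` is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```

If the server is up, the same check works from the command line with curl against http://localhost:11434/api/tags, which lists the models you have pulled.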

LM Studio Setup

1. Install LM Studio

Download from lmstudio.ai. Available for macOS, Windows, and Linux.

2. Download a Model

Open LM Studio, browse the model catalog, and download a model (e.g., Llama 3.2, Mistral, Phi-3).

3. Start the Server

In LM Studio, go to the Local Server tab and click Start Server. Default port is 1234.

4. Configure profClaw

export LMSTUDIO_BASE_URL=http://localhost:1234
export LMSTUDIO_MODEL=your-model-name
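Unlike Ollama's native API, LM Studio's local server speaks the OpenAI-compatible chat-completions format, so requests look like standard /v1/chat/completions calls. A minimal sketch (the field values are examples, not profClaw's exact internals; the model name must match whatever you loaded in LM Studio):

```python
import json
import urllib.request

# OpenAI-style chat payload for LM Studio's /v1/chat/completions endpoint.
payload = {
    "model": "your-model-name",  # must match the model loaded in LM Studio
    "messages": [
        {"role": "user", "content": "Hello, are you running locally?"},
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the LM Studio server is started:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```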

Choosing a Model

Model               Size    Best For              VRAM Needed
Llama 3.2 3B        2GB     Quick tasks, chat     4GB
Llama 3.1 8B        4.7GB   General purpose       8GB
CodeLlama 13B       7.4GB   Code generation       16GB
Mistral 7B          4.1GB   Balanced performance  8GB
Phi-3 Mini          2.2GB   Edge devices          4GB
DeepSeek Coder V2   8.9GB   Code tasks            16GB
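The VRAM column follows a rough rule of thumb: the quantized weight file plus overhead for the KV cache and runtime, rounded up to a common card size. A back-of-the-envelope sketch (the bits-per-weight and overhead figures are assumptions, not exact requirements):

```python
# Rough VRAM estimate for a quantized model: weights take
# (params * bits_per_weight / 8) bytes, plus runtime overhead for
# the KV cache, activations, and the serving process.

def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.5) -> float:
    """Back-of-the-envelope VRAM estimate in GB (not a guarantee)."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

# A 7B model at ~4.5 bits/weight needs roughly 4 GB for weights alone,
# so an 8 GB card is a comfortable fit.
print(round(estimate_vram_gb(7), 1))  # prints 5.4
```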

Hybrid Setup

Use local models for simple tasks and cloud providers for complex ones:
providers:
  default: ollama
  ollama:
    baseUrl: http://localhost:11434
    model: llama3.2
  anthropic:
    apiKey: ${ANTHROPIC_API_KEY}
    model: claude-sonnet-4-6
Switch providers per conversation:
profclaw chat --provider anthropic
profclaw chat --provider ollama
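The same routing decision can be made programmatically. The heuristic below is purely illustrative (route_provider and its thresholds are hypothetical, not part of profClaw): keep short, simple prompts on the local model and send long or code-heavy ones to the cloud provider.

```python
# Hypothetical routing rule: local model for short/simple prompts,
# cloud provider for long or code-heavy ones.

def route_provider(prompt: str, max_local_chars: int = 2000) -> str:
    """Pick a provider name for a prompt (illustrative heuristic only)."""
    code_markers = ("```", "def ", "class ", "#include", "traceback")
    looks_like_code = any(m in prompt.lower() for m in code_markers)
    if len(prompt) > max_local_chars or looks_like_code:
        return "anthropic"  # complex: use the cloud provider
    return "ollama"         # simple: stay local

print(route_provider("What's the weather like?"))  # ollama
print(route_provider("def fib(n): ..."))           # anthropic
```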

Docker with Ollama

Run both profClaw and Ollama in Docker. Note that the Ollama container starts with an empty model store, so pull a model into the named volume after the first docker compose up (for example, docker compose exec ollama ollama pull llama3.2):
services:
  profclaw:
    image: profclaw/profclaw:latest
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - OLLAMA_MODEL=llama3.2
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]  # GPU passthrough

volumes:
  ollama-models:

Performance Tips

Ollama uses your GPU automatically when one is available. Check with ollama ps while a model is loaded: the PROCESSOR column shows whether the model is running on the GPU or the CPU.
Local models typically have smaller context windows than cloud models, and long prompts take longer to process locally. Set POOL_TIMEOUT_MS higher if requests with large contexts time out.
Use quantized models (Q4_K_M, Q5_K_M) for better speed with minimal quality loss. Exact tag names vary by model in the Ollama library, for example:
ollama pull llama3.2:3b-instruct-q4_K_M