Spread your AI model load over multiple GPUs

Enabling Ollama to use the compute and VRAM of multiple GPUs

TL;DR: you're using Ollama in Docker and want to spread your model load over multiple GPUs.

Pop the following into your docker-compose file.

        environment:
            - OLLAMA_KEEP_ALIVE=30
            - OLLAMA_SCHED_SPREAD=1  ## add to spread model load over multiple GPUs
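
OLLAMA_SCHED_SPREAD=1 tells Ollama's scheduler to spread a model across all visible GPUs instead of packing it onto as few as possible, and OLLAMA_KEEP_ALIVE controls how long a model stays loaded after the last request (a bare number is treated as seconds). If you run the container with plain docker run rather than Compose, roughly the same setup looks like this; it's a sketch, so adjust the volume name and port to your own setup:

# requires the NVIDIA Container Toolkit for --gpus
docker run -d --gpus=all \
    -e OLLAMA_SCHED_SPREAD=1 \
    -e OLLAMA_KEEP_ALIVE=30 \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama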

Full config example.

name: ollama
services:
    ollama:
        environment:
            - OLLAMA_KEEP_ALIVE=30
            - OLLAMA_SCHED_SPREAD=1
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: all
                          capabilities:
                              - gpu
        volumes:
            - ollama:/root/.ollama
        ports:
            - 11434:11434
        container_name: ollama
        image: ollama/ollama
        restart: always
volumes:
    ollama:
        external: true
        name: ollama
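
To bring it up and confirm the spread, something along these lines should work (ollama ps is the CLI bundled in the image; nvidia-smi output depends on your driver setup):

docker compose up -d
docker exec -it ollama ollama ps    # lists loaded models and the processor they're running on
nvidia-smi                          # on the host: each GPU should show a share of the model's VRAM

With OLLAMA_SCHED_SPREAD set, loading a model should allocate VRAM on every GPU rather than filling one card first.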