Enabling Ollama to use the GPU and VRAM power of multiple GPUs
TL;DR: you're running Ollama in Docker and want to spread your model load over multiple GPUs. Pop the following into your docker-compose file.
environment:
  - OLLAMA_KEEP_ALIVE=30
  - OLLAMA_SCHED_SPREAD=1 # add to spread model load over multiple GPUs
Full config example.
name: ollama
services:
  ollama:
    environment:
      - OLLAMA_KEEP_ALIVE=30
      - OLLAMA_SCHED_SPREAD=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    volumes:
      - ollama:/root/.ollama
    ports:
      - 11434:11434
    container_name: ollama
    image: ollama/ollama
    restart: always
volumes:
  ollama:
    external: true
    name: ollama
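Once the container is up and you've run a prompt, you can sanity-check that a model is actually loaded into GPU memory by querying the Ollama API. This is a minimal sketch, assuming the container is published on localhost:11434 as in the compose file above and that your Ollama version exposes the /api/ps endpoint; to see how the load is split across the individual GPUs, check nvidia-smi on the host.

# check_ollama_vram.py - minimal check of what Ollama has loaded
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # assumes the port mapping from the compose file above

def running_models():
    # /api/ps lists the models currently loaded by the Ollama server
    with urllib.request.urlopen(f"{BASE_URL}/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])

if __name__ == "__main__":
    models = running_models()
    if not models:
        print("No models loaded yet - run a prompt first, then re-check.")
    for m in models:
        # size_vram is the number of bytes offloaded to GPU memory in total;
        # with OLLAMA_SCHED_SPREAD=1 that total is distributed over your GPUs,
        # which you can confirm per-GPU with nvidia-smi on the host.
        vram_gb = m.get("size_vram", 0) / 1e9
        print(f'{m["name"]}: {vram_gb:.1f} GB in VRAM')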