# Docker

Docker images are created for each release. They come with all the benchmarks installed and the necessary datasets; no additional downloads are necessary.
## CUDA

### Requirements

- NVIDIA driver
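Before pulling the image, a quick sanity check can confirm the driver is working. Note that `--gpus=all` also requires the NVIDIA Container Toolkit on the host (an assumption about your setup; milabench does not install it):

```bash
# On the host: should list the GPUs and the driver version
nvidia-smi

# If the NVIDIA Container Toolkit is wired up correctly, the same
# utility is injected into containers started with --gpus=all
docker run --rm --gpus=all ubuntu nvidia-smi
```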
### Usage
The commands below will download the latest CUDA container and run milabench right away, storing the results inside the `results/` folder on the host machine:
```bash
# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

# Pull the image we are going to run
docker pull $MILABENCH_IMAGE

# Run milabench
docker run -it --rm --ipc=host --gpus=all \
    -v $(pwd)/results:/milabench/envs/runs \
    $MILABENCH_IMAGE \
    milabench run
```
`--ipc=host` removes shared memory restrictions, but you can also set `--shm-size` to a high value instead (at least `8G`, possibly more).
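For example, a sketch of the same run with an explicit shared-memory size instead of `--ipc=host`:

```bash
# Same run, but with a fixed shared-memory size instead of --ipc=host
docker run -it --rm --shm-size=8G --gpus=all \
    -v $(pwd)/results:/milabench/envs/runs \
    $MILABENCH_IMAGE \
    milabench run
```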
Each run should store results in a unique directory under `results/` on the host machine. To generate a readable report of the results, you can run:
```bash
# Show Performance Report
docker run -it --rm \
    -v $(pwd)/results:/milabench/envs/runs \
    $MILABENCH_IMAGE \
    milabench report --runs /milabench/envs/runs
```
## ROCm

### Requirements

- ROCm
- Docker
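Similarly, a quick sanity check on the host (assuming a standard ROCm install) can confirm the GPUs are visible before running the container:

```bash
# Should list the AMD GPUs and their state
rocm-smi

# More detailed agent/device information
rocminfo
```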
### Usage

For ROCm, the usage is similar to CUDA, but you must use a different image and the Docker options are a bit different:
```bash
# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly

# Pull the image we are going to run
docker pull $MILABENCH_IMAGE

# Run milabench
docker run -it --rm --ipc=host \
    --device=/dev/kfd --device=/dev/dri \
    --security-opt seccomp=unconfined --group-add video \
    -v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \
    -v /opt/rocm:/opt/rocm \
    -v $(pwd)/results:/milabench/envs/runs \
    $MILABENCH_IMAGE \
    milabench run
```
Generating the performance report uses the same command as for CUDA:
```bash
# Show Performance Report
docker run -it --rm \
    -v $(pwd)/results:/milabench/envs/runs \
    $MILABENCH_IMAGE \
    milabench report --runs /milabench/envs/runs
```
## Multi-node benchmark
There are currently two multi-node benchmarks: `opt-1_3b-multinode` (data-parallel) and `opt-6_7b-multinode` (model-parallel; that model is too large to fit on a single GPU). Here is how to run them:
1. Set up two or more machines that can see each other on the network. Suppose there are two and their addresses are:
   - `manager-node` ⬅ this is the node you will launch the job on
   - `worker-node`
2. `docker pull` the image on both nodes.
3. Prior to running the benchmark, create an SSH key pair on `manager-node` and set up public key authentication to the other nodes (in this case, `worker-node`); see the sketch after this list.
4. Write an override file that will tell milabench about the network (see below). Note that you will need to copy/paste the same configuration for both multinode tests.
5. On `manager-node`, execute `milabench run` via Docker:
   - Mount the private key at `/milabench/id_milabench` in the container.
   - Use `--override "$(cat overrides.yaml)"` to pass the overrides.
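A minimal sketch of the key setup in step 3, using the placeholder user and hosts from the overrides below:

```bash
# On manager-node: create a key pair (no passphrase, for unattended use)
ssh-keygen -t rsa -f $HOME/.ssh/id_rsa -N ""

# Authorize the public key on each worker node
ssh-copy-id username@worker-node

# Verify that key-based login works without a password prompt
ssh -i $HOME/.ssh/id_rsa username@worker-node hostname
```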
Example YAML configuration (`overrides.yaml`):
```yaml
# Name of the benchmark. You can also override values in other benchmarks.
opt-6_7b-multinode:
  # Docker image to use on the worker nodes (should be same as the manager)
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  # The user on worker-node that public key auth is set up for
  worker_user: "username"
  # Address of the manager node from the worker nodes
  manager_addr: "manager-node"
  # Addresses of the worker nodes (do not include the manager node,
  # although it is also technically a worker node)
  worker_addrs:
    - "worker-node"
  # Make sure that this is equal to length(worker_addrs) + 1
  num_machines: 2
  capabilities:
    # Make sure that this is ALSO equal to length(worker_addrs) + 1
    nodes: 2

opt-1_3b-multinode:
  # Copy the contents of the opt-6_7b-multinode section without any changes.
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  worker_user: "username"
  manager_addr: "manager-node"
  worker_addrs:
    - "worker-node"
  num_machines: 2
  capabilities:
    nodes: 2
```
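Since the two sections must stay identical, standard YAML anchors and merge keys could avoid the copy/paste. This is a sketch that assumes milabench's config loader honors merge keys, which is untested here; if in doubt, keep the duplicated sections above:

```yaml
opt-6_7b-multinode: &multinode_config
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  worker_user: "username"
  manager_addr: "manager-node"
  worker_addrs:
    - "worker-node"
  num_machines: 2
  capabilities:
    nodes: 2

opt-1_3b-multinode:
  # Merge key: copies everything from the anchored section above
  <<: *multinode_config
```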
Then, the command should look like this:
```bash
# On manager-node:

# Change if needed
export SSH_KEY_FILE=$HOME/.ssh/id_rsa

docker run -it --rm --gpus all --network host --ipc=host --privileged \
    -v $SSH_KEY_FILE:/milabench/id_milabench \
    -v $(pwd)/results:/milabench/envs/runs \
    $MILABENCH_IMAGE \
    milabench run --override "$(cat overrides.yaml)" \
    --select multinode
```
The last line (`--select multinode`) specifically selects the multi-node benchmarks. Omit that line to run all benchmarks.
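Before launching, it can be worth confirming from `manager-node` that the key you are about to mount actually grants access to each worker (placeholder user and host as in the overrides):

```bash
# Should print the worker's hostname without prompting for a password
ssh -i $SSH_KEY_FILE username@worker-node hostname
```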
If you need to use more than two nodes, edit or copy `overrides.yaml` and simply add the other nodes' addresses in `worker_addrs`, adjusting `num_machines` and `capabilities.nodes` accordingly. For example, for 4 nodes:
```yaml
opt-6_7b-multinode:
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  worker_user: "username"
  manager_addr: "manager-node"
  worker_addrs:
    - "worker-node1"
    - "worker-node2"
    - "worker-node3"
  num_machines: 4
  capabilities:
    nodes: 4
```
**Note:** The multi-node benchmark is sensitive to network performance. If the mono-node benchmark `opt-6_7b` is significantly faster than `opt-6_7b-multinode` (e.g. it processes more than twice the items per second), this likely indicates that InfiniBand is either not present or not used. (It is not abnormal for the multinode benchmark to perform a bit worse than the mono-node benchmark, since it has not been optimized to minimize the impact of communication costs.)

Even if InfiniBand is properly configured, the benchmark may fail to use it unless the `--privileged` flag is set when running the container.
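To check whether InfiniBand is present and active on the hosts, the usual diagnostics (assuming the `infiniband-diags` package or your vendor's equivalent is installed) are:

```bash
# Lists InfiniBand ports; look for "State: Active"
ibstat

# Alternatively, check for ib* network interfaces
ip -br link show | grep '^ib'
```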
## Building images
Images can be built locally for prototyping and testing.
```bash
docker build -f docker/Dockerfile-cuda -t milabench:cuda-nightly --build-arg CONFIG=standard.yaml .
```
Or for ROCm:
```bash
docker build -f docker/Dockerfile-rocm -t milabench:rocm-nightly --build-arg CONFIG=standard.yaml .
```
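A locally built image can then be used with the same commands shown earlier by pointing `MILABENCH_IMAGE` at the local tag, e.g. for CUDA:

```bash
export MILABENCH_IMAGE=milabench:cuda-nightly

docker run -it --rm --ipc=host --gpus=all \
    -v $(pwd)/results:/milabench/envs/runs \
    $MILABENCH_IMAGE \
    milabench run
```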