Welcome to milabench’s documentation!
Install and use
Note
You may use Docker to run the benchmarks, which will likely be easier. See the Docker section of this documentation for more information.
To install, clone the repo:
# You may need to upgrade pip
pip install pip -U
git clone git@github.com:mila-iqia/milabench.git
cd milabench
# <Activate virtual environment>
# Install in editable mode
pip install -e .
This will install two commands, milabench and voir.
Before running the benchmarks
Set the $MILABENCH_BASE environment variable to the base directory in which all the code, virtual environments and data should be put.
Set the $MILABENCH_CONFIG environment variable to the configuration file that represents the benchmark suite you want to run. Normally it should be set to config/standard.yaml.
milabench install: Install the individual benchmarks in virtual environments.
milabench prepare: Download the datasets, weights, etc.
If the machine has both NVIDIA/CUDA and AMD/ROCm GPUs, you may have to set the MILABENCH_GPU_ARCH environment variable as well, to either cuda or rocm.
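For example, a typical setup might look like this (the paths here are placeholders; point them at your own clone and storage location):

```shell
# Placeholder locations; adjust for your machine
export MILABENCH_BASE=$HOME/milabench-base
export MILABENCH_CONFIG=$PWD/config/standard.yaml
# Only needed on machines with both NVIDIA and AMD GPUs
export MILABENCH_GPU_ARCH=cuda
```

Putting these exports in your shell profile avoids having to set them for every session.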
Run milabench
The following command will run the whole benchmark and will put the results in a new directory in $MILABENCH_BASE/runs (the path will be printed to stdout).
milabench run
Here are a few useful options for milabench run:
# Only run the bert benchmark
milabench run --select bert
# Run all benchmarks EXCEPT bert and stargan
milabench run --exclude bert,stargan
# Run the benchmark suite three times in a row
milabench run --repeat 3
Reports
The following command will print out a report of the tests that ran, the metrics, and whether there were any failures. It will also produce an HTML report containing more detailed information about any errors.
milabench report --runs $MILABENCH_BASE/runs/some_specific_run --html report.html
The report will also print out a score based on a weighting of the metrics, as defined in the file $MILABENCH_CONFIG points to.
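Run directories are timestamped, so a small shell sketch can pick the most recent one to feed to milabench report (the mkdir lines only create a placeholder layout so the snippet is self-contained; on a real machine the run directories already exist):

```shell
# Placeholder base with a fake run directory (illustration only)
MILABENCH_BASE=${MILABENCH_BASE:-/tmp/milabench_demo}
mkdir -p "$MILABENCH_BASE/runs/example_run"

# Pick the most recently modified run directory
latest_run=$(ls -td "$MILABENCH_BASE"/runs/*/ | head -n 1)
echo "latest run: $latest_run"
# You would then run, e.g.:
#   milabench report --runs "$latest_run" --html report.html
```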
Docker
Docker Images are created for each release. They come with all the benchmarks installed and the necessary datasets. No additional downloads are necessary.
CUDA
Requirements
NVIDIA driver
Usage
The commands below will download the latest CUDA container and run milabench right away, storing the results inside the results folder on the host machine:
# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly
# Pull the image we are going to run
docker pull $MILABENCH_IMAGE
# Run milabench
docker run -it --rm --ipc=host --gpus=all \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run
--ipc=host removes shared memory restrictions, but you can also set --shm-size to a high value instead (at least 8G, possibly more).
Each run should store results in a unique directory under results/ on the host machine. To generate a readable report of the results you can run:
# Show Performance Report
docker run -it --rm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench report --runs /milabench/envs/runs
ROCM
Requirements
rocm
docker
Usage
For ROCm the usage is similar to CUDA, but you must use a different image and the Docker options are a bit different:
# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly
# Pull the image we are going to run
docker pull $MILABENCH_IMAGE
# Run milabench
docker run -it --rm --ipc=host \
--device=/dev/kfd --device=/dev/dri \
--security-opt seccomp=unconfined --group-add video \
-v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \
-v /opt/rocm:/opt/rocm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run
For the performance report, it is the same command:
# Show Performance Report
docker run -it --rm \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench report --runs /milabench/envs/runs
Multi-node benchmark
There are currently two multi-node benchmarks: opt-1_3b-multinode (data-parallel) and opt-6_7b-multinode (model-parallel; that model is too large to fit on a single GPU). Here is how to run them:

1. Set up two or more machines that can see each other on the network. Suppose there are two and their addresses are:
   manager-node ⬅ this is the node you will launch the job on
   worker-node
2. docker pull the image on both nodes.
3. Prior to running the benchmark, create an SSH key pair on manager-node and set up public key authentication to the other nodes (in this case, worker-node).
4. Write an override file that will tell milabench about the network (see below). Note that you will need to copy/paste the same configuration for both multinode tests.
5. On manager-node, execute milabench run via Docker:
   - Mount the private key at /milabench/id_milabench in the container.
   - Use --override "$(cat overrides.yaml)" to pass the overrides.
Example YAML configuration (overrides.yaml):
# Name of the benchmark. You can also override values in other benchmarks.
opt-6_7b-multinode:
  # Docker image to use on the worker nodes (should be same as the manager)
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  # The user on worker-node that public key auth is set up for
  worker_user: "username"
  # Address of the manager node from the worker nodes
  manager_addr: "manager-node"
  # Addresses of the worker nodes (do not include the manager node,
  # although it is also technically a worker node)
  worker_addrs:
    - "worker-node"
  # Make sure that this is equal to length(worker_addrs) + 1
  num_machines: 2
  capabilities:
    # Make sure that this is ALSO equal to length(worker_addrs) + 1
    nodes: 2

opt-1_3b-multinode:
  # Copy the contents of the opt-6_7b-multinode section without any changes.
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  worker_user: "username"
  manager_addr: "manager-node"
  worker_addrs:
    - "worker-node"
  num_machines: 2
  capabilities:
    nodes: 2
Then, the command should look like this:
# On manager-node:
# Change if needed
export SSH_KEY_FILE=$HOME/.ssh/id_rsa
docker run -it --rm --gpus all --network host --ipc=host --privileged \
-v $SSH_KEY_FILE:/milabench/id_milabench \
-v $(pwd)/results:/milabench/envs/runs \
$MILABENCH_IMAGE \
milabench run --override "$(cat overrides.yaml)" \
--select multinode
The last line (--select multinode) specifically selects the multi-node benchmarks. Omit it to run all benchmarks.
If you need to use more than two nodes, edit or copy overrides.yaml and simply add the other nodes’ addresses in worker_addrs, adjusting num_machines and capabilities.nodes accordingly. For example, for 4 nodes:
opt-6_7b-multinode:
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  worker_user: "username"
  manager_addr: "manager-node"
  worker_addrs:
    - "worker-node1"
    - "worker-node2"
    - "worker-node3"
  num_machines: 4
  capabilities:
    nodes: 4
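Since the two multinode sections must stay identical and the node counts must match worker_addrs, it can help to generate overrides.yaml from a single list of workers. A bash sketch (all addresses and the username are placeholders):

```shell
# Placeholder worker addresses; the manager is counted separately
workers=(worker-node1 worker-node2 worker-node3)
n=$(( ${#workers[@]} + 1 ))   # workers + the manager node

# Emit the same configuration for both multinode benchmarks
for bench in opt-6_7b-multinode opt-1_3b-multinode; do
  echo "$bench:"
  echo "  docker_image: \"ghcr.io/mila-iqia/milabench:cuda-nightly\""
  echo "  worker_user: \"username\""
  echo "  manager_addr: \"manager-node\""
  echo "  worker_addrs:"
  for w in "${workers[@]}"; do
    echo "    - \"$w\""
  done
  echo "  num_machines: $n"
  echo "  capabilities:"
  echo "    nodes: $n"
done > overrides.yaml
```

This guarantees num_machines and capabilities.nodes stay equal to length(worker_addrs) + 1 for both benchmarks.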
Note
The multi-node benchmark is sensitive to network performance. If the mono-node benchmark opt-6_7b is significantly faster than opt-6_7b-multinode (e.g. processes more than twice the items per second), this likely indicates that Infiniband is either not present or not used. (It is not abnormal for the multinode benchmark to perform a bit worse than the mono-node benchmark since it has not been optimized to minimize the impact of communication costs.)
Even if Infiniband is properly configured, the benchmark may fail to use it unless the --privileged
flag is set when running the container.
Building images
Images can be built locally for prototyping and testing.
docker build -f docker/Dockerfile-cuda -t milabench:cuda-nightly --build-arg CONFIG=standard.yaml .
Or for ROCm:
docker build -f docker/Dockerfile-rocm -t milabench:rocm-nightly --build-arg CONFIG=standard.yaml .
Using milabench (DEVELOPERS)
To use milabench, you need:
A YAML configuration file to define the benchmarks to install, prepare or run.
The base directory for code, virtual environments, data and outputs, set either with the $MILABENCH_BASE environment variable or the --base option. The base directory will be automatically constructed by milabench and will be organized as follows:
$MILABENCH_BASE/
|- venv/ # Virtual environments and dependencies
| |- bench1/ # venv for benchmark bench1
| |- ... # etc
|- code/ # Benchmark code
| |- bench1/ # Code for benchmark bench1
| |- ... # etc
|- data/ # Datasets
| |- dataset1/ # A dataset
| |- ... # etc
|- runs/ # Outputs of benchmark runs
|- calimero.2022-03-30_15:00:00/ # Auto-generated run name
| |- bench1.0.stdout # Output for the first run of bench1
| |- bench1.0.stderr # Stderr for the first run of bench1
| |- bench1.0.data # Structured data for the first run of bench1
| |- bench1.1.stdout # Output for the second run of bench1
| |- ... # etc
|- blah/ # Can set name with --run
It is possible to change the structure in the YAML to e.g. force benchmarks to all use the same virtual environment.
Important options
Use the --select option with a comma-separated list of benchmarks in order to only install/prepare/run those benchmarks (or use --exclude to run all benchmarks except a specific set).
You may use --use-current-env to force the use of the currently active virtual environment.
milabench install
milabench install --config config/standard.yaml --select mybench
Installs the benchmark specified in the definition field of the benchmark’s YAML, relative to the YAML file itself. Creates/reuses a virtual environment in $MILABENCH_BASE/venv/mybench (unless install_group is set to something different) and installs all pip dependencies in it.
milabench prepare
milabench prepare --config config/standard.yaml --select mybench
Prepares data for the benchmark into $MILABENCH_BASE/data/dataset_name. Multiple benchmarks can share the same data. Some benchmarks need no preparation, so the prepare step does nothing. It may also download model weights or preprocess data.
milabench run
milabench run --config config/standard.yaml --select mybench
Creates a certain number of tasks from the benchmark using the plan defined in the YAML. For instance, one plan might be to run it in parallel on each GPU on the machine. The benchmark is run from that directory using a command like voir [VOIR_OPTIONS] main.py [SCRIPT_OPTIONS]:
- Both option groups are defined in the YAML.
- The VOIR_OPTIONS determine/tweak which instruments to use and what data to forward to milabench.
- The SCRIPT_OPTIONS are benchmark dependent.
Standard output/error and other data (training rates, etc.) are forwarded to the main dispatcher process and saved into $MILABENCH_BASE/runs/run_name/mybench.run_number.stdout (.stderr/.data); the name of the directory is printed out for easy reference.
milabench pin
milabench pin --config config/standard.yaml --select mybench --variant cuda
The basic idea behind milabench pin is to pin software versions for stability and reproducibility. Using the command above, the base requirements in benchmarks/mybench/requirements.in will be saved in requirements.cuda.txt. If the variant is not specified, the value of install_variant in the config file will be used (in standard.yaml this is install_variant: "{{arch}}", which resolves to either "rocm" or "cuda" depending on the machine’s architecture).
For a given variant, the installation is also constrained by constraints/variant.txt, if that file exists. The file specifies constraints appropriate to the architecture, CUDA version, or other specifics of the environment. You can add more constraints with --constraints path/to/constraints.txt.
milabench report
TODO.
milabench report --config config/standard.yaml --runs <path_to_runs>
milabench compare
TODO.
Creating a new benchmark
To define a new benchmark (let’s assume it is called ornatebench), make a copy of benchmarks/_template using cp-template:
cp-template benchmarks/_template/ benchmarks/ornatebench
You should see a directory with the following structure:
ornatebench
|- README.md # Document the benchmark here
|- benchfile.py # Benchmark definition file
|- main.py # Executed by milabench run
|- prepare.py # Executed by milabench prepare (EXECUTABLE)
|- requirements.in # Python requirements to install from pip
|- voirfile.py # Probes and extra instruments
Some of these files may be unnecessary depending on the benchmark.
First of all, if you want to verify that everything works, you can use the dev.yaml benchmark config that comes with the template:
# You can also use --config
export MILABENCH_CONFIG=benchmarks/ornatebench/dev.yaml
milabench install
milabench prepare
milabench run
Overview
benchfile.py
benchfile.py defines what to do on milabench install/prepare/run. It is run from the benchmark directory directly, in the current virtual environment, but it can create new processes in the virtual environment of the benchmark.
By default it will dispatch to requirements.in for install requirements, prepare.py for prep work and downloading datasets, and main.py for running the actual benchmark. If that is suitable, you may not need to change it at all.
requirements.in
Write all of the benchmark’s requirements in this file. Use milabench install --config benchmarks/ornatebench/dev.yaml to install them during development (add --force if you made changes and want to reinstall).
prepare.py
This script is executed in the venv for the benchmark when you run milabench prepare.
The purpose of prepare.py is to download and/or generate everything that is required by the main script, so that the main script does not need to use the network and can start training right away. In particular, it must:
- Download any datasets required by the main script into $MILABENCH_DATA_DIR.
- Preprocess the data, if this must be done prior to training.
- Generate synthetic data into $MILABENCH_DATA_DIR (if needed).
- Download and cache pretrained model weights (if needed):
  - Weights should ideally go somewhere under $XDG_CACHE_HOME (which milabench sets to $MILABENCH_BASE/cache).
  - Note that most frameworks already cache weights in subdirectories of $XDG_CACHE_HOME, so it is usually sufficient to import the framework, load the model, and then quit without training it.
If no preparation is needed, this file should be removed.
main.py
This is the main script that will be benchmarked when you run milabench run. It is run as voir main.py ARGV...

The template main.py demonstrates a simple loop that you can adapt to any script:
def main():
    for i in voir.iterate("train", range(100), report_batch=True, batch_size=64):
        give(loss=1/(i + 1))
        time.sleep(0.1)
- Wrap the training loop with voir.iterate.
  - report_batch=True triggers the computation of the number of training samples per second.
  - Set batch_size to the batch size. milabench can also figure it out automatically if you are iterating over the input batches (it will use the first number in the tensor’s shape).
- give(loss=loss.item()) will forward the value of the loss to milabench. Make sure the value is a plain Python float.
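To make the data flow concrete, here is a minimal stand-in for such a loop wrapper. This is not voir’s actual implementation, just an illustration of how timing each batch yields a samples-per-second rate:

```python
import time

def iterate(name, iterable, report_batch=False, batch_size=1, report=print):
    """Illustrative sketch of a voir.iterate-style wrapper (not the real API):
    times each yielded batch and reports batch_size / elapsed as the rate."""
    for item in iterable:
        start = time.perf_counter()
        yield item  # the training step runs between yields
        if report_batch:
            elapsed = time.perf_counter() - start
            report({"task": name, "rate": batch_size / elapsed})

rates = []
for i in iterate("train", range(3), report_batch=True, batch_size=64,
                 report=lambda event: rates.append(event["rate"])):
    time.sleep(0.01)  # stand-in for one training step

print(len(rates))  # prints 3: one rate measurement per batch
```

The real voir additionally forwards these events to milabench’s dispatcher rather than collecting them locally.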
If the script takes command line arguments, you can parse them however you like, for example with argparse.ArgumentParser. Then, you can add an argv section in dev.yaml, just like this:
trivial:
  inherits: _defaults
  definition: .
  ...
  # Pass arguments to main.py below
  argv:
    --batch-size: 64
argv can also be an array if you need to pass positional arguments, but we recommend using named parameters only.
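On the script side, an argv entry such as --batch-size maps onto ordinary argument parsing in main.py. A minimal sketch (the flag name and default are assumptions for illustration):

```python
import argparse

def parse_args(argv=None):
    # Mirrors a hypothetical `--batch-size: 64` entry in dev.yaml
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch-size", type=int, default=32)
    return parser.parse_args(argv)

# milabench would effectively invoke: voir main.py --batch-size 64
args = parse_args(["--batch-size", "64"])
print(args.batch_size)  # prints 64
```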
voirfile.py
The voirfile contains instrumentation for the main script. You can usually just leave it as it is. By default, it will:
- Compute the train “rate” (number of samples per second) using events from voir.iterate.
- Forcefully stop the program after a certain number of rate measurements.
- Monitor GPU usage.
Development
To develop the benchmark, first run milabench dev --config benchmarks/BENCHNAME/dev.yaml. This will activate the benchmark’s virtual environment and put you into a shell.
Then, try running voir --dash main.py. This should show you a little dashboard and display losses, train rate calculations and one or more progress bars.
From there, you can develop as you would any other Python program.
Integrating in base.yaml
You can copy-paste the contents of dev.yaml into config/base.yaml; you will only need to change the following:
- definition should be the relative path to the benchfile.py.
- Remove install_variant: unpinned.
- If the benchmark’s requirements are compatible with those of other benchmarks, you can set install_group to the same install_group as them. For example, install_group: torch.
Then, run the following commands:
milabench pin --select NAME_OR_INSTALL_GROUP --variant cuda
milabench pin --select NAME_OR_INSTALL_GROUP --variant rocm
This will create requirements.<arch>.txt for these two architectures. These files must be checked in under version control.
Note
--variant unpinned means installing directly from requirements.in. This can be useful during development, but it is less stable over time since various dependencies may break.