Welcome to milabench’s documentation!

Install and use

Note

You may use Docker to run the benchmarks, which will likely be easier. See the Docker section of this documentation for more information.

To install, clone the repo:

# You may need to upgrade pip
pip install pip -U
git clone git@github.com:mila-iqia/milabench.git
cd milabench
# <Activate virtual environment>
# Install in editable mode
pip install -e .

This will install two commands, milabench and voir.

Before running the benchmarks

  1. Set the $MILABENCH_BASE environment variable to the base directory in which all the code, virtual environments and data should be put.

  2. Set the $MILABENCH_CONFIG environment variable to the configuration file that represents the benchmark suite you want to run. Normally it should be set to config/standard.yaml.

  3. milabench install: Install the individual benchmarks in virtual environments.

  4. milabench prepare: Download the datasets, weights, etc.

If the machine has both NVIDIA/CUDA and AMD/ROCm GPUs, you may have to set the MILABENCH_GPU_ARCH environment variable as well, to either cuda or rocm.
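Putting these steps together, a typical first session might look like this (the base path is only an example):

# Set up the environment for milabench
export MILABENCH_BASE=$HOME/milabench-base
export MILABENCH_CONFIG=config/standard.yaml
export MILABENCH_GPU_ARCH=cuda  # only needed on mixed NVIDIA/AMD machines

# Install the benchmarks, then download datasets, weights, etc.
milabench install
milabench prepare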

Run milabench

The following command will run the whole benchmark and will put the results in a new directory in $MILABENCH_BASE/runs (the path will be printed to stdout).

milabench run

Here are a few useful options for milabench run:

# Only run the bert benchmark
milabench run --select bert

# Run all benchmarks EXCEPT bert and stargan
milabench run --exclude bert,stargan

# Run the benchmark suite three times in a row
milabench run --repeat 3

Reports

The following command will print out a report of the tests that ran, their metrics, and any failures. It will also produce an HTML report with more detailed information about any errors.

milabench report --runs $MILABENCH_BASE/runs/some_specific_run --html report.html

The report will also print out a score based on a weighting of the metrics, as defined in the file $MILABENCH_CONFIG points to.

Docker

Docker images are created for each release. They come with all the benchmarks installed and the necessary datasets; no additional downloads are required.

CUDA

Requirements

  • cuda

  • docker (with NVIDIA GPU support, e.g. the NVIDIA Container Toolkit, so that --gpus=all works)

Usage

The commands below will download the latest CUDA container and run milabench right away, storing the results in the results folder on the host machine:

# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

# Pull the image we are going to run
docker pull $MILABENCH_IMAGE

# Run milabench
docker run -it --rm --ipc=host --gpus=all      \
      -v $(pwd)/results:/milabench/envs/runs   \
      $MILABENCH_IMAGE                         \
      milabench run

--ipc=host removes shared memory restrictions, but you can also set --shm-size to a high value instead (at least 8G, possibly more).
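For example, a sketch of the same run using a fixed shared-memory size instead of --ipc=host:

# Run milabench with an explicit shared memory size
docker run -it --rm --shm-size=8G --gpus=all \
      -v $(pwd)/results:/milabench/envs/runs \
      $MILABENCH_IMAGE \
      milabench run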

Each run should store results in a unique directory under results/ on the host machine. To generate a readable report of the results you can run:

# Show Performance Report
docker run -it --rm                             \
      -v $(pwd)/results:/milabench/envs/runs    \
      $MILABENCH_IMAGE                          \
      milabench report --runs /milabench/envs/runs

ROCm

Requirements

  • rocm

  • docker

Usage

For ROCm, usage is similar to CUDA, but you must use a different image and slightly different Docker options:

# Choose the image you want to use
export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly

# Pull the image we are going to run
docker pull $MILABENCH_IMAGE

# Run milabench
docker run -it --rm  --ipc=host                                                  \
      --device=/dev/kfd --device=/dev/dri                                        \
      --security-opt seccomp=unconfined --group-add video                        \
      -v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \
      -v /opt/rocm:/opt/rocm                                                     \
      -v $(pwd)/results:/milabench/envs/runs                                     \
      $MILABENCH_IMAGE                                                           \
      milabench run

For the performance report, it is the same command:

# Show Performance Report
docker run -it --rm                             \
      -v $(pwd)/results:/milabench/envs/runs    \
      $MILABENCH_IMAGE                          \
      milabench report --runs /milabench/envs/runs

Multi-node benchmark

There are currently two multi-node benchmarks: opt-1_3b-multinode (data-parallel) and opt-6_7b-multinode (model-parallel; that model is too large to fit on a single GPU). Here is how to run them:

  1. Set up two or more machines that can see each other on the network. Suppose there are two and their addresses are:

  • manager-node ⬅ this is the node you will launch the job on

  • worker-node

  2. docker pull the image on both nodes.

  3. Prior to running the benchmark, create an SSH key pair on manager-node and set up public key authentication to the other nodes (in this case, worker-node). An example key setup is shown after this list.

  4. Write an override file that will tell milabench about the network (see below). Note that you will need to copy/paste the same configuration for both multinode tests.

  5. On manager-node, execute milabench run via Docker.

  • Mount the private key at /milabench/id_milabench in the container

  • Use --override "$(cat overrides.yaml)" to pass the overrides
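For step 3, a minimal key setup might look like this (the key path and user name are placeholders):

# On manager-node: generate a key pair
ssh-keygen -t rsa -f $HOME/.ssh/id_milabench -N ""

# Authorize the public key on each worker node
ssh-copy-id -i $HOME/.ssh/id_milabench.pub username@worker-node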

Example YAML configuration (overrides.yaml):

# Name of the benchmark. You can also override values in other benchmarks.
opt-6_7b-multinode:

  # Docker image to use on the worker nodes (should be same as the manager)
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"

  # The user on worker-node that public key auth is set up for
  worker_user: "username"

  # Address of the manager node from the worker nodes
  manager_addr: "manager-node"

  # Addresses of the worker nodes (do not include the manager node,
  # although it is also technically a worker node)
  worker_addrs:
    - "worker-node"

  # Make sure that this is equal to length(worker_addrs) + 1
  num_machines: 2

  capabilities:
    # Make sure that this is ALSO equal to length(worker_addrs) + 1
    nodes: 2

opt-1_3b-multinode:
  # Copy the contents of the opt-6_7b-multinode section without any changes.
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  worker_user: "username"
  manager_addr: "manager-node"
  worker_addrs:
    - "worker-node"
  num_machines: 2
  capabilities:
    nodes: 2

Then, the command should look like this:

# On manager-node:

# Change if needed
export SSH_KEY_FILE=$HOME/.ssh/id_rsa

docker run -it --rm --gpus all --network host --ipc=host --privileged \
  -v $SSH_KEY_FILE:/milabench/id_milabench \
  -v $(pwd)/results:/milabench/envs/runs \
  $MILABENCH_IMAGE \
  milabench run --override "$(cat overrides.yaml)" \
  --select multinode

The last line (--select multinode) specifically selects the multi-node benchmarks. Omit that line to run all benchmarks.

If you need to use more than two nodes, edit or copy overrides.yaml and simply add the other nodes’ addresses in worker_addrs and adjust num_machines and capabilities.nodes accordingly. For example, for 4 nodes:

opt-6_7b-multinode:
  docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
  worker_user: "username"
  manager_addr: "manager-node"
  worker_addrs:
    - "worker-node1"
    - "worker-node2"
    - "worker-node3"
  num_machines: 4
  capabilities:
    nodes: 4

Note

The multi-node benchmark is sensitive to network performance. If the mono-node benchmark opt-6_7b is significantly faster than opt-6_7b-multinode (e.g. processes more than twice the items per second), this likely indicates that Infiniband is either not present or not used. (It is not abnormal for the multinode benchmark to perform a bit worse than the mono-node benchmark since it has not been optimized to minimize the impact of communication costs.)

Even if Infiniband is properly configured, the benchmark may fail to use it unless the --privileged flag is set when running the container.
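If you suspect Infiniband is not being used, one way to check (assuming the benchmarks communicate through NCCL, as is typical for multi-GPU PyTorch workloads) is to enable NCCL's debug logging for a run and look for NET/IB (Infiniband) versus NET/Socket (plain TCP) in the output:

# Same run command as above, with NCCL debug logging enabled
docker run -it --rm --gpus all --network host --ipc=host --privileged \
  -e NCCL_DEBUG=INFO \
  -v $SSH_KEY_FILE:/milabench/id_milabench \
  -v $(pwd)/results:/milabench/envs/runs \
  $MILABENCH_IMAGE \
  milabench run --override "$(cat overrides.yaml)" \
  --select multinode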

Building images

Images can be built locally for prototyping and testing.

docker build -f docker/Dockerfile-cuda -t milabench:cuda-nightly --build-arg CONFIG=standard.yaml .

Or for ROCm:

docker build -f docker/Dockerfile-rocm -t milabench:rocm-nightly --build-arg CONFIG=standard.yaml .
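The locally built image can then be used in place of the published one:

export MILABENCH_IMAGE=milabench:cuda-nightly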

Using milabench (DEVELOPERS)

To use milabench, you need:

  • A YAML configuration file to define the benchmarks to install, prepare or run.

  • The base directory for code, virtual environments, data and outputs, set either with the $MILABENCH_BASE environment variable or the --base option. The base directory will be automatically constructed by milabench and will be organized as follows:

$MILABENCH_BASE/
|- venv/                            # Virtual environments and dependencies
|  |- bench1/                       # venv for benchmark bench1
|  |- ...                           # etc
|- code/                            # Benchmark code
|  |- bench1/                       # Code for benchmark bench1
|  |- ...                           # etc
|- data/                            # Datasets
|  |- dataset1/                     # A dataset
|  |- ...                           # etc
|- runs/                            # Outputs of benchmark runs
   |- calimero.2022-03-30_15:00:00/ # Auto-generated run name
   |  |- bench1.0.stdout            # Output for the first run of bench1
   |  |- bench1.0.stderr            # Stderr for the first run of bench1
   |  |- bench1.0.data              # Structured data for the first run of bench1
   |  |- bench1.1.stdout            # Output for the second run of bench1
   |  |- ...                        # etc
   |- blah/                         # Can set name with --run

It is possible to change this structure in the YAML, for example to force all benchmarks to use the same virtual environment.
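A sketch of such an override (the benchmark names are illustrative; install_group is described under milabench install below):

bench1:
  install_group: shared
bench2:
  install_group: shared  # both benchmarks now install into venv/shared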

Important options

  • Use the --select option with a comma-separated list of benchmarks in order to only install/prepare/run these benchmarks (or use --exclude to run all benchmarks except a specific set).

  • You may use --use-current-env to force the use of the currently active virtual environment.

milabench install

milabench install --config config/standard.yaml --select mybench
  • Installs the benchmark code located at the path given in the definition field of the benchmark’s YAML (the path is relative to the YAML file itself).

  • Creates/reuses a virtual environment in $MILABENCH_BASE/venv/mybench (unless install_group is set to something different) and installs all pip dependencies in it.

milabench prepare

milabench prepare --config config/standard.yaml --select mybench
  • Prepares data for the benchmark into $MILABENCH_BASE/data/dataset_name. Multiple benchmarks can share the same data. Some benchmarks need no preparation, so the prepare step does nothing.

  • May also download model weights or preprocess data.

milabench run

milabench run --config config/standard.yaml --select mybench
  • Creates a certain number of tasks from the benchmark using the plan defined in the YAML. For instance, one plan might be to run it in parallel on each GPU on the machine.

  • The benchmark is run from that directory using a command like voir [VOIR_OPTIONS] main.py [SCRIPT_OPTIONS]:

    • Both option groups are defined in the YAML.

    • The VOIR_OPTIONS determine/tweak which instruments to use and what data to forward to milabench.

    • The SCRIPT_OPTIONS are benchmark dependent.

  • Standard output/error and other data (training rates, etc.) are forwarded to the main dispatcher process and saved into $MILABENCH_BASE/runs/run_name/mybench.run_number.stdout (.stderr / .data) (the name of the directory is printed out for easy reference).

milabench pin

milabench pin --config config/standard.yaml --select mybench --variant cuda

The basic idea behind milabench pin is to pin software versions for stability and reproducibility. Using the command above, the base requirements in benchmarks/mybench/requirements.in will be resolved and saved to requirements.cuda.txt. If --variant is not specified, the value of install_variant in the config file is used (in standard.yaml this is install_variant: "{{arch}}", which resolves to either "rocm" or "cuda" depending on the machine’s architecture).

For a given variant, the installation is also constrained by constraints/<variant>.txt (e.g. constraints/cuda.txt), if that file exists. The file specifies constraints appropriate to the architecture, CUDA version, or other specifics of the environment.

You can add more constraints with --constraints path/to/constraints.txt.
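For example (the path is illustrative):

milabench pin --config config/standard.yaml --select mybench \
    --variant cuda --constraints path/to/constraints.txt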

milabench report

TODO.

milabench report --config config/standard.yaml --runs <path_to_runs>

milabench compare

TODO.

Creating a new benchmark

To define a new benchmark (let’s assume it is called ornatebench), make a copy of benchmarks/_template using cp-template:

cp-template benchmarks/_template/ benchmarks/ornatebench

You should see a directory with the following structure:

ornatebench
|- README.md          # Document the benchmark here
|- benchfile.py       # Benchmark definition file
|- main.py            # Executed by milabench run
|- prepare.py         # Executed by milabench prepare (EXECUTABLE)
|- requirements.in    # Python requirements to install from pip
|- voirfile.py        # Probes and extra instruments

Some of these files may be unnecessary depending on the benchmark.

First of all, if you want to verify that everything works, you can use the dev.yaml benchmark config that comes with the template:

# You can also use --config
export MILABENCH_CONFIG=benchmarks/ornatebench/dev.yaml

milabench install
milabench prepare
milabench run

Overview

benchfile.py

benchfile.py defines what to do on milabench install/prepare/run. It is run directly from the benchmark directory, in the current virtual environment, but it can create new processes in the benchmark’s own virtual environment.

By default it will dispatch to requirements.in for install requirements, prepare.py for prep work and downloading datasets, and main.py for running the actual benchmark. If that is suitable you may not need to change it at all.

requirements.in

Write all of the benchmark’s requirements in this file. Use milabench install --config benchmarks/ornatebench/dev.yaml to install them during development (add --force if you made changes and want to reinstall).

prepare.py

This script is executed in the venv for the benchmark when you run milabench prepare.

The purpose of prepare.py is to download and/or generate everything that is required by the main script, so that the main script does not need to use the network and can start training right away. In particular, it must:

  • Download any datasets required by the main script into $MILABENCH_DATA_DIR.

  • Preprocess the data, if this must be done prior to training.

  • Generate synthetic data into $MILABENCH_DATA_DIR (if needed).

  • Download and cache pretrained model weights (if needed):

    • Weights should ideally go somewhere under $XDG_CACHE_HOME (which milabench sets to $MILABENCH_BASE/cache).

    • Note that most frameworks already cache weights in subdirectories of $XDG_CACHE_HOME, so it is usually sufficient to import the framework, load the model, and then quit without training it.

If no preparation is needed, this file should be removed.
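As an illustration of caching weights, a minimal prepare.py might simply load a pretrained model and exit (the framework and model here are only an example):

# Loading the model once makes torchvision download and cache the weights
# under $XDG_CACHE_HOME, so main.py can start training without the network.
from torchvision.models import resnet18, ResNet18_Weights

if __name__ == "__main__":
    resnet18(weights=ResNet18_Weights.DEFAULT)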

main.py

This is the main script that will be benchmarked when you run milabench run. It is run as voir main.py ARGV...

The template main.py demonstrates a simple loop that you can adapt to any script:

import time
import voir
from giving import give  # giving provides the give() used to report metrics

def main():
    for i in voir.iterate("train", range(100), report_batch=True, batch_size=64):
        give(loss=1/(i + 1))
        time.sleep(0.1)
  • Wrap the training loop with voir.iterate:

    • report_batch=True triggers the computation of the number of training samples per second.

    • Set batch_size to the size of your batches. milabench can also figure it out automatically if you are iterating over the input batches (it will use the first number in the tensor’s shape).

  • give(loss=loss.item()) will forward the value of the loss to milabench. Make sure the value is a plain Python float.

If the script takes command line arguments, you can parse them however you like, for example with argparse.ArgumentParser. Then, you can add an argv section in dev.yaml, just like this:

trivial:
  inherits: _defaults
  definition: .

  ...

  # Pass arguments to main.py below
  argv:
    --batch-size: 64

argv can also be an array if you need to pass positional arguments, but I recommend using named parameters only.
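A matching sketch of the argument parsing in main.py, assuming the argv section above:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch-size", type=int, default=64)
args = parser.parse_args()  # receives --batch-size 64 from the argv section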

voirfile.py

The voirfile contains instrumentation for the main script. You can usually just leave it as it is. By default, it will:

  • Compute the train “rate” (number of samples per second) using events from voir.iterate.

  • Forcefully stop the program after a certain number of rate measurements.

  • Monitor GPU usage.

Development

To develop the benchmark, first run milabench dev --config benchmarks/BENCHNAME/dev.yaml. This will activate the benchmark’s virtual environment and put you into a shell.

Then, try running voir --dash main.py. This should show you a small dashboard and display losses, train rate calculations and one or more progress bars.

From there, you can develop as you would any other Python program.

Integrating in base.yaml

You can copy-paste the contents of dev.yaml into config/base.yaml; you will only need to change the following:

  • definition should be the relative path to the benchfile.py.

  • Remove install_variant: unpinned

  • If the benchmark’s requirements are compatible with those of other benchmarks, you can set install_group to the same install_group as them. For example, install_group: torch.
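A sketch of the resulting entry in config/base.yaml (the path and install group are illustrative):

ornatebench:
  inherits: _defaults
  definition: ../benchmarks/ornatebench
  install_group: torch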

Then, run the following commands:

  • milabench pin --select NAME_OR_INSTALL_GROUP --variant cuda

  • milabench pin --select NAME_OR_INSTALL_GROUP --variant rocm

This will create requirements.<arch>.txt for these two architectures. These files must be checked into version control.

Note

--variant unpinned means installing directly from requirements.in. This can be useful during development, but it is less stable over time since various dependencies may break.

