Measuring GPU Energy: Best Practices - ML.ENERGY Blog

To optimize something, you need to be able to measure it right. In this post, we'll look into potential pitfalls and best practices for GPU energy measurement.

Best practices

1. Actually measure energy

We sometimes see energy consumption or carbon emission estimated with back-of-the-envelope calculations using the GPUs' Thermal Design Power (TDP), in other words their maximum power consumption. This is of course fine if you're trying to raise awareness and motivate others to start looking at energy consumption. However, if we want to really observe what happens to energy when we change parameters and optimize it, we must actually measure energy consumption.
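For concreteness, here is what such a back-of-the-envelope TDP estimate looks like in code. This is only an illustration; all the numbers are made up:

# Back-of-the-envelope estimate: assumes every GPU draws its full TDP
# for the entire run, which the measurements below show is an overestimate.
tdp_watts = 300        # Hypothetical GPU TDP.
train_seconds = 3600   # One hour of training.
num_gpus = 4
estimated_energy_joules = tdp_watts * train_seconds * num_gpus  # 4,320,000 J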

TDP is usually not the best proxy for GPU power consumption. Below is the average power consumption of one NVIDIA V100 GPU while training different models:

[Figure: Average power consumption of one NVIDIA V100 GPU while training different models.]

You can see that depending on the computation characteristics and load of each model, average power consumption varies significantly and never really touches the GPU's TDP.

What about larger models? Below, we measured the average power consumption of four NVIDIA A40 GPUs training larger models with four-stage pipeline parallelism:

[Figure: Average power consumption of four NVIDIA A40 GPUs training larger models with four-stage pipeline parallelism.]

It's the same story again. Even for computation-heavy large model training, GPU average power consumption does not reach TDP.

2. Use the most efficient API

How do you measure GPU energy? Depending on the microarchitecture of your GPU, there are either one or two ways. Below, we show sample code using the Python bindings for NVML.

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)    # First GPU
power = pynvml.nvmlDeviceGetPowerUsage(handle)   # Milliwatts

The power API returns the current power consumption of the GPU. Since energy is power integrated over time, you will need a separate thread to poll the power API, and later integrate the power samples over time using sklearn.metrics.auc, for example.
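As a minimal sketch of what that polling loop might look like (here using a hand-rolled trapezoidal rule instead of sklearn.metrics.auc, with train standing in as a placeholder for your actual workload):

import time
import threading
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

samples = []  # (timestamp in seconds, power in watts)
stop = threading.Event()

def poll_power(interval_s=0.1):
    # Sample instantaneous power until signaled to stop.
    while not stop.is_set():
        milliwatts = pynvml.nvmlDeviceGetPowerUsage(handle)
        samples.append((time.monotonic(), milliwatts / 1000.0))
        time.sleep(interval_s)

poller = threading.Thread(target=poll_power, daemon=True)
poller.start()
train()  # Placeholder for your workload.
stop.set()
poller.join()

# Trapezoidal integration of the power samples yields energy in joules.
energy_joules = sum(
    (t2 - t1) * (p1 + p2) / 2
    for (t1, p1), (t2, p2) in zip(samples, samples[1:])
)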

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)                  # First GPU
energy = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)    # Millijoules

The energy API returns the total energy consumption of the GPU since the driver was last loaded. Therefore, you just need to call the energy API once before computation and once after, and subtract the two readings to get the energy consumed between the two calls.

The power API, although supported by all microarchitectures, requires polling and then a discrete integration over time to compute energy. While polling occupies just one CPU core, that core is kept at high utilization throughout training and consumes some extra energy purely for energy monitoring. The energy API, on the other hand, requires just two function calls in the main thread and one subtraction.

Takeaway

Use nvmlDeviceGetTotalEnergyConsumption if your GPU is Volta or newer. Otherwise, you'll need to poll nvmlDeviceGetPowerUsage and integrate power measurements across time to obtain energy.
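If you would rather probe for support at runtime than check the microarchitecture yourself, one possible sketch (relying only on pynvml's generic NVMLError exception) is:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    # Volta and newer GPUs expose a cumulative energy counter.
    pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    has_energy_api = True
except pynvml.NVMLError:
    # Older GPUs: fall back to polling nvmlDeviceGetPowerUsage.
    has_energy_api = False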

3. Synchronize CPU and GPU

In most DNN training frameworks, the CPU dispatches CUDA kernels to the GPU in an asynchronous fashion. In other words, the CPU tells the GPU to do some work and moves on to the next line of Python code without waiting for the GPU to complete. Consider this:

import time
import torch

start_time = time.time()
train_one_step()
elapsed_time = time.time() - start_time  # Wrong!

The time measured is likely an underestimate of GPU computation time, as the CPU code runs ahead of the GPU and the GPU may not have finished executing all the work the CPU ordered. Therefore, PyTorch and JAX provide APIs that synchronize CPU and GPU execution, i.e., make the CPU wait until the GPU is done:

import time
import torch

start_time = time.time()
train_one_step()
torch.cuda.synchronize()  # Synchronizes CPU and GPU.
elapsed_time = time.time() - start_time
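In JAX, the analogous pattern is to block on the computation's output before reading the clock. A sketch, where train_one_step is a placeholder that returns a JAX array:

import time

start_time = time.time()
out = train_one_step()    # Placeholder: returns a JAX array.
out.block_until_ready()   # Waits until the device computation finishes.
elapsed_time = time.time() - start_time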

The same is the case for measuring GPU power or energy; NVML code that runs on your CPU must be synchronized with GPU execution for the measurement to be accurate:

import time
import torch
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # First GPU

start_time = time.time()
start_energy = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
train_one_step()
torch.cuda.synchronize()  # Synchronizes CPU and GPU.
elapsed_time = time.time() - start_time
consumed_energy = \
    pynvml.nvmlDeviceGetTotalEnergyConsumption(handle) - start_energy

Takeaway

To accurately measure GPU time and energy consumption, make the CPU wait for GPU work to complete. For example, in PyTorch, use torch.cuda.synchronize.

The all-in-one solution: ZeusMonitor

Does all this feel like a headache? Well, ZeusMonitor has you covered. It implements all three best practices and provides a simple interface:

from zeus.monitor import ZeusMonitor

# Arbitrary GPU indices; respects `CUDA_VISIBLE_DEVICES`.
monitor = ZeusMonitor(gpu_indices=[0, 2])

monitor.begin_window("training")
# Train!
measurement = monitor.end_window("training", sync_cuda=True)

print(f"Time elapsed: {measurement.time}s")
print(f"GPU0 energy consumed: {measurement.energy[0]}J")
print(f"GPU2 energy consumed: {measurement.energy[2]}J")
print(f"Total energy consumed: {measurement.total_energy}J")

ZeusMonitor will automatically detect your GPU architecture (separately for each GPU index you wish to monitor) and use the right NVML API to measure GPU time and energy consumption. You can have multiple overlapping measurement windows as long as you choose different names for them.
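For instance, here is a sketch of overlapping windows built from the same begin_window/end_window calls shown above (num_steps and train_one_step are placeholders):

from zeus.monitor import ZeusMonitor

monitor = ZeusMonitor(gpu_indices=[0])

monitor.begin_window("epoch")
for _ in range(num_steps):            # Placeholder step count.
    monitor.begin_window("step")      # Overlaps with "epoch"; distinct name.
    train_one_step()                  # Placeholder workload.
    step_measurement = monitor.end_window("step", sync_cuda=True)
epoch_measurement = monitor.end_window("epoch", sync_cuda=True)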

Sounds good? Get started with Zeus here!
