How to Monitor and Manage NVIDIA GPUs on Ubuntu Using nvidia-smi

1. Introduction

When using a GPU on Ubuntu, it is important to be able to monitor its status accurately. This is particularly true for workloads such as deep learning and graphics rendering, where you need to know the GPU's utilization and driver version. In this article, we explain how to use nvidia-smi, NVIDIA's GPU management tool, and introduce methods for checking GPU information on Ubuntu.

2. Checking GPU Information with nvidia-smi

nvidia-smi is a command-line tool that enables you to monitor utilization, memory usage, and other detailed information about NVIDIA GPUs. It is especially useful when checking GPU activity in real time or retrieving detailed utilization metrics.

Basic Usage

You can use the following command to monitor GPU usage and memory utilization in real time:

nvidia-smi --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.free --format=csv -l 1


This command reports GPU utilization, memory utilization, used memory, and free memory. The -l option sets the refresh interval in seconds; for example, -l 1 updates the output every second, while -l 5 updates it every five seconds.

Display Format and File Output

Running nvidia-smi without arguments prints a summary table, but the --query-gpu and --format=csv options shown above output only the selected fields as CSV, which is easier to process. To save the output to a file, specify the destination with the -f option.

nvidia-smi --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.free --format=csv -l 1 -f /path/to/output.csv

This allows you to save GPU utilization logs and analyze them later.
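For the later analysis, a minimal sketch like the following can summarize the saved log in Python. It assumes pandas is installed and that the log was written to /path/to/output.csv with the command above; the utilization column name is taken from nvidia-smi's CSV header, so adjust it if yours differs (adding nounits to --format drops the unit suffixes entirely).

import pandas as pd

# nvidia-smi writes a CSV header followed by one row per sample; numeric
# columns carry unit suffixes such as " %" and " MiB" as plain text.
df = pd.read_csv("/path/to/output.csv", skipinitialspace=True)

# Strip the "%" suffix from the GPU utilization column and convert to numbers.
gpu_util = (
    df["utilization.gpu [%]"]
    .str.replace("%", "", regex=False)
    .astype(float)
)

print("Samples:", len(gpu_util))
print("Average GPU utilization: %.1f%%" % gpu_util.mean())
print("Peak GPU utilization: %.1f%%" % gpu_util.max())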

3. Retrieving Process Information with nvidia-smi

nvidia-smi also enables you to retrieve information about the processes currently using the GPU. This helps you identify which processes are running on the GPU and how much memory each one consumes.

Getting Process Information

Use the following command to view the PID and memory usage of processes utilizing the GPU:

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader

This command returns a list of active GPU processes and displays the memory usage of each process.
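If you want to work with this process list from Python, a minimal sketch like the following (assuming nvidia-smi is on the PATH) runs the same query and totals the memory used by compute processes:

import subprocess

# Run the same per-process query; nounits drops the " MiB" suffix so the
# memory column can be parsed as an integer.
result = subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)

total_mib = 0
for line in result.stdout.strip().splitlines():
    # Simple split; assumes process names contain no commas.
    pid, name, used_mib = [field.strip() for field in line.split(",")]
    total_mib += int(used_mib)
    print(f"PID {pid}: {name} uses {used_mib} MiB")

print(f"Total GPU memory used by compute processes: {total_mib} MiB")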

nvidia-smi pmon Subcommand

The nvidia-smi tool also includes a pmon subcommand, which provides detailed GPU process monitoring.

nvidia-smi pmon --delay 10 -s u -o DT

This displays GPU process statistics at the specified interval. The --delay (-d) option sets the refresh interval in seconds (up to 10 seconds), -s u selects utilization metrics, and -o DT adds the date and time to each output line.

4. Installing and Verifying NVIDIA Drivers

To use an NVIDIA GPU on Ubuntu, the correct NVIDIA driver must be installed. Below are the steps for installation and verification.

Driver Installation

First, install an NVIDIA driver suited to your system. The version below (510) is an example; the ubuntu-drivers devices command lists the drivers recommended for your GPU, so substitute the appropriate version:

sudo apt install nvidia-driver-510

After installation is complete, restart your system.

Verifying the Installation

After rebooting, run the following command to confirm that the driver is properly installed:

nvidia-smi

If the output shows the driver version and the highest CUDA version that driver supports, the installation was successful.
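If you prefer to check the driver version from a script rather than reading the full table, a minimal sketch (assuming nvidia-smi is on the PATH) can query just that field:

import subprocess

# driver_version is a queryable field; the output has one line per GPU.
output = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

print("NVIDIA driver version:", output.strip().splitlines()[0])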

5. Verifying GPU Operation with TensorFlow

You can also verify GPU functionality by testing it with TensorFlow, a machine learning framework.

Installing Anaconda

First, install Anaconda and set up your environment:

bash ./Anaconda3-2022.05-Linux-x86_64.sh
conda update -n base conda
conda update anaconda
conda update -y --all
conda install tensorflow-gpu==2.4.1

Checking GPU Recognition in TensorFlow

Next, verify whether TensorFlow recognizes the GPU:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

If the GPU device appears in the list, TensorFlow is successfully detecting the GPU.
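On TensorFlow 2.x you can also use the tf.config API for a more concise check; a minimal sketch:

import tensorflow as tf

# list_physical_devices returns one PhysicalDevice entry per visible GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print(f"TensorFlow sees {len(gpus)} GPU(s):")
    for gpu in gpus:
        print("  ", gpu.name)
else:
    print("No GPU detected; check the NVIDIA driver and CUDA/cuDNN setup.")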

6. GPU Monitoring and Logging

nvidia-smi enables real-time GPU monitoring and log recording. This helps track GPU usage over time and optimize performance.

Periodic Monitoring

To set up periodic monitoring, use the -l option to specify the update interval and, optionally, log the data to a file (writing under /var/log usually requires root privileges):

nvidia-smi --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.free --format=csv -l 1 -f /var/log/gpu.log

Programmatic Access via Python Bindings

The NVML library that nvidia-smi is built on has official Python bindings (the nvidia-ml-py package, imported as pynvml), which let you retrieve GPU information programmatically. This enables more customized monitoring and control from Python scripts.
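As an illustration, here is a minimal sketch using pynvml (installable, for example, with pip install nvidia-ml-py). It reads the name, utilization, and memory usage of each GPU through NVML, the same library that nvidia-smi uses:

import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older versions of the bindings return bytes
            name = name.decode()
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory in percent
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # .total / .used / .free in bytes
        print(f"GPU {i}: {name}")
        print(f"  GPU utilization:    {util.gpu}%")
        print(f"  Memory utilization: {util.memory}%")
        print(f"  Memory used:        {mem.used / 1024**2:.0f} MiB / {mem.total / 1024**2:.0f} MiB")
finally:
    pynvml.nvmlShutdown()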

7. Conclusion

nvidia-smi is a powerful tool for checking and managing NVIDIA GPU usage on Ubuntu. This article explained basic usage, process monitoring, driver installation, and TensorFlow GPU verification. Use these techniques to maximize GPU performance and optimize your system.
