Tutorial: Profiling a PyTorch Model#

In this tutorial, we will use Metinor to profile a PyTorch model. Profiling a model is important to understand the performance of the model and to identify bottlenecks. Metinor provides a simple API to profile a model and a set of tools to analyze and plot the results.

[1]:
# Suppress warnings (to make the output cleaner in this notebook)
import warnings

warnings.filterwarnings("ignore")

Create a model#

The profiler in Metinor supports subclasses of nn.Module. In this tutorial, we will use a pre-trained ResNet-18 from torchvision.

[2]:
from torchvision.models import resnet18, ResNet18_Weights

# Create the model to profile
model = resnet18(weights=ResNet18_Weights.DEFAULT)
input_shape = (3, 224, 224)

Import the profiler#

Next, we will import the profiler from the metinor.profiler package. This is available as metinor.profiler.Profiler class and several metinor.functional.profiler.profile_* functions. We will profile both the model types using the profiler class and the functions.

Profiling a model using the functional API#

[3]:
from nervai_optim.functional.profiler import profile_metrics

# Profile the model
model_report = profile_metrics(model, input_shape=input_shape)

# Report is a `metinor.profiler.ProfileReport` object which contains the following variables:
# - `report.summary`: A summary of the profiled model
# - `report.details`: A layer-by-layer breakdown of the profiled model
# - `report.pretty_summary`: A human-readable summary of the profiled model
# - `report.pretty_details`: A human-readable layer-by-layer breakdown of the profiled model
# Each of these variables is a `pandas.DataFrame` object

# Print the summary of the profiled model
model_report.pretty_summary
[3]:
ProfilerMetric.NAME ProfilerMetric.TYPE ProfilerMetric.INPUTS ProfilerMetric.OUTPUTS ProfilerMetric.PARAMS ProfilerMetric.PARAMS_TRAINABLE ProfilerMetric.FLOPS ProfilerMetric.MADDS ProfilerMetric.MEM_READ ProfilerMetric.MEM_WRITE ProfilerMetric.RUNTIME ProfilerMetric.RMS
0 ResNet ResNet (3, 224, 224) (1000,) 11.69M 11.69M 1.82G 3.64G 18.71M 6.72M 0.028 7.07
[4]:
# Show some details of first 10 layers
from nervai_optim.profiler.metrics import ProfilerMetric

columns = [
    ProfilerMetric.NAME,
    ProfilerMetric.TYPE,
    ProfilerMetric.FLOPS,
    ProfilerMetric.MADDS,
    ProfilerMetric.RUNTIME,
]
model_report.pretty_details[columns].head(10)
[4]:
ProfilerMetric.NAME ProfilerMetric.TYPE ProfilerMetric.FLOPS ProfilerMetric.MADDS ProfilerMetric.RUNTIME
1 conv1 Conv2d 118.01M 235.23M 0.0025
2 bn1 BatchNorm2d 1.61M 3.21M 0.0006
3 relu ReLU 802.82K 802.82K 0.00014
4 maxpool MaxPool2d 802.82K 1.61M 0.0013
5 layer1.0.conv1 Conv2d 115.61M 231.01M 0.0012
6 layer1.0.bn1 BatchNorm2d 401.41K 802.82K 0.00022
7 layer1.0.relu ReLU 200.7K 200.7K 1.7881393432617188e-05
8 layer1.0.conv2 Conv2d 115.61M 231.01M 0.0012
9 layer1.0.bn2 BatchNorm2d 401.41K 802.82K 0.00019
10 layer1.1.conv1 Conv2d 115.61M 231.01M 0.0008
[5]:
# More functional API examples
from nervai_optim.functional.profiler import (
    profile_flops,
    profile_runtime,
    profile_memory,
    profile_madds,
    profile_params,
)

# Profile the model
params = profile_params(model, input_shape=input_shape, readable=True)
flops = profile_flops(model, input_shape=input_shape, readable=True)
runtime = profile_runtime(model, input_shape=input_shape, readable=True)
memory = profile_memory(model, input_shape=input_shape, readable=True)
madds = profile_madds(model, input_shape=input_shape, readable=True)

print("ResNet-18 Profiling Results")
print("-------")
print(f"Params: {params}")
print(f"FLOPs: {flops}")
print(f"MAdds: {madds}")
print(f"Memory: {memory}")
print(f"Runtime: {runtime}")
ResNet-18 Profiling Results
-------
Params: 11.69M
FLOPs: 1.82G
MAdds: 3.64G
Memory: ('18.71M', '6.72M')
Runtime: 0.028

Profiling a model using the profiler class#

Next, we will profile the model using the profiler class.

[6]:
from nervai_optim.profiler import Profiler, ProfilerReport

# Create a profiler object and profile the model
profiler = Profiler(model, input_shape=input_shape)
profiled_nodes = profiler.profile()

# `profiled_nodes` is a list of `metinor.profiler.ProfiledNode` objects, which
# can be used to generate a report using `metinor.profiler.ProfilerReport` class
model_report = ProfilerReport(profiled_nodes)
model_report.pretty_summary
[6]:
ProfilerMetric.NAME ProfilerMetric.TYPE ProfilerMetric.INPUTS ProfilerMetric.OUTPUTS ProfilerMetric.PARAMS ProfilerMetric.PARAMS_TRAINABLE ProfilerMetric.FLOPS ProfilerMetric.MADDS ProfilerMetric.MEM_READ ProfilerMetric.MEM_WRITE ProfilerMetric.RUNTIME ProfilerMetric.RMS
0 ResNet ResNet (3, 224, 224) (1000,) 11.69M 11.69M 1.82G 3.64G 18.71M 6.72M 0.031 7.07
[7]:
model_report.pretty_details.head(15)
[7]:
ProfilerMetric.NAME ProfilerMetric.TYPE ProfilerMetric.INPUTS ProfilerMetric.OUTPUTS ProfilerMetric.PARAMS ProfilerMetric.PARAMS_TRAINABLE ProfilerMetric.FLOPS ProfilerMetric.MADDS ProfilerMetric.MEM_READ ProfilerMetric.MEM_WRITE ProfilerMetric.RUNTIME ProfilerMetric.RMS
1 conv1 Conv2d (3, 224, 224) (64, 112, 112) 9.41K 9.41K 118.01M 235.23M 159.94K 802.82K 0.0019 0.13
2 bn1 BatchNorm2d (64, 112, 112) (64, 112, 112) 128 128 1.61M 3.21M 802.94K 802.82K 0.0008 0.32
3 relu ReLU (64, 112, 112) (64, 112, 112) 0 0 802.82K 802.82K 802.82K 802.82K 0.00014 0.0
4 maxpool MaxPool2d (64, 112, 112) (64, 56, 56) 0 0 802.82K 1.61M 802.82K 200.7K 0.0015 0.0
5 layer1.0.conv1 Conv2d (64, 56, 56) (64, 56, 56) 36.86K 36.86K 115.61M 231.01M 237.57K 200.7K 0.0012 0.05
6 layer1.0.bn1 BatchNorm2d (64, 56, 56) (64, 56, 56) 128 128 401.41K 802.82K 200.83K 200.7K 0.00017 0.3
7 layer1.0.relu ReLU (64, 56, 56) (64, 56, 56) 0 0 200.7K 200.7K 200.7K 200.7K 1.9073486328125e-05 0.0
8 layer1.0.conv2 Conv2d (64, 56, 56) (64, 56, 56) 36.86K 36.86K 115.61M 231.01M 237.57K 200.7K 0.001 0.045
9 layer1.0.bn2 BatchNorm2d (64, 56, 56) (64, 56, 56) 128 128 401.41K 802.82K 200.83K 200.7K 0.00015 0.29
10 layer1.1.conv1 Conv2d (64, 56, 56) (64, 56, 56) 36.86K 36.86K 115.61M 231.01M 237.57K 200.7K 0.0009 0.05
11 layer1.1.bn1 BatchNorm2d (64, 56, 56) (64, 56, 56) 128 128 401.41K 802.82K 200.83K 200.7K 0.00028 0.27
12 layer1.1.relu ReLU (64, 56, 56) (64, 56, 56) 0 0 200.7K 200.7K 200.7K 200.7K 1.9311904907226562e-05 0.0
13 layer1.1.conv2 Conv2d (64, 56, 56) (64, 56, 56) 36.86K 36.86K 115.61M 231.01M 237.57K 200.7K 0.0011 0.044
14 layer1.1.bn2 BatchNorm2d (64, 56, 56) (64, 56, 56) 128 128 401.41K 802.82K 200.83K 200.7K 0.00016 0.32
15 layer2.0.conv1 Conv2d (64, 56, 56) (128, 28, 28) 73.73K 73.73K 57.8M 115.51M 274.43K 100.35K 0.0006 0.042

Analyzing the results#

In addition to the profiling results as a metinor.profiler.ProfilerReport object, Metinor also provides the metinor.profiler.analysis.ReportAnalyzer class to analyze the profiling results. This class provides several methods to search and filter the results. It also provides methods to plot the profiled metrics.

[8]:
from nervai_optim.profiler.analysis import ReportAnalyzer

# Create a report analyzer object
analyzer = ReportAnalyzer(model_report)

Find layers with the highest values of a metric#

The find_top_n_layers method can be used to find the top k layers based on a metric. It takes the metric name and the number of layers to return as arguments, and returns a pandas.DataFrame object with k highest values of the metric.

For example, in the code below, we will find the top 5 layers with the highest number of FLOPs.

[9]:
analyzer.find_top_n_layers(ProfilerMetric.PARAMS, n=5)
[9]:
ProfilerMetric.NAME ProfilerMetric.TYPE ProfilerMetric.INPUTS ProfilerMetric.OUTPUTS ProfilerMetric.PARAMS ProfilerMetric.PARAMS_TRAINABLE ProfilerMetric.FLOPS ProfilerMetric.MADDS ProfilerMetric.MEM_READ ProfilerMetric.MEM_WRITE ProfilerMetric.RUNTIME ProfilerMetric.RMS
42 layer4.0.conv2 Conv2d (512, 7, 7) (512, 7, 7) 2359296 2359296 115605504 231185920 2384384 25088 0.003270 tensor(0.0174)
46 layer4.1.conv1 Conv2d (512, 7, 7) (512, 7, 7) 2359296 2359296 115605504 231185920 2384384 25088 0.003787 tensor(0.0180)
49 layer4.1.conv2 Conv2d (512, 7, 7) (512, 7, 7) 2359296 2359296 115605504 231185920 2384384 25088 0.002408 tensor(0.0132)
39 layer4.0.conv1 Conv2d (256, 14, 14) (512, 7, 7) 1179648 1179648 57802752 115580416 1229824 25088 0.001255 tensor(0.0199)
30 layer3.0.conv2 Conv2d (256, 14, 14) (256, 14, 14) 589824 589824 115605504 231160832 640000 50176 0.001086 tensor(0.0251)

The find_bottom_n_layers method does the same thing but returns the bottom k layers based on a metric.

Determine relative impact of each layer on a metric#

The layer_wise_impact method returns contribution of each layer to a specific metric. It takes the metric name as an argument and returns a pandas.DataFrame object with the value of that metric and a percentage contribution of each layer to that metric.

[10]:
analyzer.layer_wise_impact(ProfilerMetric.RUNTIME).head(5)
[10]:
ProfilerMetric.NAME ProfilerMetric.TYPE ProfilerMetric.RUNTIME ProfilerMetric.RUNTIME (%)
46 layer4.1.conv1 Conv2d 0.003787 12.306214
42 layer4.0.conv2 Conv2d 0.003270 10.627319
49 layer4.1.conv2 Conv2d 0.002408 7.825804
1 conv1 Conv2d 0.001853 6.021399
4 maxpool MaxPool2d 0.001475 4.791862

Filter layers by type#

The filter_by_layer_type method can be used to filter the layers by their type. It takes the layer type as an argument and returns a pandas.DataFrame object with the layers of the specified type. Note that all analyzer methods also take a keyword argument readable which can be set to True to return metrics in a human-readable format.

[11]:
analyzer.find_layers_by_type("Conv2d", readable=True)
[11]:
ProfilerMetric.NAME ProfilerMetric.TYPE ProfilerMetric.INPUTS ProfilerMetric.OUTPUTS ProfilerMetric.PARAMS ProfilerMetric.PARAMS_TRAINABLE ProfilerMetric.FLOPS ProfilerMetric.MADDS ProfilerMetric.MEM_READ ProfilerMetric.MEM_WRITE ProfilerMetric.RUNTIME ProfilerMetric.RMS
1 conv1 Conv2d (3, 224, 224) (64, 112, 112) 9.41K 9.41K 118.01M 235.23M 159.94K 802.82K 0.0019 tensor(0.1297)
5 layer1.0.conv1 Conv2d (64, 56, 56) (64, 56, 56) 36.86K 36.86K 115.61M 231.01M 237.57K 200.7K 0.0012 tensor(0.0535)
8 layer1.0.conv2 Conv2d (64, 56, 56) (64, 56, 56) 36.86K 36.86K 115.61M 231.01M 237.57K 200.7K 0.001 tensor(0.0452)
10 layer1.1.conv1 Conv2d (64, 56, 56) (64, 56, 56) 36.86K 36.86K 115.61M 231.01M 237.57K 200.7K 0.0009 tensor(0.0509)
13 layer1.1.conv2 Conv2d (64, 56, 56) (64, 56, 56) 36.86K 36.86K 115.61M 231.01M 237.57K 200.7K 0.0011 tensor(0.0440)
15 layer2.0.conv1 Conv2d (64, 56, 56) (128, 28, 28) 73.73K 73.73K 57.8M 115.51M 274.43K 100.35K 0.0006 tensor(0.0416)
18 layer2.0.conv2 Conv2d (128, 28, 28) (128, 28, 28) 147.46K 147.46K 115.61M 231.11M 247.81K 100.35K 0.001 tensor(0.0340)
20 layer2.0.downsample.0 Conv2d (64, 56, 56) (128, 28, 28) 8.19K 8.19K 6.42M 12.74M 208.9K 100.35K 0.00038 tensor(0.0707)
22 layer2.1.conv1 Conv2d (128, 28, 28) (128, 28, 28) 147.46K 147.46K 115.61M 231.11M 247.81K 100.35K 0.0009 tensor(0.0342)
25 layer2.1.conv2 Conv2d (128, 28, 28) (128, 28, 28) 147.46K 147.46K 115.61M 231.11M 247.81K 100.35K 0.001 tensor(0.0301)
27 layer3.0.conv1 Conv2d (128, 28, 28) (256, 14, 14) 294.91K 294.91K 57.8M 115.56M 395.26K 50.18K 0.0007 tensor(0.0291)
30 layer3.0.conv2 Conv2d (256, 14, 14) (256, 14, 14) 589.82K 589.82K 115.61M 231.16M 640.0K 50.18K 0.0011 tensor(0.0251)
32 layer3.0.downsample.0 Conv2d (128, 28, 28) (256, 14, 14) 32.77K 32.77K 6.42M 12.79M 133.12K 50.18K 0.00032 tensor(0.0330)
34 layer3.1.conv1 Conv2d (256, 14, 14) (256, 14, 14) 589.82K 589.82K 115.61M 231.16M 640.0K 50.18K 0.0012 tensor(0.0224)
37 layer3.1.conv2 Conv2d (256, 14, 14) (256, 14, 14) 589.82K 589.82K 115.61M 231.16M 640.0K 50.18K 0.001 tensor(0.0207)
39 layer4.0.conv1 Conv2d (256, 14, 14) (512, 7, 7) 1.18M 1.18M 57.8M 115.58M 1.23M 25.09K 0.0013 tensor(0.0199)
42 layer4.0.conv2 Conv2d (512, 7, 7) (512, 7, 7) 2.36M 2.36M 115.61M 231.19M 2.38M 25.09K 0.0033 tensor(0.0174)
44 layer4.0.downsample.0 Conv2d (256, 14, 14) (512, 7, 7) 131.07K 131.07K 6.42M 12.82M 181.25K 25.09K 0.0007 tensor(0.0328)
46 layer4.1.conv1 Conv2d (512, 7, 7) (512, 7, 7) 2.36M 2.36M 115.61M 231.19M 2.38M 25.09K 0.0038 tensor(0.0180)
49 layer4.1.conv2 Conv2d (512, 7, 7) (512, 7, 7) 2.36M 2.36M 115.61M 231.19M 2.38M 25.09K 0.0024 tensor(0.0132)

Analyze memory usage#

The analyze_memory_usage method can be used to analyze the memory usage of the model. It returns a pandas.DataFrame object with the memory usage of each layer.

[12]:
analyzer.analyze_memory_usage(readable=True).head(
    5
)  # only first 5 layers (for brevity)
[12]:
ProfilerMetric.NAME ProfilerMetric.MEM_READ ProfilerMetric.MEM_WRITE
1 conv1 159.94K 802.82K
2 bn1 802.94K 802.82K
3 relu 802.82K 802.82K
4 maxpool 802.82K 200.7K
5 layer1.0.conv1 237.57K 200.7K

Plotting the results#

This section is under construction

[13]:
figsize = (10, 5)
analyzer.plot_top_n_layers(n=15, figsize=figsize, colormap="rainbow").show()
analyzer.plot_total_params_per_layer(
    figsize=figsize, colormap="rainbow", limit=10
).show()
analyzer.plot_memory_usage(figsize=figsize, colormap="rainbow").show()
analyzer.plot_layer_wise_impact(
    ProfilerMetric.RUNTIME, figsize=figsize, colormap="tab20"
).show()
analyzer.plot_layers_by_type(
    "Conv2d", metric=ProfilerMetric.FLOPS, figsize=figsize, colormap="rainbow"
).show()
../../_images/nervai-optim_profiling_tutorial_1_22_0.png
../../_images/nervai-optim_profiling_tutorial_1_22_1.png
../../_images/nervai-optim_profiling_tutorial_1_22_2.png
../../_images/nervai-optim_profiling_tutorial_1_22_3.png
../../_images/nervai-optim_profiling_tutorial_1_22_4.png