A Guide to PyTorch Model Performance Tuning Tools

I ran into quite a few pitfalls while optimizing a ResNet50 model recently, so here are several practical PyTorch performance tuning tools.

1. torch.profiler.profile

import torch
from torch.profiler import ProfilerActivity, profile, record_function

def model_forward(model, input_tensor):
    # Profile a single forward pass on both the CPU and the GPU.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                 record_shapes=True) as prof:
        with record_function("model_inference"):
            output = model(input_tensor)
    # Print the 10 operators with the highest total CUDA time.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
    return output
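The profiler call above can be tried end to end with a small stand-in model. The sketch below is only an illustration: the tiny convolutional network and the random input are hypothetical placeholders (not the ResNet50 from this post), and CUDA profiling is enabled only when a GPU is available so that the script also runs on a CPU-only machine.

```python
import torch
from torch import nn
from torch.profiler import ProfilerActivity, profile, record_function

# Hypothetical stand-in model and input; substitute your own model here.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 10),
).eval()
input_tensor = torch.randn(4, 3, 32, 32)

# Always profile the CPU; add CUDA only if a GPU is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, input_tensor = model.cuda(), input_tensor.cuda()
    activities.append(ProfilerActivity.CUDA)

with torch.no_grad():
    model(input_tensor)  # warm-up pass so one-time setup cost is not profiled
    with profile(activities=activities, record_shapes=True) as prof:
        with record_function("model_inference"):
            model(input_tensor)

# Collect the aggregated event names and print the top 10 rows by CPU time.
event_names = {event.key for event in prof.key_averages()}
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(report)
```

The warm-up pass is a design choice rather than a requirement: without it, the first profiled call tends to include lazy initialization, which can hide the steady-state bottleneck.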

2. torch.utils.benchmark

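from torch.utils.benchmark import Timer

# Pass the objects in through globals; the run count goes to timeit().
timer = Timer(
    stmt='output = model(input_tensor)',
    globals={'model': model, 'input_tensor': input_tensor},
)
# Run the statement 100 times and report the mean time per run.
print(timer.timeit(100))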
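As a self-contained variant of the timing snippet above, the following sketch benchmarks a hypothetical small linear layer (a placeholder, not the model from this post) and uses `blocked_autorange`, which picks the number of runs itself instead of taking a fixed count. The layer size, batch size, and `label` text are assumptions chosen for illustration.

```python
import torch
from torch import nn
from torch.utils.benchmark import Timer

# Hypothetical stand-in model and input for illustration only.
model = nn.Linear(256, 256).eval()
input_tensor = torch.randn(32, 256)

timer = Timer(
    stmt="model(input_tensor)",
    globals={"model": model, "input_tensor": input_tensor},
    label="Linear forward",  # free-form label shown in the printed report
    num_threads=1,           # pin the thread count so runs are comparable
)

# Repeat the statement in blocks until at least 0.2 s of data is collected.
measurement = timer.blocked_autorange(min_run_time=0.2)
print(measurement)
print(f"median time per run: {measurement.median * 1e6:.1f} us")
```

Reporting the median rather than a single run makes the number less sensitive to occasional slow iterations caused by other processes on the machine.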

3. NVIDIA Apex mixed-precision training

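from apex import amp
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
# The training loop stays the same, except that the loss is scaled for backward:
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()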
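Apex's `amp` module has since been deprecated in favor of the automatic mixed precision support built into PyTorch, so the same idea can also be sketched without Apex. This is a swapped-in technique (`torch.autocast` plus `GradScaler`), not the Apex setup used for the measurements in this post. The linear model, the random data, and the `train_step` helper are hypothetical, and the CPU fallback to bfloat16 is an assumption made only so the example runs without a GPU. Recent PyTorch releases may print a deprecation warning for `torch.cuda.amp.GradScaler`.

```python
import torch
from torch import nn


def train_step(model, optimizer, scaler, inputs, targets, device_type):
    """Run one mixed-precision training step and return the loss value."""
    # FP16 on the GPU; BF16 on the CPU, where loss scaling is not needed.
    dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
    optimizer.zero_grad()
    with torch.autocast(device_type=device_type, dtype=dtype):
        outputs = model(inputs)
        # Compute the loss in FP32 for numerical stability.
        loss = nn.functional.mse_loss(outputs.float(), targets)
    # Scale the loss so small FP16 gradients do not underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales gradients, then calls optimizer.step()
    scaler.update()         # adjusts the scale factor for the next step
    return loss.item()


device_type = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)

# Hypothetical stand-in model and data for illustration only.
model = nn.Linear(16, 4).to(device_type)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# The scaler is a pass-through when disabled (CPU case).
scaler = torch.cuda.amp.GradScaler(enabled=(device_type == "cuda"))
inputs = torch.randn(8, 16, device=device_type)
targets = torch.randn(8, 4, device=device_type)

losses = [
    train_step(model, optimizer, scaler, inputs, targets, device_type)
    for _ in range(5)
]
print(losses)
```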

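Measured on a V100, inference was about 35% faster and training time dropped by 28% after applying the tools above. I recommend using the profiler to locate the bottleneck first, then optimizing specifically for it.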
