Debugging GPU Memory Leaks: Locating the Problem with torch.cuda.memory_snapshot

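GPU memory leaks are a common but hard-to-diagnose problem in PyTorch deep learning projects. This article shows how to locate one with torch.cuda.memory_snapshot(), which reports the current state of PyTorch's CUDA caching allocator.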
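torch.cuda.memory_snapshot() returns the allocator state as a list of segment dicts, one per memory segment the allocator has reserved from the driver, with keys such as 'device', 'total_size', 'allocated_size' and 'blocks'. As a minimal sketch (summarize_snapshot is a hypothetical helper, not part of PyTorch), per-device totals can be read straight from those keys. The gap between reserved and allocated bytes is memory the allocator keeps cached for reuse: nvidia-smi counts it as used, but on its own it is not a leak.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_snapshot(snapshot: Iterable[dict]) -> Dict[int, Dict[str, int]]:
    """Per-device byte totals from a torch.cuda.memory_snapshot() result.

    reserved:  bytes held by the caching allocator (sum of segment 'total_size')
    allocated: bytes in blocks currently backing live tensors (segment 'allocated_size')
    """
    totals: Dict[int, Dict[str, int]] = defaultdict(lambda: {"reserved": 0, "allocated": 0})
    for segment in snapshot:
        dev = totals[segment["device"]]
        dev["reserved"] += segment["total_size"]
        dev["allocated"] += segment["allocated_size"]
    return dict(totals)
```

On a CUDA machine this would be called as summarize_snapshot(torch.cuda.memory_snapshot()).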

Code to reproduce the problem:

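import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(1000, 1000)

    def forward(self, x):
        return self.layer(x)

model = SimpleModel().cuda()
outputs = []  # defined outside the loop, so it outlives every iteration
for i in range(100):
    x = torch.randn(1000, 1000, device="cuda")  # allocate directly on the GPU
    output = model(x)
    # Leak: storing `output` keeps it and its autograd graph alive across
    # iterations, and the graph holds on to the input `x` saved for backward
    outputs.append(output)
    del x, output  # removes only the local names; the list still references the tensors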
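Before digging into the snapshot, it is worth confirming that allocated memory really grows from one iteration to the next. The sketch below assumes the loop body can be wrapped in a function; sample_allocated and average_growth are hypothetical helper names. torch is imported inside sample_allocated so that the arithmetic helper can be used and tested without a GPU.

```python
from typing import Callable, List


def sample_allocated(step: Callable[[int], None], iterations: int) -> List[int]:
    """Call step(i) repeatedly and record torch.cuda.memory_allocated() after each call."""
    import torch  # imported lazily; requires a CUDA-enabled PyTorch build when called

    samples = []
    for i in range(iterations):
        step(i)
        samples.append(torch.cuda.memory_allocated())  # bytes held by live tensors
    return samples


def average_growth(samples: List[int]) -> float:
    """Average change in allocated bytes per iteration, from first to last sample."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)
```

A result that stays clearly above zero means something is holding tensors across iterations; a value near zero means the loop itself is not leaking.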

Analyzing the memory snapshot:

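# Take a snapshot of the caching allocator: a list of segment dicts
snapshot = torch.cuda.memory_snapshot()

# Report live blocks larger than 1 MB in each segment
for segment in snapshot:
    print(f"Segment: device={segment['device']} address={segment['address']:#x} "
          f"total_size={segment['total_size']} bytes")
    for block in segment['blocks']:
        # 'inactive' blocks are free memory cached for reuse, not live tensors
        if block['state'] == 'active_allocated' and block['size'] > 1024 * 1024:
            print(f"  Block: {block['size']} bytes")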
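A single snapshot mixes long-lived tensors (model weights, optimizer state) with whatever is leaking. One way to separate them, sketched here under the assumption that each block dict carries 'size' and 'state' keys as in recent PyTorch releases, is to take one snapshot before the loop and one after, count live blocks by size, and diff the counts. Blocks in the 'inactive' state are excluded because they are cached free memory. active_size_histogram and growth_by_size are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def active_size_histogram(snapshot: Iterable[dict], min_size: int = 0) -> Counter:
    """Count live ('active_allocated') blocks of at least min_size bytes, keyed by block size."""
    hist = Counter()
    for segment in snapshot:
        for block in segment["blocks"]:
            if block["state"] == "active_allocated" and block["size"] >= min_size:
                hist[block["size"]] += 1
    return hist


def growth_by_size(before: Iterable[dict], after: Iterable[dict],
                   min_size: int = 0) -> List[Tuple[int, int, int]]:
    """Return (block_size, extra_count, extra_bytes) for every size whose live count
    increased between the two snapshots, largest extra_bytes first."""
    # Counter subtraction keeps only sizes whose count went up
    diff = active_size_histogram(after, min_size) - active_size_histogram(before, min_size)
    rows = [(size, count, size * count) for size, count in diff.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Usage on a CUDA machine: store before = torch.cuda.memory_snapshot(), run the loop, store after = torch.cuda.memory_snapshot(), then print growth_by_size(before, after, 1024 * 1024). A large extra_count for one block size points at the tensor shape being retained. Because the allocator rounds request sizes up, match block sizes approximately against tensor sizes, e.g. 1000 × 1000 × 4 bytes for a float32 tensor of shape (1000, 1000).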

Performance data: