torch.backends.cuda.matmul.allow_tf32 and torch.backends.cudnn.allow_tf32

These flags have existed since torch 1.7.

From torch 1.7 through 1.11 both default to True. In 1.12 and later, torch.backends.cuda.matmul.allow_tf32 defaults to False, while torch.backends.cudnn.allow_tf32 still defaults to True.

They control whether PyTorch may internally use TensorFloat-32 (TF32) tensor cores (introduced with NVIDIA's Ampere GPU architecture) to compute matmuls (matrix multiplications and batched matrix multiplications) and convolutions.

TF32 tensor cores are designed to achieve better matmul and convolution performance on torch.float32 tensors: they round the input data to a 10-bit mantissa, but accumulate results with FP32 precision, retaining the FP32 dynamic range.
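What "rounding to a 10-bit mantissa" means can be sketched in pure Python by manipulating the float32 bit pattern. This is a simplified model: real TF32 hardware rounds to nearest, while this sketch simply truncates; `tf32_round` is a hypothetical helper for illustration, not a PyTorch API.

```python
import struct

def tf32_round(x: float) -> float:
    """Model TF32 input precision: keep FP32's 8 exponent bits but only
    the top 10 of the 23 mantissa bits (the low 13 are zeroed).
    Simplification: truncation instead of round-to-nearest."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits &= 0xFFFFE000  # zero the low 13 mantissa bits
    return struct.unpack('<f', struct.pack('<I', bits))[0]

# 1.0 + 2**-12 is exactly representable in FP32, but the 2**-12 term
# lives below the top 10 mantissa bits, so it is lost at TF32 precision:
print(tf32_round(1.0 + 2**-12) == 1.0)  # True
# 1.5 needs only one mantissa bit, so it survives unchanged:
print(tf32_round(1.5) == 1.5)  # True
```

This is why TF32 keeps the FP32 dynamic range (the exponent field is untouched) while giving up precision in the mantissa.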

MGCA, for example, explicitly sets both flags to True:

# The flag below controls whether to allow TF32 on matmul. It defaults to False in 1.12+.
torch.backends.cuda.matmul.allow_tf32 = True
# The flag below controls whether to allow TF32 on cuDNN. This flag defaults to True.
torch.backends.cudnn.allow_tf32 = True

Example:

a_full = torch.randn(10240, 10240, dtype=torch.double, device='cuda')
b_full = torch.randn(10240, 10240, dtype=torch.double, device='cuda')
ab_full = a_full @ b_full
mean = ab_full.abs().mean()  # 80.7277

a = a_full.float()
b = b_full.float()

# Do matmul at TF32 mode.
torch.backends.cuda.matmul.allow_tf32 = True
ab_tf32 = a @ b  # takes 0.016s on GA100
error = (ab_tf32 - ab_full).abs().max()  # 0.1747
relative_error = error / mean  # 0.0022

# Do matmul with TF32 disabled.
torch.backends.cuda.matmul.allow_tf32 = False
ab_fp32 = a @ b  # takes 0.11s on GA100
error = (ab_fp32 - ab_full).abs().max()  # 0.0031
relative_error = error / mean  # 0.000039

Source: CUDA semantics — PyTorch 1.13 documentation


Copyright notice: this is an original article by hxxjxw, licensed under CC 4.0 BY-SA; when reposting, please include a link to the original source and this notice.