torch.backends.cuda.matmul.allow_tf32 and torch.backends.cudnn.allow_tf32

These flags have existed since torch 1.7.

From torch 1.7 through 1.11 both default to True. In 1.12 and later, torch.backends.cuda.matmul.allow_tf32 defaults to False, while torch.backends.cudnn.allow_tf32 still defaults to True.

They control whether PyTorch may internally use TensorFloat-32 (TF32) tensor cores (introduced with NVIDIA's Ampere GPU architecture) to compute matmuls (matrix multiplications and batched matrix multiplications) and convolutions.

TF32 tensor cores are designed to achieve better matmul and convolution performance on torch.float32 tensors: they round the input data to a 10-bit mantissa, but accumulate results with FP32 precision, retaining the FP32 dynamic range.
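What "rounding to a 10-bit mantissa" means can be sketched in pure Python by manipulating the float32 bit pattern. This is a simplified model: real TF32 hardware rounds to nearest, while this sketch simply truncates; `tf32_round` is a hypothetical helper for illustration, not a PyTorch API.

```python
import struct

def tf32_round(x: float) -> float:
    """Model TF32 input precision: keep FP32's 8 exponent bits but only
    the top 10 of the 23 mantissa bits (the low 13 are zeroed).
    Simplification: truncation instead of round-to-nearest."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits &= 0xFFFFE000  # zero the low 13 mantissa bits
    return struct.unpack('<f', struct.pack('<I', bits))[0]

# 1.0 + 2**-12 is exactly representable in FP32, but the 2**-12 term
# lives below the top 10 mantissa bits, so it is lost at TF32 precision:
print(tf32_round(1.0 + 2**-12) == 1.0)  # True
# 1.5 needs only one mantissa bit, so it survives unchanged:
print(tf32_round(1.5) == 1.5)  # True
```

This is why TF32 keeps the FP32 dynamic range (the exponent field is untouched) while giving up precision in the mantissa.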

MGCA, for example, explicitly sets both flags to True:

# The flag below controls whether to allow TF32 on matmul. It defaults to False in 1.12+.
torch.backends.cuda.matmul.allow_tf32 = True
# The flag below controls whether to allow TF32 on cuDNN. This flag defaults to True.
torch.backends.cudnn.allow_tf32 = True

Example:

a_full = torch.randn(10240, 10240, dtype=torch.double, device='cuda')
b_full = torch.randn(10240, 10240, dtype=torch.double, device='cuda')
ab_full = a_full @ b_full
mean = ab_full.abs().mean()  # 80.7277

a = a_full.float()
b = b_full.float()

# Do matmul at TF32 mode.
torch.backends.cuda.matmul.allow_tf32 = True
ab_tf32 = a @ b  # takes 0.016s on GA100
error = (ab_tf32 - ab_full).abs().max()  # 0.1747
relative_error = error / mean  # 0.0022

# Do matmul with TF32 disabled.
torch.backends.cuda.matmul.allow_tf32 = False
ab_fp32 = a @ b  # takes 0.11s on GA100
error = (ab_fp32 - ab_full).abs().max()  # 0.0031
relative_error = error / mean  # 0.000039

Source: CUDA semantics — PyTorch 1.13 documentation


Copyright notice: this is an original article by hxxjxw, licensed under CC 4.0 BY-SA; when reposting, please include a link to the original source and this notice.