PyTorch: Adjusting the Learning Rate Layer by Layer
Step 1: compute a layer_scale for each layer. With layer_decay < 1, layers closer to the input get a smaller scale, so during fine-tuning the pretrained early layers are updated more gently than the head:
layer_scales = list(layer_decay ** (num_layers - i) for i in range(num_layers + 1))
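For intuition, here is what the formula produces for a ViT with 12 transformer blocks (13 scale slots, counting the embedding as layer 0) and layer_decay = 0.75; both values are only assumptions for illustration, not part of the snippet above:

num_layers = 12 + 1          # e.g. a ViT-Base: 12 blocks plus the embedding as "layer 0" (assumed setup)
layer_decay = 0.75           # assumed value for illustration

layer_scales = list(layer_decay ** (num_layers - i) for i in range(num_layers + 1))
print(layer_scales[0])       # 0.75**13 ≈ 0.0238 -> earliest parameters get a tiny fraction of the base lr
print(layer_scales[-1])      # 0.75**0  = 1.0    -> the last group (head) keeps the full base lr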
Step 2: attach the lr_scale (and weight decay) to each param_group:
def param_groups_lrd(model, weight_decay=0.05, no_weight_decay_list=(), layer_decay=0.75):
    # build one parameter group per (layer, decay/no_decay) pair, each carrying its own lr_scale
    # (the default argument values here are common choices, not mandated by the method)
    param_group_names = {}   # mirrors param_groups but stores parameter names, handy for debugging
    param_groups = {}
    num_layers = len(model.blocks) + 1
    layer_scales = list(layer_decay ** (num_layers - i) for i in range(num_layers + 1))

    for n, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # 1-D parameters (biases, norm weights) and explicitly listed ones get no weight decay
        if p.ndim == 1 or n in no_weight_decay_list:
            g_decay, this_decay = "no_decay", 0.0
        else:
            g_decay, this_decay = "decay", weight_decay

        layer_id = get_layer_id_for_vit(n, num_layers)
        group_name = "layer_%d_%s" % (layer_id, g_decay)

        if group_name not in param_group_names:
            this_scale = layer_scales[layer_id]
            param_group_names[group_name] = {"lr_scale": this_scale, "weight_decay": this_decay, "params": []}
            param_groups[group_name] = {"lr_scale": this_scale, "weight_decay": this_decay, "params": []}

        param_group_names[group_name]["params"].append(n)
        param_groups[group_name]["params"].append(p)

    return list(param_groups.values())
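The function above relies on get_layer_id_for_vit(n, num_layers), which the post does not show. A minimal sketch of such a name-to-layer mapping, assuming a timm-style ViT whose transformer blocks live under model.blocks (cls_token, pos_embed and patch_embed count as layer 0, each block is its own layer, and everything else such as the final norm and head falls into the last group):

def get_layer_id_for_vit(name, num_layers):
    # map a parameter name to the index used to pick its lr_scale
    if name in ("cls_token", "pos_embed"):
        return 0
    if name.startswith("patch_embed"):
        return 0
    if name.startswith("blocks"):
        return int(name.split(".")[1]) + 1   # e.g. "blocks.3.attn.qkv.weight" -> layer 4
    return num_layers                        # final norm / classification head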
Step 3: update the learning rate by multiplying the base lr with each group's lr_scale:
for param_group in optimizer.param_groups:
    if "lr_scale" in param_group:
        param_group["lr"] = lr * param_group["lr_scale"]
    else:
        param_group["lr"] = lr
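In practice this loop usually lives inside a small function that first computes the base lr for the current epoch and is then called at every training step. Below is a sketch assuming a linear-warmup plus cosine-decay schedule; args.warmup_epochs, args.min_lr and args.epochs are hypothetical fields, not defined in this post:

import math

def adjust_learning_rate(optimizer, epoch, args):
    # base lr for this epoch: linear warmup followed by cosine decay (assumed schedule)
    if epoch < args.warmup_epochs:
        lr = args.lr * epoch / args.warmup_epochs
    else:
        progress = (epoch - args.warmup_epochs) / (args.epochs - args.warmup_epochs)
        lr = args.min_lr + (args.lr - args.min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))
    # per-group scaling, exactly the loop from step 3
    for param_group in optimizer.param_groups:
        if "lr_scale" in param_group:
            param_group["lr"] = lr * param_group["lr_scale"]
        else:
            param_group["lr"] = lr
    return lr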
Step 4: pass the finished param_groups to the optimizer
########################################
# hand the param_groups built above to the optimizer
# (lrd is the module that defines param_groups_lrd from step 2)
param_groups = lrd.param_groups_lrd(model, args.weight_decay, layer_decay=args.layer_decay)
optimizer = torch.optim.AdamW(param_groups, lr=args.lr)
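Putting the four steps together, a minimal runnable sketch could look like the following. timm is assumed only as a convenient way to obtain a ViT exposing a .blocks attribute, param_groups_lrd from step 2 is assumed to be in scope, and the model name and hyperparameter values are placeholders:

import torch
import timm  # assumed dependency, used only to get a ViT with .blocks

model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=10)

# step 2: build the layer-wise decayed parameter groups (values are placeholders)
param_groups = param_groups_lrd(model, weight_decay=0.05, layer_decay=0.75)

# step 4: hand them to the optimizer; extra keys such as lr_scale are kept on each group
optimizer = torch.optim.AdamW(param_groups, lr=1e-3)

# sanity check: every group carries its own weight_decay and lr_scale
for g in optimizer.param_groups:
    print(len(g["params"]), g["weight_decay"], round(g["lr_scale"], 4))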