**PyTorch beginner pitfalls: on the system freezing during PyTorch training**
The first time I ran PyTorch I hit all sorts of problems and wasted several days stuck on the same one. Without further ado, here is the point.
- torch model, machine-learning run, abnormal symptom: as soon as the program starts, the system locks up, the mouse stops responding, and memory usage skyrockets.
- torch model aborts before reaching the configured number of epochs with the error: `Process finished with exit code 137 (interrupted by signal 9: SIGKILL)`. With CPU torch this is basically a machine memory problem (exit code 137 = 128 + 9, i.e. the process was killed with SIGKILL, typically by the Linux OOM killer).
Summary: the program creates too many Variables as it runs and they take up too much memory. Don't use `append` casually and irresponsibly; you must understand the data types involved and how they behave.
Buggy code:

```python
loss_records.append(G_loss_D)
loss_records.append(D_loss)
```

Fixed code:

```python
loss_records.append(G_loss_D.item())
loss_records.append(D_loss.item())
```
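To make the difference concrete, here is a minimal sketch contrasting the two ways of recording a loss. The tensor `w` and the loop are made up for illustration; they stand in for the post's GAN losses `G_loss_D` / `D_loss`:

```python
import torch

# A parameter standing in for the model's weights (made up for illustration).
w = torch.ones(1, requires_grad=True)

bad_records = []
good_records = []
for step in range(3):
    loss = (w * step).sum()          # scalar tensor attached to the autograd graph
    bad_records.append(loss)         # keeps the entire graph for this step alive
    good_records.append(loss.item()) # plain Python float; the graph can be freed

# Every tensor in bad_records still carries a grad_fn, i.e. a reference
# to its computation graph; the floats in good_records carry nothing.
assert all(t.grad_fn is not None for t in bad_records)
assert all(isinstance(x, float) for x in good_records)
```

Over thousands of iterations, each retained graph in `bad_records` adds up, which is exactly what drives memory through the roof.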
`loss` is a Variable, which imposes a huge memory burden. Quoting the original explanation (from the PyTorch forums):

> I think I see the problem. You have to remember that loss is a Variable, and indexing Variables, always returns a Variable, even if they’re 1D! So when you do total_loss += loss[0] you’re actually making total_loss a Variable, and adding more and more subgraphs to its history, making it impossible to free them, because you’re still holding a reference. Just replace total_loss += loss[0] with total_loss += loss.data[0] and it should be back to normal.
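Note that the quote uses the old `Variable` / `loss.data[0]` API from early PyTorch. In current PyTorch the same fix is spelled with `.item()`, or `.detach()` when you still need a tensor. A small sketch, with a made-up scalar loss:

```python
import torch

# A made-up scalar loss standing in for `loss` in the quote above.
loss = (torch.ones(1, requires_grad=True) * 2).sum()

total_loss = 0.0
total_loss += loss.item()   # modern equivalent of `total_loss += loss.data[0]`

detached = loss.detach()    # drops the graph history but stays a tensor
assert detached.grad_fn is None
```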
Copyright notice: this is an original article by weixin_42410103, licensed under CC 4.0 BY-SA; when reposting, please include a link to the original source and this notice.