PyTorch: Beginner Pitfalls

**PyTorch beginner pitfalls: on system freezes during PyTorch training**
The first time I ran PyTorch I hit all kinds of problems and wasted several days stuck on the same issue. Without further ado, here is the key point.

  1. Symptom when training a torch model: as soon as the program starts, the system freezes, the mouse stops responding, and memory usage skyrockets.
  2. The torch model aborts before reaching the configured number of epochs with: Process finished with exit code 137 (interrupted by signal 9: SIGKILL). With CPU PyTorch this is almost always a memory problem.
    Summary: the training loop creates too many Variables, which occupy too much memory. Do not call append carelessly; be clear about the data type you are storing and its characteristics.

Buggy code:

loss_records.append(G_loss_D)
loss_records.append(D_loss)

Fixed code:

loss_records.append(G_loss_D.item())
loss_records.append(D_loss.item())
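To make the fix concrete, here is a minimal, self-contained sketch of the pattern (not the author's actual GAN code; the linear model and loss are stand-ins). Calling `.item()` converts a scalar tensor into a plain Python float, so the list never holds graph-attached tensors:

```python
import torch

model = torch.nn.Linear(4, 1)          # stand-in for the real G/D models
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_records = []

for _ in range(3):
    x = torch.randn(8, 4)
    loss = (model(x) ** 2).mean()      # graph-attached scalar tensor
    opt.zero_grad()
    loss.backward()
    opt.step()
    # .item() stores a plain float, so no computation graph is retained
    loss_records.append(loss.item())

print(all(isinstance(v, float) for v in loss_records))  # True
```

Because each recorded value is a float, the per-iteration computation graphs are freed as soon as the loop body finishes, and memory stays flat no matter how many epochs run.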

loss is a Variable, which imposes a huge memory burden. Quoting the original explanation:
I think I see the problem. You have to remember that loss is a Variable, and indexing Variables, always returns a Variable, even if they’re 1D! So when you do total_loss += loss[0] you’re actually making total_loss a Variable, and adding more and more subgraphs to its history, making it impossible to free them, because you’re still holding a reference. Just replace total_loss += loss[0] with total_loss += loss.data[0] and it should be back to normal.
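The quote uses the pre-0.4 `loss.data[0]` idiom; in current PyTorch the equivalent escapes from the graph are `.item()` (scalar to Python float) or `.detach()` (tensor with no history). A small demo of the difference:

```python
import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()

# The raw loss tensor still references its whole computation graph:
print(loss.requires_grad)        # True

# Either of these drops the reference to the graph:
scalar = loss.item()             # plain Python float
detached = loss.detach()         # tensor with requires_grad=False
print(isinstance(scalar, float), detached.requires_grad)  # True False
```

Accumulating `scalar` (or `detached`) lets autograd free each iteration's graph, which is exactly what the quoted fix achieves.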


Copyright notice: this is an original article by weixin_42410103, released under the CC 4.0 BY-SA license. Please include a link to the original source and this notice when reposting.