7_pytorch_批数据训练

批数据训练

要点

Torch 中提供了一种帮你整理你的数据结构的好东西, 叫做 DataLoader, 我们能用它来包装自己的数据, 进行批训练. 而且批训练可以有很多种途径,

DataLoader

DataLoader 是 torch 给你用来包装你的数据的工具. 所以你要讲自己的 (numpy array 或其他) 数据形式装换成 Tensor, 然后再放进这个包装器中. 使用 DataLoader 有什么好处呢? 就是他们帮你有效地迭代数据

# 批处理
import torch
import torch.utils.data as Data
torch.manual_seed(1)    # reproducible

BATCH_SIZE = 5      # 批训练的数据个数

x = torch.linspace(1, 10, 10)       # x data (torch tensor)
y = torch.linspace(10, 1, 10)       # y data (torch tensor)

# 先转换成 torch 能识别的 Dataset（放到Torch能识别的数据库中）
torch_dataset = Data.TensorDataset(x, y)

# 把多线程要放在main函数下面
if __name__=='__main__':
    # 把 dataset 放入 DataLoader
    loader = Data.DataLoader(
        dataset=torch_dataset,      # torch TensorDataset format
        batch_size=BATCH_SIZE,      # mini batch size
        shuffle=True,               # 要不要打乱数据 (打乱比较好)
        num_workers=3,              # 多线程来读数据
    )

    for epoch in range(3):   # 训练所有!整套!数据 3 次
        for step, (batch_x, batch_y) in enumerate(loader):  # 每一步 loader 释放一小批数据用来学习
            # 假设这里就是你训练的地方...

            # 打出来一些数据
            print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
                  batch_x.numpy(), '| batch y: ', batch_y.numpy())
            
# 运行结果：
'''
Epoch:  0 | Step:  0 | batch x:  [ 5.  7. 10.  3.  4.] | batch y:  [6. 4. 1. 8. 7.]
Epoch:  0 | Step:  1 | batch x:  [2. 1. 8. 9. 6.] | batch y:  [ 9. 10.  3.  2.  5.]
Epoch:  1 | Step:  0 | batch x:  [ 4.  6.  7. 10.  8.] | batch y:  [7. 5. 4. 1. 3.]
Epoch:  1 | Step:  1 | batch x:  [5. 3. 2. 1. 9.] | batch y:  [ 6.  8.  9. 10.  2.]
Epoch:  2 | Step:  0 | batch x:  [ 4.  2.  5.  6. 10.] | batch y:  [7. 9. 6. 5. 1.]
Epoch:  2 | Step:  1 | batch x:  [3. 9. 1. 8. 7.] | batch y:  [ 8.  2. 10.  3.  4.]

'''

# 修改BATCH_SIZE = 8后：     # 批训练的数据个数
# 真正方便的还不是这点. 如果我们改变一下 BATCH_SIZE = 8, 这样我们就知道, step=0 会导出8个数据, 但是, step=1 时数据库中的数据不够 8个：

'''
Epoch:  0 | Step:  0 | batch x:  [ 5.  7. 10.  3.  4.  2.  1.  8.] | batch y:  [ 6.  4.  1.  8.  7.  9. 10.  3.]
Epoch:  0 | Step:  1 | batch x:  [9. 6.] | batch y:  [2. 5.]
Epoch:  1 | Step:  0 | batch x:  [ 4.  6.  7. 10.  8.  5.  3.  2.] | batch y:  [7. 5. 4. 1. 3. 6. 8. 9.]
Epoch:  1 | Step:  1 | batch x:  [1. 9.] | batch y:  [10.  2.]
Epoch:  2 | Step:  0 | batch x:  [ 4.  2.  5.  6. 10.  3.  9.  1.] | batch y:  [ 7.  9.  6.  5.  1.  8.  2. 10.]
Epoch:  2 | Step:  1 | batch x:  [8. 7.] | batch y:  [3. 4.]
'''

可以看出, 每步都导出了5个数据进行学习. 然后每个 epoch 的导出数据都是先打乱了以后再导出.

报错解决

1、报错TypeError: init() got an unexpected keyword argument ‘data_tensor’

解决：

把原代码中的torch_dataset = Data.TensorDataset(data_tensor=x, target_tensor=y)改成torch_dataset = Data.TensorDataset(x, y)就可以了

2、RuntimeError: An attempt has been made to start a new process before the current process…

解决方法1：

加main函数，在main中调用

解决方法2：

num_workers改为0，单进程加载

原文链接：https://blog.csdn.net/weixin_43871072/article/details/121681392