PyTorch custom layers with nn.Module

Reference: https://blog.csdn.net/qq_27825451/article/details/90705328

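PyTorch has no separate concept of a layer: a layer is treated as a model, that is, an nn.Module. This differs from Keras. It is also possible to define a layer by subclassing torch.autograd.Function directly, but this is not recommended for ordinary layers, mainly because you then have to write the backward pass yourself; further reasons are easy to find online.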

In one sentence: Keras puts the emphasis on the layer (Layer), while PyTorch puts it on the model (Module).

This article explains how to implement a custom layer with the nn.Module class.

There are two ways to build a neural network in PyTorch:

1) High-level API: use the classes in torch.nn.*;

2) Low-level API: use the functions in torch.nn.functional.*.

For example, the source code of nn.Conv2d (from an older PyTorch release) looks like this:

class Conv2d(_ConvNd):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1,
                 bias=True, padding_mode='zeros'):
        kernel_size = _pair(kernel_size)
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        super(Conv2d, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation,
            False, _pair(0), groups, bias, padding_mode)
 
    @weak_script_method
    def forward(self, input):
        if self.padding_mode == 'circular':
            expanded_padding = ((self.padding[1] + 1) // 2, self.padding[1] // 2,
                                (self.padding[0] + 1) // 2, self.padding[0] // 2)
            return F.conv2d(F.pad(input, expanded_padding, mode='circular'),
                            self.weight, self.bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

Of the two, the high-level API is recommended, for the following reason:

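The high-level API wraps each operation in a class, and a class can store parameters: the weight matrix and bias of a fully connected layer, for example, are kept as attributes of the object. The low-level API only implements the computation as a function and has nowhere to keep these parameters, so the caller has to create them and pass them in on every call. The high-level API is itself built on the low-level functions, as in the Conv2d layer above: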

High-level nn.Conv2d layer -> low-level F.conv2d() function
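The relationship between the two APIs can be checked directly. The following is a minimal sketch, with tensor sizes chosen only for illustration: it builds an nn.Conv2d module, then repeats the same computation with F.conv2d by passing in the module's stored weight and bias by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# High-level API: the module creates and stores its own weight and bias.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

x = torch.randn(2, 3, 16, 16)  # (batch, channels, height, width)
y_module = conv(x)

# Low-level API: a plain function, so the caller must supply the parameters.
y_functional = F.conv2d(x, conv.weight, conv.bias, stride=1, padding=1)

same = torch.allclose(y_module, y_functional)
print(y_module.shape, same)
```

Both calls run the same convolution; the only difference is who owns the parameters.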

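Steps for defining a custom layer

Implementing a custom layer takes roughly the following steps:

1) Define a class that inherits from nn.Module and implements two basic methods: the constructor __init__ and the forward method, which holds the layer's forward computation.

2) In __init__, define the layer's parameters and settings, for example the weight and bias of a Linear layer, or the in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros' arguments of a Conv2d layer.

3) Implement the forward computation in forward. This is usually done with torch.nn.functional.* functions, although you will often need to write your own operations as well. If the layer has weights, they must be of type nn.Parameter. In short, an nn.Parameter requires gradients by default, while a plain tensor does not (Variable, which appears in older tutorials, has been merged into Tensor since PyTorch 0.4). Where possible, also give the new layer a default parameter initialization so that it cannot be forgotten in use. If it is forgotten, a parameter created with torch.Tensor(...) holds uninitialized memory, so its starting values are arbitrary and may include NaN or inf.

4) Note: the parameters we define are normally differentiable, but if a custom operation cannot be differentiated by autograd, you need to implement a backward function for it (by subclassing torch.autograd.Function).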
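As a sketch of step 4, the example below wraps rounding in a torch.autograd.Function. Rounding has a zero gradient almost everywhere, so the backward method here passes the incoming gradient through unchanged. The class name RoundSTE and this "straight-through" choice are assumptions made for illustration only; they are not from the original article.

```python
import torch


class RoundSTE(torch.autograd.Function):
    """Round to the nearest integer; backward passes the gradient through as-is."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Illustrative choice: pretend the op is the identity when differentiating.
        return grad_output


x = torch.tensor([0.2, 1.7, -0.6], requires_grad=True)
y = RoundSTE.apply(x)
y.sum().backward()

print(y.detach())  # rounded values
print(x.grad)      # gradient supplied by the hand-written backward
```

A layer can then call RoundSTE.apply(...) inside its forward method like any other operation.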

Summary: this is the same as defining a custom model. The core is implementing the constructor __init__ and the forward method.
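The reason step 3 insists on nn.Parameter can be seen in a small sketch. The Demo class and its attribute names are hypothetical: one attribute is wrapped in nn.Parameter, the other is a plain tensor, and only the first is registered with the module.

```python
import torch


class Demo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(3))  # registered as a parameter
        self.t = torch.zeros(3)                      # plain attribute, not registered


demo = Demo()
names = [name for name, _ in demo.named_parameters()]

print(names)                 # only the Parameter is listed
print(demo.w.requires_grad)  # Parameters require gradients by default
print(demo.t.requires_grad)  # plain tensors do not
```

An optimizer built from demo.parameters() would therefore update w and never see t.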

A simple custom layer example

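Suppose we want a simple layer that computes $y=(x^2+bias) \cdot w$: square the input x, add a bias term, then multiply by the weight matrix W. How do we do that? Following the steps above, we first define the layer as a class: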

# my_layer.py
import torch


class MyLayer(torch.nn.Module):
    '''
    Computes y = (x^2 + bias) @ weight, so the layer has two parameters:
    the weight matrix `weight` and the bias vector `bias`.
    Input x has shape (in_features,) and output y has shape (out_features,), so:
    bias has shape (in_features,), not (out_features,) as in Linear, because it
    is added before the matrix multiplication;
    weight has shape (in_features, out_features), not (out_features, in_features)
    as in Linear, because forward computes input @ weight without a transpose.
    '''
    def __init__(self, in_features, out_features, bias=True):
        super(MyLayer, self).__init__()  # as with a custom model, call the parent constructor first
        self.in_features = in_features
        self.out_features = out_features
        # weight is trainable, so wrap it in Parameter; randn gives it defined
        # starting values (torch.Tensor(...) would leave the memory uninitialized)
        self.weight = torch.nn.Parameter(torch.randn(in_features, out_features))
        if bias:
            # bias is trainable too; start it at zero
            self.bias = torch.nn.Parameter(torch.zeros(in_features))
        else:
            self.register_parameter('bias', None)

    def forward(self, input):
        input_ = torch.pow(input, 2)
        if self.bias is not None:  # bias=False registers None, which cannot be added
            input_ = input_ + self.bias
        y = torch.matmul(input_, self.weight)
        return y
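The shape reasoning in the docstring can be checked with plain tensors, without the class. This is a minimal sketch with hypothetical sizes; it also prints the parameter shapes of nn.Linear for comparison.

```python
import torch

N, in_features, out_features = 4, 5, 3  # hypothetical sizes

x = torch.randn(N, in_features)
bias = torch.zeros(in_features)
weight = torch.randn(in_features, out_features)

hidden = x ** 2 + bias  # bias broadcasts over the batch: (N, in_features)
y = hidden @ weight     # (N, in_features) @ (in_features, out_features)
print(hidden.shape, y.shape)

# nn.Linear stores its weight transposed and adds the bias after the matmul.
lin = torch.nn.Linear(in_features, out_features)
print(lin.weight.shape, lin.bias.shape)
```

Because the bias is added before the multiplication, it has to match the input width; in Linear it is added afterwards, so it matches the output width.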

Defining a custom model and training it

import torch
from my_layer import MyLayer  # the custom layer defined in my_layer.py above

N, D_in, D_out = 10, 5, 3  # 10 samples, 5 input features, 3 output features


# Define a model that uses the custom layer
class MyNet(torch.nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()  # call the parent constructor first
        self.mylayer1 = MyLayer(D_in, D_out)

    def forward(self, x):
        x = self.mylayer1(x)

        return x


model = MyNet()
print(model)
'''Output:
MyNet(
  (mylayer1): MyLayer()   # this is the custom layer
)
'''
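# Create the input and target data
x = torch.randn(N, D_in)  # shape (10, 5)
y = torch.randn(N, D_out)  # shape (10, 3)

# Define the loss function
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
# Construct an optimizer object
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(10):

    # Step 1: forward pass, compute the prediction y_pred
    y_pred = model(x)

    # Step 2: compute the error between the prediction y_pred and the target
    loss = loss_fn(y_pred, y)
    print(loss.item())

    # Zero the model's gradients before the backward pass, since backward() accumulates into .grad
    optimizer.zero_grad()

    # Step 3: backpropagate the error
    loss.backward()

    # Update all trainable parameters of the network in one step from the gradients
    optimizer.step()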
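The loop above never defines a backward function, because the layer's forward pass is built from operations autograd already knows how to differentiate. As a rough check, the sketch below repeats the same forward computation on standalone tensors, uses a plain sum as a stand-in loss (an assumption, chosen to keep the hand derivation short), and compares autograd's gradients with the hand-derived ones.

```python
import torch

torch.manual_seed(0)
N, D_in, D_out = 10, 5, 3

x = torch.randn(N, D_in, dtype=torch.float64)
weight = torch.randn(D_in, D_out, dtype=torch.float64, requires_grad=True)
bias = torch.randn(D_in, dtype=torch.float64, requires_grad=True)

# Same forward computation as the custom layer, with sum() as a stand-in loss.
loss = ((x ** 2 + bias) @ weight).sum()
loss.backward()

# Hand-derived gradients for loss = sum_n sum_j sum_i (x_ni^2 + b_i) * W_ij:
#   d loss / d W_ij = sum_n (x_ni^2 + b_i)   (the same for every column j)
#   d loss / d b_i  = N * sum_j W_ij
hidden_sum = (x ** 2 + bias).detach().sum(dim=0)
expected_w = hidden_sum.unsqueeze(1).expand(-1, D_out)
expected_b = N * weight.detach().sum(dim=1)

w_ok = torch.allclose(weight.grad, expected_w)
b_ok = torch.allclose(bias.grad, expected_b)
print(w_ok, b_ok)
```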

Original article: https://blog.csdn.net/weixin_38145317/article/details/103908489