Model Quantization in PyTorch

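CNN architectures have pushed AI algorithms to a new level, but their computational cost and their data types (typically 32-bit floating point) remain a barrier to deploying them on edge devices. Pruning, distillation, lightweight network design, and quantization have all been proposed in response. This article focuses on implementing model quantization in PyTorch.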

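Quantization involves two concepts, symmetric and asymmetric quantization, which differ mainly in how zero_point is handled; this article does not go into the details. The commonly used approaches today are post-training dynamic quantization, post-training static quantization, and quantization-aware training. The sections below cover the implementation of the latter two.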
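
As a rough illustration of where zero_point comes in, the sketch below quantizes a few floats to 8 bits in both modes using plain Python. The function names (quant_params, quantize, dequantize) are made up for this example and are not PyTorch APIs; PyTorch's own observers differ in details such as range handling.

```python
# Affine quantization of floats to 8-bit integers, in pure Python.
# real_value ~= (quantized_value - zero_point) * scale

def quant_params(x_min, x_max, symmetric):
    """Return (scale, zero_point, qmin, qmax) for 8-bit quantization."""
    if symmetric:
        # Signed int8 range; zero_point is pinned to 0.
        qmin, qmax = -128, 127
        scale = max(abs(x_min), abs(x_max)) / qmax
        return scale, 0, qmin, qmax
    # Unsigned uint8 range; zero_point shifts the grid to fit [x_min, x_max].
    qmin, qmax = 0, 255
    # Make sure 0.0 lies inside the range so it maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point, qmin, qmax


def quantize(x, scale, zero_point, qmin, qmax):
    """Map a float to an integer and clamp it to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return (q - zero_point) * scale


values = [-0.5, 0.0, 0.25, 1.5]
for symmetric in (True, False):
    scale, zp, qmin, qmax = quant_params(min(values), max(values), symmetric)
    q = [quantize(v, scale, zp, qmin, qmax) for v in values]
    back = [dequantize(i, scale, zp) for i in q]
    print("symmetric" if symmetric else "asymmetric", "zero_point =", zp, q, back)
```

With a skewed range like this one, the asymmetric mode uses the full 0..255 grid, while the symmetric mode leaves part of the negative half unused in exchange for a zero_point of 0.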

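Before starting, note that PyTorch has supported quantization since version 1.3, and that some layers are not supported, such as BatchNorm and DeConv. The workarounds are to fuse Conv+BN into a single layer and to replace DeConv with an upsampling method such as bilinear interpolation.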
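
One way to do the Conv+BN merge is torch.quantization.fuse_modules. The sketch below is a minimal example on a toy block; the ConvBN class and its layer sizes are hypothetical and only serve to show that the fused model produces the same output as the original.

```python
import torch
import torch.nn as nn


class ConvBN(nn.Module):
    """Hypothetical toy block: a convolution followed by batch norm."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))


torch.manual_seed(0)
model = ConvBN()

# Run a few batches in training mode so BN has non-trivial running statistics.
model.train()
with torch.no_grad():
    for _ in range(5):
        model(torch.randn(4, 3, 16, 16))

# Folding BN into the convolution for inference requires eval mode.
model.eval()

# Returns a fused copy: BN is folded into conv, and bn becomes nn.Identity.
fused = torch.quantization.fuse_modules(model, [["conv", "bn"]])

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    max_diff = (model(x) - fused(x)).abs().max().item()
print(type(fused.bn).__name__, max_diff)
```

In a real network the list passed to fuse_modules would name every Conv/BN pair (and optionally a trailing ReLU) by attribute path, and fusion would be done before prepare().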

1. Post-training static quantization

#-*- coding:utf-8 -*-
import torch


...



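# Quantization is only available on CPU
model = ResNet().cpu()
model.load_state_dict(torch.load(weights))
# Post-training quantization works on the model in inference mode
model.eval()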

# Specify quantization configuration
# This step selects symmetric or asymmetric quantization and the bit width
# The code below uses the default asymmetric 8-bit quantization
model.qconfig = torch.quantization.default_qconfig
model = torch.quantization.prepare(model)

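# Calibration: feed representative inputs through the model at this point so
# the observers inserted by prepare() can record activation ranges

# Convert to quantized model
model = torch.quantization.convert(model)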

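# Save the model. The saved model is significantly smaller, but the accuracy
# loss is relatively large, so consider quantization-aware training instead
torch.save(model.state_dict(), "path.pt")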
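
For reference, here is a self-contained sketch of the same prepare / calibrate / convert flow on a tiny model. TinyNet is a hypothetical stand-in for the ResNet above, and random tensors stand in for a real calibration set, so the numbers only illustrate the mechanics.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the ResNet used above."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized at the input
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float at the output

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


torch.manual_seed(0)
float_model = TinyNet().eval()
float_model.qconfig = torch.quantization.default_qconfig

# prepare() returns a copy with observers attached (inplace defaults to False).
prepared = torch.quantization.prepare(float_model)

# Calibration: random data here, a representative dataset in practice.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(8, 16))

# convert() swaps observed modules for their quantized counterparts.
quantized = torch.quantization.convert(prepared)

x = torch.randn(2, 16)
with torch.no_grad():
    err = (float_model(x) - quantized(x)).abs().max().item()
print(type(quantized.fc), err)
```

Skipping the calibration loop leaves the observers without any recorded ranges, which is one plausible source of a large accuracy drop after conversion.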

2. Quantization-aware training

#-*- coding:utf-8 -*-

import torch

...


class ResNet(torch.nn.Module):

    def __init__(self,):
        ...
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        ...
        x = self.dequant(x)
        return x

if __name__ == "__main__":
    # Quantization can only be done on CPU
    model = ResNet().cpu()
    model.load_state_dict(torch.load(weights))
    # 'fbgemm' targets x86
    model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
    model = torch.quantization.prepare_qat(model)
    
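    for epoch in range(epochs):
        # convert(model.eval()) below leaves the model in eval mode, so switch back
        model.train()
        train(model, data)
        # Convert a copy of the fake-quantized model to a real quantized model and save it
        q_model = torch.quantization.convert(model.eval())
        torch.save(q_model.state_dict(), "path.pt")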
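
The loop above can be tried end to end with a toy setup. The sketch below is only an assumption-laden illustration: TinyNet, the random data, the MSE loss, and the SGD settings are all hypothetical, and the backend is picked from whatever engines the local PyTorch build reports.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the ResNet defined above."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


# Use 'fbgemm' (x86) when available, otherwise fall back to 'qnnpack' (ARM).
engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend

torch.manual_seed(0)
model = TinyNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig(backend)
# prepare_qat() inserts fake-quantize modules that simulate int8 during training.
model = torch.quantization.prepare_qat(model)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 16), torch.randn(32, 4)  # random toy data

for epoch in range(3):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    # convert() works on a copy, so training can continue on `model`.
    q_model = torch.quantization.convert(model.eval())

with torch.no_grad():
    out = q_model(x)
print(type(q_model.fc), out.shape)
```

Converting after every epoch, as the original loop does, makes it easy to evaluate the real quantized model as training progresses; converting once at the end is enough if only the final model is needed.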

That covers the PyTorch implementation of quantization. If you cite this article, please credit the source.

Original link: https://blog.csdn.net/perfects110/article/details/108804622