CVPR 2023 3D Occupancy Prediction Challenge: First Attempt (1)

1. 3D Occupancy Prediction Challenge
- 1.1 Challenge overview
- 1.2 Data description
2. Testing
References

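Preface: CVPR stands for the IEEE Conference on Computer Vision and Pattern Recognition, a top conference in computer vision and pattern recognition organized by the IEEE. This post runs some exploratory tests on the 3D occupancy prediction task of the CVPR 2023 challenge, to give participants who want to understand and take part in the competition some basic notes on the pitfalls encountered along the way.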

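1. 3D Occupancy Prediction Challenge

1.1 Challenge overview

Understanding the 3D environment, including both background elements and foreground objects, is important for autonomous driving. In traditional 3D object detection, a foreground object is represented by a 3D bounding box. However, object geometry is complex and cannot be captured by a simple 3D bounding box, and this representation also lacks any perception of background content. The goal of this task is to predict the 3D voxel occupancy of a scene. For this challenge, the organizers provide a large-scale occupancy benchmark built on the nuScenes dataset. The benchmark is a voxelized representation of 3D space, and the task jointly estimates the occupancy state and the semantics of each voxel. The difficulty lies in making dense predictions over 3D space given only surround-view images.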
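To make "voxelized representation" more concrete, here is a minimal sketch of how labelled 3D points could be turned into a dense semantic occupancy grid. It is only an illustration and not how the benchmark labels were produced: the function name voxelize, the FREE label id, and the grid range and voxel size used below are all hypothetical choices of mine.

```python
import numpy as np

FREE = 0  # hypothetical label id for empty voxels (not the benchmark's value)


def voxelize(points, labels, grid_min, voxel_size, grid_shape):
    """Turn labelled 3D points into a dense semantic occupancy grid.

    points: (N, 3) xyz coordinates, labels: (N,) integer class ids.
    grid_min is the xyz corner of the grid, voxel_size the edge length
    of one voxel, grid_shape the number of voxels along x, y and z.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    grid = np.full(grid_shape, FREE, dtype=np.uint8)

    # Map each point to the integer index of the voxel that contains it.
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(int)

    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx, labels = idx[inside], labels[inside]

    # Occupied voxels take the class of a point inside them; if several
    # points share a voxel, only one of their labels is kept.
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = labels
    return grid
```

Every voxel that receives no point stays FREE, which is the difference from a bounding box: the grid describes background and irregular shapes at the same resolution as foreground objects.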

1.2 Data description

The data released so far comes in two packages, mini and trainval. Each contains three parts: imgs, gts, and annotations.
The overall directory tree of the data is as follows:

├── mini
    |
    ├── trainval
    |   ├── imgs
    |   |   ├── CAM_BACK
    |   |   |   ├── n015-2018-07-18-11-07-57+0800__CAM_BACK__1531883530437525.jpg
    |   |   |   └── ...
    |   |   ├── CAM_BACK_LEFT
    |   |   |   ├── n015-2018-07-18-11-07-57+0800__CAM_BACK_LEFT__1531883530447423.jpg
    |   |   |   └── ...
    |   |   └── ...
    |   |     
    |   ├── gts  
    |   |   ├── [scene_name]
    |   |   |   ├── [frame_token]
    |   |   |   |   └── labels.npz
    |   |   |   └── ...
    |   |   └── ...
    |   |
    |   └── annotations.json
    |
    └── test
        ├── imgs
        └── annotations.json

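The contents of each folder and file are as follows:

imgs/
Contains the images captured by the various cameras.
gts/
Contains the ground truth of each sample. [scene_name] identifies a sequence of frames, and [frame_token] identifies a single frame within that sequence.
annotations.json
Contains the meta information of the dataset.
labels.npz
Contains [semantics], [mask_lidar], and [mask_camera] for each frame.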
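As a quick way to look inside one labels.npz, here is a minimal sketch using NumPy. The three array names come from the description above; the helper names (label_path, load_labels, visible_semantics) are hypothetical, and I am assuming that mask_camera marks the voxels observed by the cameras, so treat the last helper as a guess to check against the official documentation.

```python
from pathlib import Path

import numpy as np

LABEL_KEYS = ("semantics", "mask_lidar", "mask_camera")


def label_path(root, scene_name, frame_token):
    """Build the path gts/[scene_name]/[frame_token]/labels.npz under root."""
    return Path(root) / "gts" / scene_name / frame_token / "labels.npz"


def load_labels(path):
    """Load the three arrays of one frame into a plain dict."""
    with np.load(path) as data:
        return {key: data[key] for key in LABEL_KEYS}


def visible_semantics(labels):
    """Return the semantic labels of voxels where mask_camera is set.

    Assumption: a nonzero mask_camera entry means the voxel is visible
    in the camera views.
    """
    mask = labels["mask_camera"].astype(bool)
    return labels["semantics"][mask]
```

Printing the shape and dtype of each array, for example with `{k: (v.shape, v.dtype) for k, v in labels.items()}`, is a quick sanity check before writing a data loader.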

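2. Testing

2.1 Registration and dataset download

First register an EvalAI account; it is needed to evaluate model results. Remember to verify the account by email.
Link: EvalAI registration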

After downloading, you can use the following code to inspect the directory tree of the data.

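import os


def print_folder_tree(path, parent_is_last=1, depth_limit=2, tab_width=1):
    """
    Print the contents of a folder as a tree and return the files found.
    :param path: folder path
    :param parent_is_last: integer whose digits after the leading 1 record, for each
        ancestor level, whether that ancestor was the last entry (1) or not (0);
        controls where the │ trunk is drawn
    :param depth_limit: number of folder levels to print, -1 prints everything
    :param tab_width: number of tab characters per indentation level
    :return: list with the paths of all files visited under path
    """
    files = []
    # One digit is appended per recursion level, so the digit count gives the depth.
    if len(str(parent_is_last)) - 1 == depth_limit:
        return files
    items = sorted(os.listdir(path))  # sorted for a stable listing
    for index, name in enumerate(items):
        is_last = index == len(items) - 1
        item_path = path + "/" + name
        # Draw one indentation column per ancestor level.
        for k in str(parent_is_last)[1:]:
            if k == "0":
                print("│" + "\t" * tab_width, end="")
            if k == "1":
                print("\t" * tab_width, end="")
        print("└── " if is_last else "├── ", end="")
        print(name)
        if os.path.isdir(item_path):
            # Record this level's is_last flag as an extra decimal digit.
            child_state = parent_is_last * 10 + (1 if is_last else 0)
            # tab_width is now passed down; the original dropped it when recursing.
            files.extend(print_folder_tree(
                path=item_path, parent_is_last=child_state,
                depth_limit=depth_limit, tab_width=tab_width))
        else:
            files.append(item_path)
    return files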

The directory tree of the mini dataset is as follows:

├── annotations.json
├── gts
│	├── scene-0061
│	├── scene-0103
│	├── scene-0553
│	├── scene-0655
│	├── scene-0757
│	├── scene-0796
│	├── scene-0916
│	├── scene-1077
│	├── scene-1094
│	└── scene-1100
├── gts.tar.gz
├── imgs
│	├── CAM_BACK
│	├── CAM_BACK_LEFT
│	├── CAM_BACK_RIGHT
│	├── CAM_FRONT
│	├── CAM_FRONT_LEFT
│	└── CAM_FRONT_RIGHT
└── imgs.tar.gz
['E:\\CVPR2023_datasets\\Occupancy3D-nuScenes-mini/annotations.json', 'E:\\CVPR2023_datasets\\Occupancy3D-nuScenes-mini/gts.tar.gz', 'E:\\CVPR2023_datasets\\Occupancy3D-nuScenes-mini/imgs.tar.gz']
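The listing above shows gts.tar.gz and imgs.tar.gz next to the gts and imgs folders. On Windows, where a tar command may not be at hand, the archives can be unpacked with Python's standard tarfile module. This is a minimal sketch, assuming each archive unpacks into a folder of the same base name inside the dataset directory; the function name extract_archives is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archives(dataset_dir, names=("gts.tar.gz", "imgs.tar.gz")):
    """Unpack the listed .tar.gz archives into dataset_dir.

    Returns the names of the archives that were found and extracted.
    Only use this on archives from a trusted source.
    """
    dataset_dir = Path(dataset_dir)
    extracted = []
    for name in names:
        archive = dataset_dir / name
        if not archive.is_file():
            continue  # skip archives that are not present
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dataset_dir)
        extracted.append(name)
    return extracted
```

For example, `extract_archives(r"E:\CVPR2023_datasets\Occupancy3D-nuScenes-mini")` would process both archives if they are in that folder.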

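2.2 Baseline model test based on BEVFormer

BEVFormer uses predefined grid-shaped BEV queries to interact with both space and time, and thereby exploits spatial and temporal information. To aggregate spatial information, it uses spatial cross-attention, in which each BEV query extracts spatial features from the regions of interest across camera views. For temporal information, it uses temporal self-attention to recurrently fuse historical BEV information. On the nuScenes test set it reaches 56.9% NDS, which is 9.0 points higher than the previous best methods and on par with LiDAR-based baselines. The authors further show that BEVFormer markedly improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions.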
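To get a feel for how the two attention steps fit together, here is a heavily simplified NumPy sketch. It is not BEVFormer's implementation: the real model uses deformable attention restricted to regions of interest, while this sketch swaps in plain scaled dot-product attention over all features and leaves out learned projections, positional encodings, and ego-motion alignment. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Plain scaled dot-product attention (stand-in for deformable attention)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def bev_step(bev_queries, prev_bev, image_feats):
    """One simplified BEV update for the current timestamp.

    bev_queries: (H*W, C) flattened grid-shaped BEV queries
    prev_bev:    (H*W, C) BEV features of the previous frame, or None
    image_feats: (num_cams, num_tokens, C) features from the camera views
    """
    # Temporal self-attention: each query looks at the current queries
    # and, when available, at the BEV features of the previous frame.
    if prev_bev is None:
        temporal_src = bev_queries
    else:
        temporal_src = np.concatenate([bev_queries, prev_bev], axis=0)
    q = bev_queries + attend(bev_queries, temporal_src, temporal_src)

    # Spatial cross-attention: each query gathers features from the
    # camera views (here from all tokens of all cameras).
    flat = image_feats.reshape(-1, image_feats.shape[-1])
    return q + attend(q, flat, flat)
```

Calling bev_step frame by frame and feeding each output back in as prev_bev gives the recurrent fusion of history described above.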

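Environment setup and model download
For the detailed environment setup, refer to this guide. I recommend creating a new environment, otherwise package versions may not match.
The pretrained model can be downloaded directly from the web page or from the command line: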

wget https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth
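wget is often missing on Windows, so here is a minimal Python sketch that fetches the same file with the standard library. The URL is the one from the command above; the helper names and the ckpts output folder are my own hypothetical choices, so adjust the folder to wherever your config expects the checkpoint.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

CKPT_URL = ("https://github.com/zhiqi-li/storage/releases/download/"
            "v1.0/r101_dcn_fcos3d_pretrain.pth")


def filename_from_url(url):
    """Return the last path component of a URL, e.g. the checkpoint name."""
    return PurePosixPath(urlparse(url).path).name


def download(url, out_dir="ckpts"):
    """Download url into out_dir unless the file is already there."""
    out_dir = Path(out_dir)  # "ckpts" is a hypothetical default folder
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename_from_url(url)
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target
```

Usage would be `download(CKPT_URL)`, which returns the local path of the checkpoint. The existence check skips the download when the file is already present, but it does not detect a partially downloaded file.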

Troubleshooting

 warnings.warn(f'Error checking compiler version for {compiler}: {error}')
      building 'mmcv._ext' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

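The fix is to install a newer version of Microsoft Visual C++ (14.0 or later), for example through the Microsoft C++ Build Tools linked in the error message. After that, the build runs through: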

...
byte-compiling build\bdist.win-amd64\egg\mmdet3d\ops\spconv\overwrite_spconv\__init__.py to __init__.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\ops\spconv\__init__.py to __init__.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\ops\__init__.py to __init__.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\utils\collect_env.py to collect_env.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\utils\compat_cfg.py to compat_cfg.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\utils\logger.py to logger.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\utils\misc.py to misc.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\utils\setup_env.py to setup_env.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\utils\__init__.py to __init__.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\version.py to version.cpython-38.pyc
byte-compiling build\bdist.win-amd64\egg\mmdet3d\__init__.py to __init__.cpython-38.pyc
creating build\bdist.win-amd64\egg\EGG-INFO
copying mmdet3d.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
copying mmdet3d.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying mmdet3d.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying mmdet3d.egg-info\not-zip-safe -> build\bdist.win-amd64\egg\EGG-INFO
copying mmdet3d.egg-info\requires.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying mmdet3d.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
creating dist
...

Model training

./tools/dist_train.sh ./projects/configs/bevformer/bevformer_tiny.py 1

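【2】I only found out later that this part is not very friendly to Windows users; I will test it further when time permits…

Related paper
Paper link (English)
 Paper link (Chinese)
 Code link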

2.3 Visualization of model prediction results

References

【1】3D Occupancy Prediction Challenge, official GitHub repository
【2】BEVFormer GitHub

Original post: https://blog.csdn.net/u013537270/article/details/129705194