1. Project repository:

https://github.com/ultralytics/yolov5

The dataset can stay in the VOCdevkit format as-is.

2. VOCdevkit-format data

|--VOC2007

|---Annotations

|---ImageSets

|----Layout

|----Main

|-----test.txt

|-----train.txt

|-----trainval.txt

|-----val.txt

|----Segmentation

|---JPEGImages

|---labels

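Setting up this dataset has a few pitfalls:

[1] Create a folder in the project root. Any name works, for example VOCData.

[2] Copy the ImageSets, JPEGImages, and labels folders listed above into it. You now have the split subsets, the images, and the YOLO-format annotations.

[3] Use the small scripts below to generate everything needed above (all you need are the images, the original XML annotation files, and the class names).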
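
Step [2] can be done by hand, but here is a minimal sketch of it in Python. The function name copy_voc_subset and its arguments are my own for illustration, not part of YOLOv5, and dirs_exist_ok needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_voc_subset(voc_root, dest_root,
                    folders=("ImageSets", "JPEGImages", "labels")):
    """Copy the listed folders from voc_root into dest_root.

    Returns the names of the folders that were actually copied.
    Hypothetical helper: names and defaults are assumptions.
    """
    voc_root, dest_root = Path(voc_root), Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in folders:
        src = voc_root / name
        if not src.is_dir():
            # Report a missing folder instead of failing half-way through
            print(f"skipping missing folder: {src}")
            continue
        shutil.copytree(src, dest_root / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

For example, copy_voc_subset("VOCdevkit/VOC2007", "VOCData") run from the project root would fill VOCData, assuming that is where your VOC2007 folder lives.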

3. Dataset-splitting toolchain

Script 1:

file name: 1_take_label.py

import os

import random

 

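trainval_percent = 1  # fraction of all samples put in trainval (the rest goes to test)

train_percent = 0.9  # fraction of trainval used for train (the rest goes to val)

xmlfilepath = 'path/to/Annotations'  # path to your Annotations folder

txtsavepath = 'path/to/ImageSets/Main'  # path to your ImageSets/Main folder

# Collect the file names from the annotation files, then write the splits into Main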

total_xml = os.listdir(xmlfilepath)

 

num=len(total_xml)

list=range(num)

tv=int(num*trainval_percent)

tr=int(tv*train_percent)

trainval= random.sample(list,tv)

train=random.sample(trainval,tr)

 

ftrainval = open(txtsavepath+'/trainval.txt', 'w')

ftest = open(txtsavepath+'/test.txt', 'w')

ftrain = open(txtsavepath+'/train.txt', 'w')

fval = open(txtsavepath+'/val.txt', 'w')

 

for i  in list:

    name=total_xml[i][:-4]+'\n'

    if i in trainval:

        ftrainval.write(name)

        if i in train:

            ftrain.write(name)

        else:

            fval.write(name)

    else:

        ftest.write(name)

 

ftrainval.close()

ftrain.close()

fval.close()

ftest.close()
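
To make the split logic easier to check, here is a sketch of the same two-stage sampling as a pure function. The name split_ids and the seed argument are my additions (the script above does not fix a seed), and it assumes the ids are unique.

```python
import random


def split_ids(ids, trainval_percent=1.0, train_percent=0.9, seed=0):
    """Split ids into trainval/train/val/test like 1_take_label.py does.

    seed is an added assumption so the split can be reproduced.
    """
    rng = random.Random(seed)
    ids = list(ids)
    n_trainval = int(len(ids) * trainval_percent)
    n_train = int(n_trainval * train_percent)
    # Stage 1: pick trainval from everything; stage 2: pick train from trainval
    trainval = set(rng.sample(ids, n_trainval))
    train = set(rng.sample(sorted(trainval), n_train))
    return {
        "trainval": [i for i in ids if i in trainval],
        "train": [i for i in ids if i in train],
        "val": [i for i in ids if i in trainval and i not in train],
        "test": [i for i in ids if i not in trainval],
    }
```

With the defaults above (trainval_percent = 1), the test subset is empty, so test.txt ends up as an empty file.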

Script 2:

file name: 2_voc_label.py

import xml.etree.ElementTree as ET

import pickle

import os

from os import listdir, getcwd

from os.path import join

import cv2

sets=[('2007', 'train'), ('2007', 'val'), ('2007', 'test')]



classes = ["person", "ball"]  # replace with your own class names





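# size is (width, height) in pixels; box is (xmin, xmax, ymin, ymax).

# Returns (x_center, y_center, width, height), each normalized by the image size.

def convert(size, box):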

    dw = 1./(size[0])

    dh = 1./(size[1])

    x = (box[0] + box[1])/2.0 - 1

    y = (box[2] + box[3])/2.0 - 1

    w = box[1] - box[0]

    h = box[3] - box[2]

    x = x*dw

    w = w*dw

    y = y*dh

    h = h*dh

    return (x,y,w,h)



def convert_annotation(year, image_id):

    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id))

    out_file = open('VOCdevkit/VOC%s/labels/%s.txt'%(year, image_id), 'w')

    tree=ET.parse(in_file)

    root = tree.getroot()

    size = root.find('size')

    jpg_file = cv2.imread("./VOCdevkit/VOC2007/JPEGImages/"+image_id+".jpg")

    h = jpg_file.shape[0]

    w = jpg_file.shape[1]

    # w = int(size.find('width').text)

    # h = int(size.find('height').text)



    for obj in root.iter('object'):

        difficult = obj.find('difficult').text

        cls = obj.find('name').text

        if cls not in classes or int(difficult)==1:

            continue

        cls_id = classes.index(cls)

        xmlbox = obj.find('bndbox')

        # print(xmlbox.text)

        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))

        bb = convert((w,h), b)

        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')



wd = getcwd()



for year, image_set in sets:

    if not os.path.exists('VOCdevkit/VOC%s/labels/'%(year)):

        os.makedirs('VOCdevkit/VOC%s/labels/'%(year))

    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()

    list_file = open('%s_%s.txt'%(year, image_set), 'w')

    for image_id in image_ids:

        # print(image_id)

        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n'%(wd, year, image_id))

        convert_annotation(year, image_id)

    list_file.close()



os.system("cat 2007_train.txt 2007_val.txt  > train.txt")

os.system("cat 2007_train.txt 2007_val.txt 2007_test.txt > train.all.txt")
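
A quick worked example of what convert does to one box. The function body is copied from the script above; the image size and box coordinates are made-up numbers for illustration.

```python
def convert(size, box):
    """VOC box (xmin, xmax, ymin, ymax) -> normalized (x_center, y_center, w, h)."""
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0 - 1  # center x, shifted by 1 as in the script
    y = (box[2] + box[3]) / 2.0 - 1  # center y, shifted by 1 as in the script
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


# A 640x480 image with a box from (100, 50) to (300, 250):
# center = (199, 149) after the shift, size = 200 x 200 pixels
yolo_box = convert((640, 480), (100.0, 300.0, 50.0, 250.0))
print(yolo_box)  # roughly (0.311, 0.310, 0.3125, 0.417)
```

Each line of a label file is then the class index followed by these four numbers.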

Script 3:

file name: 3_k_means.py

# -*- coding=utf-8 -*-

import glob

import os

import sys

import xml.etree.ElementTree as ET

import numpy as np

from kmeans import kmeans, avg_iou



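# Dataset root folder

ROOT_PATH = 'path/to/VOCdevkit/VOC2007'  # path to your VOC2007 folder

# Number of clusters

CLUSTERS = 9

# Input image size of the model (square by default)

SIZE = 640



# Load the YOLO-format annotation data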

def load_dataset(path):

    jpegimages = os.path.join(path, 'JPEGImages')

    if not os.path.exists(jpegimages):

        print('no JPEGImages folders, program abort')

        sys.exit(0)

    labels_txt = os.path.join(path, 'labels')

    if not os.path.exists(labels_txt):

        print('no labels folders, program abort')

        sys.exit(0)



    label_file = os.listdir(labels_txt)

    print('label count: {}'.format(len(label_file)))

    dataset = []



    for label in label_file:

        with open(os.path.join(labels_txt, label), 'r') as f:

            txt_content = f.readlines()



        for line in txt_content:

            line_split = line.split(' ')

            roi_with = float(line_split[len(line_split)-2])

            roi_height = float(line_split[len(line_split)-1])

            if roi_with == 0 or roi_height == 0:

                continue

            dataset.append([roi_with, roi_height])

            # print([roi_with, roi_height])



    return np.array(dataset)



data = load_dataset(ROOT_PATH)

out = kmeans(data, k=CLUSTERS)



print(out)

print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))

# print("Boxes:\n {}-{}".format(out[:, 0] * SIZE, out[:, 1] * SIZE))



x = out[:, 0] * SIZE

y = out[:, 1] * SIZE

a = []

b = []

for xx in x:

    a.append(int(xx))

for yy in y:

    b.append(int(yy))

num = len(a)

strs = ""

for i in range(num):

    strs = strs+str(a[i])+","+str(b[i])+", "

print(strs)

ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()

print("Ratios:\n {}".format(sorted(ratios)))
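
The script prints the anchors as one flat, unsorted string. If you want to paste them into a model yaml, the stock YOLOv5 configs list anchors as rows of three pairs, smallest scale first. Here is a sketch of that formatting step, assuming 9 clusters and 3 anchors per detection level. The function name format_anchors is mine, and the cluster values in the usage comment are invented.

```python
def format_anchors(clusters, size=640, per_level=3):
    """Turn normalized (w, h) clusters into yaml-style anchor rows.

    Clusters are scaled to pixels (truncated with int, as in the script),
    sorted by area, and grouped per detection level.
    """
    pixels = sorted(
        ((int(w * size), int(h * size)) for w, h in clusters),
        key=lambda wh: wh[0] * wh[1],
    )
    rows = []
    for start in range(0, len(pixels), per_level):
        group = pixels[start:start + per_level]
        rows.append("  - [" + ", ".join(f"{w},{h}" for w, h in group) + "]")
    return rows


# Usage with the output of kmeans(): print("\n".join(format_anchors(out.tolist())))
```

Sorting by area is a design choice on my part: it puts the small anchors on the high-resolution level and the large ones on the coarse level.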

Script 4:

file name: kmeans.py

import numpy as np





def iou(box, clusters):

    """

    Calculates the Intersection over Union (IoU) between a box and k clusters.

    :param box: tuple or array, shifted to the origin (i. e. width and height)

    :param clusters: numpy array of shape (k, 2) where k is the number of clusters

    :return: numpy array of shape (k,) where k is the number of clusters

    """

    x = np.minimum(clusters[:, 0], box[0])

    y = np.minimum(clusters[:, 1], box[1])

    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:

        raise ValueError("Box has no area")



    intersection = x * y

    box_area = box[0] * box[1]

    cluster_area = clusters[:, 0] * clusters[:, 1]



    iou_ = intersection / (box_area + cluster_area - intersection)



    return iou_





def avg_iou(boxes, clusters):

    """

    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.

    :param boxes: numpy array of shape (r, 2), where r is the number of rows

    :param clusters: numpy array of shape (k, 2) where k is the number of clusters

    :return: average IoU as a single float

    """

    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])





def translate_boxes(boxes):

    """

    Translates all the boxes to the origin.

    :param boxes: numpy array of shape (r, 4)

    :return: numpy array of shape (r, 2)

    """

    new_boxes = boxes.copy()

    for row in range(new_boxes.shape[0]):

        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])

        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])

    return np.delete(new_boxes, [0, 1], axis=1)





def kmeans(boxes, k, dist=np.median):

    """

    Calculates k-means clustering with the Intersection over Union (IoU) metric.

    :param boxes: numpy array of shape (r, 2), where r is the number of rows

    :param k: number of clusters

    :param dist: distance function

    :return: numpy array of shape (k, 2)

    """

    rows = boxes.shape[0]



    distances = np.empty((rows, k))

    last_clusters = np.zeros((rows,))



    np.random.seed()



    # the Forgy method will fail if the whole array contains the same rows

    clusters = boxes[np.random.choice(rows, k, replace=False)]



    while True:

        for row in range(rows):

            distances[row] = 1 - iou(boxes[row], clusters)



        nearest_clusters = np.argmin(distances, axis=1)



        if (last_clusters == nearest_clusters).all():

            break



        for cluster in range(k):

            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)



        last_clusters = nearest_clusters



    return clusters
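
To see what the distance 1 - IoU looks like for concrete numbers, here is a small worked example. wh_iou is a condensed copy of the iou function above without the zero-area check, and the box and cluster sizes are made up.

```python
import numpy as np


def wh_iou(box, clusters):
    """IoU between one (w, h) box and k (w, h) clusters, all anchored at the origin."""
    inter = np.minimum(clusters[:, 0], box[0]) * np.minimum(clusters[:, 1], box[1])
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union


clusters = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]])
box = np.array([2.0, 2.0])

ious = wh_iou(box, clusters)       # IoU values are 0.25, 1.0, 0.25
distances = 1 - ious               # the k-means distance used in kmeans()
nearest = int(np.argmin(distances))
print(ious, distances, nearest)
```

The 2x2 box is assigned to the 2x2 cluster (index 1). Both the smaller and the larger cluster sit at the same distance, 0.75, which is the point of using IoU here instead of Euclidean distance on width and height.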

Script 5:

file name: run.sh

python 1_take_label.py

python 2_voc_label.py

python 3_k_means.py

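This is a toolchain I wrote myself, and it works well. I usually run it inside a complete VOCdevkit directory: running this bash script once produces the four split subsets, the YOLO-format label files, and list files such as 2007_train.txt.

Note that train.txt in the Main folder contains only bare file names, with no path and no extension.

The entries in 2007_train.txt, by contrast, are full file paths, and those paths must match the real locations of the images.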
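
Since wrong paths in the list files are easy to miss, here is a sketch of a sanity check. The helper name check_image_list is mine, and it assumes the labels folder sits next to the image folder, as in the directory layout above.

```python
from pathlib import Path


def check_image_list(list_file, labels_dir_name="labels"):
    """Return (path, problem) pairs for entries of a list file such as 2007_train.txt.

    Assumes labels live in a sibling folder of the image folder.
    """
    problems = []
    for line in Path(list_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        image = Path(line)
        if not image.is_file():
            problems.append((line, "image missing"))
            continue
        # .../JPEGImages/000001.jpg -> .../labels/000001.txt
        label = image.parent.parent / labels_dir_name / (image.stem + ".txt")
        if not label.is_file():
            problems.append((line, "label missing"))
    return problems
```

An empty result means every listed image exists and has a label file with the same stem.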

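5. Configuration

At this point your YOLO5/VOCData folder should contain the three folders ImageSets, JPEGImages, and labels, plus the five files 2007_test.txt, 2007_train.txt, 2007_val.txt, train.txt, and train.all.txt.

Go to the data folder under the project root and create a new configuration file, xxx.yaml: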

train: ./VOCData/train_all.txt

val: ./VOCData/2007_val.txt



# number of classes

nc: 666



# class names

names: ["xxxxx"]

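Change these values to match your own dataset.

Then pick the model you want from the models folder in the project root and change the class count on the nc line to match the value above. I did not do the anchor clustering here; I just let it cluster automatically.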
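
The easiest mistake here is an nc that does not match the number of names. Here is a sketch that writes the dataset yaml from the class list, so the two cannot drift apart. The function name render_dataset_yaml is hypothetical, and it only covers the four keys shown above.

```python
def render_dataset_yaml(train_list, val_list, names):
    """Build the text of a dataset yaml; nc is derived from names."""
    quoted = ", ".join('"' + name + '"' for name in names)
    lines = [
        f"train: {train_list}",
        f"val: {val_list}",
        "",
        "# number of classes",
        f"nc: {len(names)}",
        "",
        "# class names",
        f"names: [{quoted}]",
    ]
    return "\n".join(lines) + "\n"


# Example class list; use the same order as `classes` in 2_voc_label.py
print(render_dataset_yaml("./VOCData/train_all.txt",
                          "./VOCData/2007_val.txt",
                          ["person", "ball"]))
```

The order of names matters: the label files store class indices, so it has to be the same order used when the labels were generated.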

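6. Pitfalls

1. The train_all.txt above must not be named train.all.txt: the file name may contain only one dot, so rename the train.all.txt written by 2_voc_label.py.

2. Open utils/datasets.py at line 393: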

sa, sb = os.sep + 'JPEGImages' + os.sep, os.sep + 'labels' + os.sep # /images/, /labels/ substrings

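Your image directory must be named JPEGImages. Otherwise, change this line to the name you chose, or you will get a puzzling error saying the label files cannot be found.

3. Use a smaller batch size. When GPU memory runs out, the error is not reported as OOM; you get an odd error about cuDNN being invalid instead.

4. Installing the environment from requirements.txt gives you a CPU-only PyTorch. I installed from it once, uninstalled torch, installed the GPU build of PyTorch with conda, and then reinstalled whichever packages were reported missing. After that, torch's GPU availability check passed.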
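
For pitfall 2, it helps to see why the folder name matters. As far as I understand it, the loader derives each label path from the image path by swapping the image folder substring for /labels/ and the extension for .txt. Below is a rough sketch of that idea. The function image_to_label_path is my own, and the real code in utils/datasets.py differs between YOLOv5 versions.

```python
import os


def image_to_label_path(image_path, images_dir="JPEGImages", labels_dir="labels"):
    """Guess the label path for an image path by substring replacement."""
    sa = os.sep + images_dir + os.sep
    sb = os.sep + labels_dir + os.sep
    # Replace only the last occurrence of the image folder in the path
    swapped = sb.join(image_path.rsplit(sa, 1))
    return swapped.rsplit(".", 1)[0] + ".txt"


good = os.sep.join(["", "data", "VOC2007", "JPEGImages", "000001.jpg"])
bad = os.sep.join(["", "data", "VOC2007", "pics", "000001.jpg"])
print(image_to_label_path(good))  # ends in labels/000001.txt
print(image_to_label_path(bad))   # still inside pics/, so no label is found
```

When the folder name does not match, the substring is never replaced and the loader looks for the .txt file next to the image, which is where the "no labels" error comes from.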

Original link: https://blog.csdn.net/Andrwin/article/details/124175363